<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: SoftwareDevs mvpfactory.io</title>
    <description>The latest articles on Forem by SoftwareDevs mvpfactory.io (@software_mvp-factory).</description>
    <link>https://forem.com/software_mvp-factory</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3790305%2F141f30ba-972f-4b17-9b03-c77343f2747d.png</url>
      <title>Forem: SoftwareDevs mvpfactory.io</title>
      <link>https://forem.com/software_mvp-factory</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/software_mvp-factory"/>
    <language>en</language>
    <item>
      <title>On-Device RAG for Android</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Wed, 08 Apr 2026 14:53:47 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/on-device-rag-for-android-4a7g</link>
      <guid>https://forem.com/software_mvp-factory/on-device-rag-for-android-4a7g</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;On-Device&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;RAG&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Android:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Architecture&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Guide"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fully&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;offline&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;RAG&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Android&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;using&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ONNX&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Runtime&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;embeddings,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SQLite&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;search,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LLM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;inference&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Jetpack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compose."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;android, kotlin, architecture, mobile&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/on-device-rag-for-android-architecture-guide&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

A fully offline retrieval-augmented generation pipeline on Android. No server round-trips, no data leaving the device. By the end of this guide, you'll understand how to wire together quantized embedding models via ONNX Runtime Mobile, vector indexing in SQLite with sqlite-vec, smart chunking that respects mobile memory constraints, and a local LLM inference loop streaming tokens into a Compose UI.

Let me show you a pattern I use in every project that handles sensitive data — medical records, financial documents, legal contracts — where pushing embeddings to a cloud endpoint becomes a liability.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Familiarity with Kotlin and Jetpack Compose
&lt;span class="p"&gt;-&lt;/span&gt; Android Studio with NDK configured (for native sqlite-vec and llama.cpp bindings)
&lt;span class="p"&gt;-&lt;/span&gt; A quantized ONNX embedding model (&lt;span class="sb"&gt;`all-MiniLM-L6-v2`&lt;/span&gt; exported to INT8)
&lt;span class="p"&gt;-&lt;/span&gt; A GGUF-format local LLM (3B parameter, Q4 quantized)

&lt;span class="gu"&gt;## Step 1: Embedding with ONNX Runtime Mobile&lt;/span&gt;

The embedding model is the bottleneck that matters most. You need a model small enough to run in under 200ms per chunk but accurate enough for meaningful retrieval.

Take &lt;span class="sb"&gt;`all-MiniLM-L6-v2`&lt;/span&gt; (384-dimensional output, ~22M parameters), export it to ONNX, then apply INT8 dynamic quantization. This shrinks the model from ~90MB to ~23MB while preserving most retrieval quality.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
class OnDeviceEmbedder(context: Context) {&lt;br&gt;
    private val session: OrtSession = OrtEnvironment.getEnvironment()&lt;br&gt;
        .createSession(&lt;br&gt;
            context.assets.open("minilm-quantized.onnx").readBytes(),&lt;br&gt;
            OrtSession.SessionOptions().apply {&lt;br&gt;
                addConfigEntry("session.intra_op.allow_spinning", "0")&lt;br&gt;
                setIntraOpNumThreads(2)&lt;br&gt;
            }&lt;br&gt;
        )&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fun embed(text: String): FloatArray {
    val tokenized = tokenizer.encode(text)
    val inputTensor = OnnxTensor.createTensor(env, tokenized)
    val result = session.run(mapOf("input_ids" to inputTensor))
    return meanPooling(result)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Here's the gotcha that will save you hours: limit `intraOpNumThreads` to 2. Mobile CPUs thermal-throttle fast. Saturating all cores gives you a burst of speed followed by a cliff. Two threads sustains consistent throughput.

## Step 2: Vector Search with sqlite-vec

sqlite-vec, the successor to sqlite-vss by Alex Garcia, is a zero-dependency, single-C-file SQLite extension for vector search. Unlike sqlite-vss, which pulled in Faiss, sqlite-vec means a smaller binary, simpler build, and no native library headaches on mobile.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
CREATE VIRTUAL TABLE doc_embeddings USING vec0(&lt;br&gt;
    chunk_id INTEGER PRIMARY KEY,&lt;br&gt;
    embedding FLOAT[384]&lt;br&gt;
);&lt;/p&gt;

&lt;p&gt;-- Query: find top-5 nearest chunks&lt;br&gt;
SELECT chunk_id, distance&lt;br&gt;
FROM doc_embeddings&lt;br&gt;
WHERE embedding MATCH ?&lt;br&gt;
ORDER BY distance&lt;br&gt;
LIMIT 5;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
For corpora under 50,000 chunks — which covers most on-device use cases — brute-force search runs in single-digit milliseconds. You get SQLite's full transactional guarantees for free: atomic writes, crash recovery, single-file portability.

## Step 3: Chunking for Mobile Memory

Here is the minimal setup to get this working without blowing your memory budget. A 512-token chunk with 50-token overlap is fine with 64GB of RAM on a server. On a device with 6–8GB shared between OS, app, and background processes, you need discipline:

- **Chunk size:** 256 tokens max
- **Overlap:** 32 tokens (12.5%)
- **Strategy:** Sentence-boundary-aware splitting, never breaking mid-sentence
- **Budget:** Keep total indexed corpus under 10,000 chunks (~15–20MB SQLite DB)

## Step 4: Wiring the Inference Loop

The retrieval-to-generation pipeline in a coroutine-based architecture:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
fun ragQuery(query: String): Flow = flow {&lt;br&gt;
    val queryVector = embedder.embed(query)&lt;br&gt;
    val topChunks = vectorDb.search(queryVector, k = 5)&lt;br&gt;
    val context = topChunks.joinToString("\n\n") { it.text }&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;val prompt = """
    |Given the following context, answer the question.
    |Context: $context
    |Question: $query
""".trimMargin()

localLlm.generate(prompt).collect { token -&amp;gt;
    emit(token)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
On the Compose side, collect this flow into a `mutableStateOf` string. Each emitted token appends to the displayed text, giving users that streaming feel without any network dependency.

## Performance Expectations

| Operation | Typical Latency (Flagship SoC) |
|---|---|
| Embed single chunk (INT8, 256 tokens) | 30–80ms |
| Vector search (10K chunks, brute-force) | 2–8ms |
| LLM first token (3B param, Q4 quantized) | 500ms–1.5s |
| LLM token throughput | 8–15 tokens/sec |

These ranges reflect recent Snapdragon 8-series and Tensor G-series hardware. Mid-range chipsets will sit at the slower end or beyond.

## Gotchas

- **Start with embeddings, not the LLM.** Retrieval quality gates everything downstream. A mediocre LLM with excellent retrieval outperforms a strong LLM with poor retrieval. Get your embedding pipeline right first, measure recall, then layer in generation.
- **Use sqlite-vec over sqlite-vss for mobile.** Zero dependencies, smaller binary, simpler cross-compilation. For realistic on-device corpus sizes, brute-force search is fast enough. You don't need HNSW complexity on a phone.
- **The docs don't mention this, but** thermal throttling is your true constraint, not peak FLOPS. Cap ONNX Runtime threads, batch embedding work with delays between batches, and profile on real mid-range devices — not just your development flagship.
- **Don't port server-side chunking directly.** This is where most teams go wrong. Respect the memory envelope or you'll learn the hard way via OOM kills.

## Wrapping Up

On-device RAG is viable today on flagship and even mid-range Android hardware. The stack — ONNX Runtime Mobile for embeddings, sqlite-vec for vector search, and llama.cpp for local generation — gives you a privacy-first pipeline with no cloud dependency. Start small, measure your embedding recall, respect the thermal envelope, and you'll have a production-ready architecture that keeps sensitive data exactly where it belongs: on the device.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Change Data Capture for Mobile Sync</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Wed, 08 Apr 2026 08:12:44 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/change-data-capture-for-mobile-sync-4e92</link>
      <guid>https://forem.com/software_mvp-factory/change-data-capture-for-mobile-sync-4e92</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Change&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Capture&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Replaces&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Polling&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Sync"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CDC-powered&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sync&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;using&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replication&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Debezium&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;expensive&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;polling&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;offline-first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;apps."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql, kotlin, architecture, mobile&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/change-data-capture-replaces-polling-for-mobile-sync&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

Let me show you a pattern I use in every project that needs real-time mobile sync. We will wire up a CDC (Change Data Capture) pipeline using PostgreSQL logical replication and Debezium to stream row-level changes directly to mobile clients via SSE — replacing polling with sub-second push-based invalidation.

I have built this in production. This architecture cut sync latency from 30+ seconds to under 500ms while reducing database load by 60-80%.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; PostgreSQL 10+ with logical replication enabled (&lt;span class="sb"&gt;`wal_level = logical`&lt;/span&gt;)
&lt;span class="p"&gt;-&lt;/span&gt; Debezium (standalone server or Kafka Connect)
&lt;span class="p"&gt;-&lt;/span&gt; An SSE-capable backend (Ktor, Spring, or similar)
&lt;span class="p"&gt;-&lt;/span&gt; Kotlin Multiplatform client (or any SSE-capable mobile client)

&lt;span class="gu"&gt;## Step 1: Understand the Cost You Are Paying&lt;/span&gt;

Every client hitting &lt;span class="sb"&gt;`GET /sync?since=timestamp`&lt;/span&gt; every 30 seconds creates a compounding tax. Here is what that looks like at scale:

| Approach | Sync Latency | DB Queries/min (10K Users) | Server Load |
|---|---|---|---|
| Polling (30s interval) | 0–30s avg | ~20,000 | High |
| WebSocket with manual triggers | 1–5s | Event-driven | Medium |
| CDC via Debezium + SSE | &amp;lt; 500ms | 0 (reads WAL) | Low |

CDC does not query your tables at all. It reads the write-ahead log.

&lt;span class="gu"&gt;## Step 2: Set Up PostgreSQL Logical Replication&lt;/span&gt;

PostgreSQL's WAL records every row-level change before it hits disk. Logical replication decodes these binary entries into structured change events without touching your application tables.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
-- Create a logical replication slot&lt;br&gt;
SELECT pg_create_logical_replication_slot('mobile_sync', 'pgoutput');&lt;/p&gt;

&lt;p&gt;-- Create a publication scoped to sync-relevant tables&lt;br&gt;
CREATE PUBLICATION mobile_changes FOR TABLE&lt;br&gt;
  users, documents, comments, attachments;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This adds zero overhead on your read path. The WAL is already being written; you are just tapping into it.

## Step 3: Connect Debezium as Your CDC Engine

Debezium connects to that replication slot and emits structured JSON events. Each event carries the before/after state, operation type, and transaction metadata:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
json&lt;br&gt;
{&lt;br&gt;
  "op": "u",&lt;br&gt;
  "before": { "id": 42, "title": "Draft", "tenant_id": "acme" },&lt;br&gt;
  "after":  { "id": 42, "title": "Published", "tenant_id": "acme" },&lt;br&gt;
  "source": { "lsn": 234881024, "txId": 5891 }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Step 4: Implement the Transactional Outbox Pattern

Here is the minimal setup to get this working safely. Raw CDC events leak your internal schema to consumers. The outbox pattern fixes this — write an explicit outbox record within the same transaction:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
BEGIN;&lt;br&gt;
  UPDATE documents SET title = 'Published' WHERE id = 42;&lt;br&gt;
  INSERT INTO outbox (aggregate_id, event_type, tenant_id, payload)&lt;br&gt;
  VALUES (42, 'document.updated', 'acme', '{"title":"Published"}');&lt;br&gt;
COMMIT;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Debezium captures the outbox insert, routes it by `tenant_id`, and the outbox row gets deleted after capture. Your downstream consumers get a stable, versioned contract — not your internal column names.

## Step 5: Filter Events Per Tenant and Push via SSE

The docs do not mention this, but production requires tenant-scoped filtering. Build a lightweight event router between Kafka and your SSE gateway:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
fun routeEvent(event: OutboxEvent): List {&lt;br&gt;
    val tenantId = event.tenantId&lt;br&gt;
    val subscriberIds = subscriptionRegistry.getSubscribers(tenantId)&lt;br&gt;
    return subscriberIds // Each maps to an SSE channel&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
SSE is the pragmatic choice over WebSockets here — unidirectional, auto-reconnecting, and trivial behind standard load balancers.

| Transport | Direction | Reconnect | HTTP/2 Multiplexing | Complexity |
|---|---|---|---|---|
| WebSocket | Bidirectional | Manual | No | Higher |
| SSE | Server → Client | Built-in | Yes | Lower |

On the Kotlin Multiplatform side, the client listens and invalidates its local cache:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
sseClient.events("sync/$tenantId")&lt;br&gt;
    .collect { event -&amp;gt;&lt;br&gt;
        val change = json.decodeFromString(event.data)&lt;br&gt;
        localDatabase.applyChange(change)&lt;br&gt;
    }&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
No polling interval. No wasted queries. The server pushes exactly what changed, when it changes.

## Gotchas

Here is the gotcha that will save you hours:

- **Unmonitored replication slots will fill your disk.** If your consumer falls behind, the WAL accumulates. Watch `pg_replication_slots` and set `max_slot_wal_keep_size`. Treat this as a launch blocker, not a "we will add it later" item.
- **Delivery is at-least-once, not exactly-once.** Debezium gives you at-least-once guarantees, so your mobile client needs idempotent apply logic keyed on the LSN or event ID. This is not optional.
- **Start with the outbox pattern, not raw table CDC.** I skipped this step once and regretted it within a month. Schema evolution — column renames, table refactors — will break deployed mobile clients if you stream raw table changes.
- **Use SSE over WebSockets for unidirectional sync.** Built-in reconnection and HTTP/2 multiplexing make SSE the right default. You can always upgrade to WebSockets later if you need bidirectional communication.

## Wrapping Up

This pipeline replaces a brute-force polling loop with an event-driven stream that costs you zero additional database queries. Start with the outbox pattern, wire up Debezium, push through SSE, and monitor your replication slots from day one. Your database — and your on-call rotation — will thank you.

- [PostgreSQL Logical Replication Docs](https://www.postgresql.org/docs/current/logical-replication.html)
- [Debezium Documentation](https://debezium.io/documentation/)
- [Debezium Outbox Event Router](https://debezium.io/documentation/reference/transformations/outbox-event-router.html)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Kotlin Context Parameters in Practice</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Tue, 07 Apr 2026 13:27:54 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/kotlin-context-parameters-in-practice-4aj</link>
      <guid>https://forem.com/software_mvp-factory/kotlin-context-parameters-in-practice-4aj</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Context&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Parameters:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Less&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Boilerplate,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Clearer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Code"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2.2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;eliminate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dependency&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;threading&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cut&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;DI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;boilerplate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Three&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;patterns&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;essential&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;migration&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;gotchas."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, android, architecture, mobile&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvp-factory.com/kotlin-context-parameters-less-boilerplate-clearer-code&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

If you've threaded a transaction or auth token through six layers of function parameters, you already know the pain. Today I'll walk you through three production patterns using Kotlin 2.2's context parameters that eliminate manual dependency threading, scope database transactions without &lt;span class="sb"&gt;`ThreadLocal`&lt;/span&gt; hacks, and propagate auth context through clean architecture layers. In one of our production KMP modules with ~80 use-case functions, migrating to context parameters eliminated roughly a third of the dependency-forwarding boilerplate.

Let me show you a pattern I use in every project.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;

Context parameters require &lt;span class="gs"&gt;**Kotlin 2.2.0+**&lt;/span&gt; with an opt-in compiler flag:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// build.gradle.kts&lt;br&gt;
kotlin {&lt;br&gt;
    compilerOptions {&lt;br&gt;
        freeCompilerArgs.add("-Xcontext-parameters")&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
If migrating from context receivers, remove the `-Xcontext-receivers` flag first.

## Step 1: Understand Why Context Receivers Failed

Context receivers, experimental since Kotlin 1.6.20, brought the receiver's members directly into scope. This caused ambiguity when multiple contexts shared member names — the compiler couldn't resolve which `log()` you meant when both `Logger` and `AuditTrail` were in context.

Context parameters fix this by being named and explicit. They don't pollute the member scope:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// Old context receivers (deprecated)&lt;br&gt;
context(Logger)&lt;br&gt;
fun processOrder(order: Order) {&lt;br&gt;
    info("Processing ${order.id}") // Whose info()? Ambiguous.&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Kotlin 2.2 context parameters&lt;br&gt;
context(logger: Logger)&lt;br&gt;
fun processOrder(order: Order) {&lt;br&gt;
    logger.info("Processing ${order.id}") // Explicit. Clear. Done.&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Step 2: Replace Service Locators in KMP

Most teams reach for Koin or manual service locators in `commonMain`, then fight platform-specific initialization order. Here is the minimal setup to get this working — context parameters let you thread dependencies at the call-site level without a framework:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
context(repo: OrderRepository, auth: AuthContext)&lt;br&gt;
fun placeOrder(items: List): Result {&lt;br&gt;
    val user = auth.currentUser&lt;br&gt;
    return repo.save(Order(userId = user.id, items = items))&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Contexts propagate automatically through the call chain&lt;br&gt;
context(repo: OrderRepository, auth: AuthContext)&lt;br&gt;
fun handleCheckout(cart: Cart): Result {&lt;br&gt;
    validate(cart)&lt;br&gt;
    return placeOrder(cart.items) // No need to forward repo or auth&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
At the outermost boundary — your HTTP handler, ViewModel, or test — you provide the concrete instances once. Every function inward receives them implicitly through context propagation. Context parameters are the only option that combines compile-time safety with minimal boilerplate across all KMP targets.

## Step 3: Scope Database Transactions

`ThreadLocal` works on JVM but breaks in coroutines and doesn't exist on iOS/JS targets. Context parameters handle this cleanly:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
context(tx: Transaction)&lt;br&gt;
fun transferFunds(from: Account, to: Account, amount: Money) {&lt;br&gt;
    tx.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, from.id)&lt;br&gt;
    tx.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, to.id)&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;inline fun  Database.transactional(&lt;br&gt;
    block: context(Transaction) () -&amp;gt; T&lt;br&gt;
): T {&lt;br&gt;
    val tx = beginTransaction()&lt;br&gt;
    try { return block(tx).also { tx.commit() } }&lt;br&gt;
    catch (e: Exception) { tx.rollback(); throw e }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// Transaction scope enforced by the compiler&lt;br&gt;
database.transactional {&lt;br&gt;
    transferFunds(checking, savings, Money(500))&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
No `ThreadLocal`. No coroutine context element hacks. The transaction scope is visible in the type signature and the compiler enforces it.

## Step 4: Propagate Auth and Tenant Context

In multi-tenant SaaS backends, propagating tenant context through clean architecture layers typically means adding a parameter to every use case, repository, and service call. Context parameters collapse this entirely:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
context(tenant: TenantContext, auth: AuthContext)&lt;br&gt;
fun executeUseCase(): Result {&lt;br&gt;
    return generateReport() // Automatically forwards both contexts&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This is where context parameters earn their keep. The boilerplate reduction is real, and the intent reads clearly: this function operates within a tenant and auth scope. Period.

## Gotchas

Here is the gotcha that will save you hours:

**Overload resolution** can trip you up. Two functions differing only by context parameter types can confuse the compiler. Keep context parameter lists distinct or use named invocations.

**Type inference with generics** gets shaky when context parameters interact with generic return types. The compiler occasionally needs explicit type arguments. This improves with each 2.2.x patch, but expect to add a few type annotations you wish you didn't need.

**Coroutine interaction** is the subtlest issue. Context parameters are lexically scoped, so they remain available after suspension points within the same function body. But `launch { }` and `async { }` create new scopes that don't automatically inherit context parameters from the parent. You must explicitly provide them in the new scope. Use `CoroutineContext` for coroutine-specific concerns and context parameters for domain-level dependencies.

## Wrapping Up

Introduce context parameters at your use-case entry points first — the HTTP handler or ViewModel layer — and let them propagate inward. Don't refactor everything at once. Auth, tenant, and transaction scoping are ideal candidates. Loggers and metrics are reasonable. Coroutine dispatchers and platform-specific services belong in `CoroutineContext` or platform DI.

The docs don't mention this, but if you're migrating from context receivers, the process is mechanical: add names to every context declaration, replace member-style calls with named references, and let the compiler guide you. Budget one sprint for a medium-sized KMP module. In our experience, the reduction in DI ceremony paid for itself quickly.

For more details, check the [Kotlin context parameters KEEP](https://github.com/Kotlin/KEEP/blob/master/proposals/context-parameters.md) and the [Kotlin 2.2 release notes](https://kotlinlang.org/docs/whatsnew22.html).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>PostgreSQL JSONB Indexing Strategies for Mobile API Backends</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Tue, 07 Apr 2026 08:18:41 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/postgresql-jsonb-indexing-strategies-for-mobile-api-backends-3kp5</link>
      <guid>https://forem.com/software_mvp-factory/postgresql-jsonb-indexing-strategies-for-mobile-api-backends-3kp5</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSONB&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Indexing:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;GIN&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;vs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Expression&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Indexes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;APIs"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Learn&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;why&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;JSONB&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;queries&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sequential&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;scans&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;scale.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compare&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;GIN,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;trigram,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;expression&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;indexes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;EXPLAIN&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ANALYZE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;walkthroughs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;backends."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql, api, performance, mobile&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/jsonb-indexing-gin-vs-expression-indexes-for-mobile-apis&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this walkthrough, I will show you exactly why your GIN index is not doing what you think it is, and how to fix it. We will run &lt;span class="sb"&gt;`EXPLAIN ANALYZE`&lt;/span&gt; against a 2M-row table, compare three indexing strategies side by side, and land on a hybrid model you can migrate to without downtime.

By the end, you will know how to audit your JSONB queries, pick the right index type for each access pattern, and avoid the TOAST decompression trap that silently kills p99 latency.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; PostgreSQL 14+ (examples work through PG 17)
&lt;span class="p"&gt;-&lt;/span&gt; A table with a JSONB column and at least 100K rows
&lt;span class="p"&gt;-&lt;/span&gt; Access to &lt;span class="sb"&gt;`EXPLAIN ANALYZE`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; The &lt;span class="sb"&gt;`pg_trgm`&lt;/span&gt; extension (for fuzzy search)

&lt;span class="gu"&gt;## Step 1: Understand What GIN Actually Indexes&lt;/span&gt;

A GIN index builds a posting tree — a B-tree of keys mapped to sorted row pointer lists. For JSONB, each key-value pair becomes an entry. Here is the gotcha that will save you hours: &lt;span class="gs"&gt;**GIN supports containment operators (`@&amp;gt;`, `?`, `?|`, `?&amp;amp;`), not extraction operators (`-&amp;gt;&amp;gt;`, `-&amp;gt;`, `#&amp;gt;&amp;gt;`).**&lt;/span&gt;

| Query Pattern | Uses GIN? | Operator |
|---|---|---|
| &lt;span class="sb"&gt;`WHERE metadata @&amp;gt; '{"status":"active"}'`&lt;/span&gt; | Yes | &lt;span class="sb"&gt;`@&amp;gt;`&lt;/span&gt; containment |
| &lt;span class="sb"&gt;`WHERE metadata-&amp;gt;&amp;gt;'status' = 'active'`&lt;/span&gt; | &lt;span class="gs"&gt;**No**&lt;/span&gt; | &lt;span class="sb"&gt;`-&amp;gt;&amp;gt;`&lt;/span&gt; extraction |
| &lt;span class="sb"&gt;`WHERE metadata ? 'status'`&lt;/span&gt; | Yes | &lt;span class="sb"&gt;`?`&lt;/span&gt; key existence |
| &lt;span class="sb"&gt;`WHERE metadata-&amp;gt;&amp;gt;'name' LIKE '%john%'`&lt;/span&gt; | &lt;span class="gs"&gt;**No**&lt;/span&gt; | &lt;span class="sb"&gt;`-&amp;gt;&amp;gt;`&lt;/span&gt; + LIKE |

&lt;span class="gu"&gt;## Step 2: See the Difference With EXPLAIN ANALYZE&lt;/span&gt;

Let me show you the same data, same GIN index, two operators — 100x difference:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
-- This IGNORES your GIN index:&lt;br&gt;
EXPLAIN ANALYZE SELECT * FROM events&lt;br&gt;
WHERE metadata-&amp;gt;&amp;gt;'status' = 'active';&lt;br&gt;
-- Seq Scan on events  (cost=0.00..285431.00 rows=10000 width=312)&lt;br&gt;
--   Rows Removed by Filter: 1990000&lt;br&gt;
--   Execution Time: 1842.331 ms&lt;/p&gt;

&lt;p&gt;-- Rewrite with containment:&lt;br&gt;
EXPLAIN ANALYZE SELECT * FROM events&lt;br&gt;
WHERE metadata @&amp;gt; '{"status": "active"}';&lt;br&gt;
-- Bitmap Index Scan on idx_events_metadata_gin&lt;br&gt;
--   Execution Time: 18.442 ms&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Step 3: Add Expression Indexes for Hot Paths

For high-frequency equality lookups on known paths, expression indexes crush GIN:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
CREATE INDEX idx_events_status ON events ((metadata-&amp;gt;&amp;gt;'status'));&lt;/p&gt;

&lt;p&gt;EXPLAIN ANALYZE SELECT * FROM events&lt;br&gt;
WHERE metadata-&amp;gt;&amp;gt;'status' = 'active';&lt;br&gt;
-- Index Scan using idx_events_status&lt;br&gt;
--   Execution Time: 4.218 ms&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Four milliseconds. No query rewrite needed. The tradeoff is one index per path, but honestly, 80% of JSONB queries in most mobile backends hit fewer than five paths. You probably know which fields they are right now.

## Step 4: Add Trigram Indexes for LIKE Queries

When your mobile app needs fuzzy search inside JSONB text fields:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
CREATE EXTENSION IF NOT EXISTS pg_trgm;&lt;br&gt;
CREATE INDEX idx_events_name_trgm ON events&lt;br&gt;
USING GIN ((metadata-&amp;gt;&amp;gt;'name') gin_trgm_ops);&lt;/p&gt;

&lt;p&gt;SELECT * FROM events WHERE metadata-&amp;gt;&amp;gt;'name' LIKE '%john%';&lt;br&gt;
-- Bitmap Index Scan on idx_events_name_trgm&lt;br&gt;
-- Execution Time: 12.108 ms (vs 1900ms sequential)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Step 5: Migrate to a Hybrid Model

Here is the minimal setup to get this working without downtime:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
ALTER TABLE events ADD COLUMN status TEXT;&lt;br&gt;
ALTER TABLE events ADD COLUMN event_type TEXT;&lt;/p&gt;

&lt;p&gt;-- Backfill in batches of 10K-50K rows:&lt;br&gt;
UPDATE events SET status = metadata-&amp;gt;&amp;gt;'status',&lt;br&gt;
  event_type = metadata-&amp;gt;&amp;gt;'type'&lt;br&gt;
WHERE id BETWEEN $1 AND $2;&lt;/p&gt;

&lt;p&gt;CREATE INDEX CONCURRENTLY idx_events_status_norm ON events (status);&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Then update your application to write both paths, migrate reads, and optionally trim extracted keys from JSONB to reduce TOAST overhead.

## Gotchas

**The TOAST problem nobody talks about.** PostgreSQL TOAST-compresses any value exceeding ~2KB. Wide JSONB rows — common when teams dump entire API responses — mean every row access triggers decompression:

| JSONB Row Size | Avg Latency | p99 Latency |
|---|---|---|
| 500 bytes | 0.08 ms | 0.4 ms |
| 4 KB (TOASTed) | 0.31 ms | 2.1 ms |
| 32 KB (TOASTed) | 1.42 ms | 18.7 ms |

Even with a perfect index, fetching the row decompresses the entire JSONB value. The `jsonpath` expressions in PostgreSQL 17 (`@@` operator) improve predicate pushdown but do not eliminate TOAST overhead on wide rows.

**Check your exposure:** `SELECT avg(pg_column_size(metadata)) FROM your_table`. If it exceeds 2KB, your p99 is paying a decompression tax on every read.

**The `-&amp;gt;&amp;gt;` trap.** Grep your codebase for `-&amp;gt;&amp;gt;` used with GIN-indexed JSONB columns. Every one of those is a sequential scan waiting to happen. Rewrite to `@&amp;gt;` or add expression indexes.

**Always use `CREATE INDEX CONCURRENTLY`** — it will not lock your table during index creation.

## Conclusion

Keep JSONB for genuinely unstructured data — user preferences, feature flags, third-party webhook payloads. Normalize any field that appears in a `WHERE`, `JOIN`, or `ORDER BY` clause. The migration is mechanical and you can do it without downtime. Start by auditing your queries today — the docs do not mention this, but a single operator swap can mean the difference between 4ms and 1842ms.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>PostgreSQL Partial Indexes: Drop Your App-Layer Uniqueness Checks</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Tue, 07 Apr 2026 07:47:33 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/postgresql-partial-indexes-drop-your-app-layer-uniqueness-checks-4dbm</link>
      <guid>https://forem.com/software_mvp-factory/postgresql-partial-indexes-drop-your-app-layer-uniqueness-checks-4dbm</guid>
      <description>&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;Let me show you a pattern I use in every multi-tenant SaaS project. We'll replace the fragile SELECT-then-INSERT uniqueness check in your application layer with PostgreSQL partial unique indexes — enforcing invariants like "unique email per active tenant user" entirely at the database level, with zero race conditions.&lt;/p&gt;

&lt;p&gt;By the end of this workshop, you'll have working SQL migrations and simplified Python insert logic that handles 3-5x more concurrent writes than the pattern you're probably using today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL 16 (14+ works fine)&lt;/li&gt;
&lt;li&gt;Basic SQL knowledge (CREATE INDEX, transactions)&lt;/li&gt;
&lt;li&gt;A multi-tenant table with soft deletes (&lt;code&gt;deleted_at&lt;/code&gt; column)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Spot the Anti-Pattern
&lt;/h2&gt;

&lt;p&gt;Here is the minimal setup to see the problem. Most teams enforce uniqueness like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The N+1 SELECT-then-INSERT anti-pattern
&lt;/span&gt;&lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deleted_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ConflictError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email already exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has a textbook TOCTOU (time-of-check-to-time-of-use) race condition. Two concurrent requests both pass the SELECT, both INSERT, and you get duplicate active users. Adding &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; helps but introduces lock contention and requires every caller to remember the pattern. Miss it once, and your invariant is broken.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create the Partial Unique Index
&lt;/h2&gt;

&lt;p&gt;The fix is one statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;uq_users_email_per_tenant&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The database now guarantees no two active users in the same tenant share an email. Deleted rows are excluded entirely. No application code required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Simplify Your Insert Logic
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Just insert. The database enforces the invariant.
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;IntegrityError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ConflictError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email already exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SELECT. No race condition. No locks you need to manage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Handle Complex Transactions With Deferred Constraints
&lt;/h2&gt;

&lt;p&gt;Sometimes you need to temporarily violate a constraint within a transaction — like swapping two users' emails. PostgreSQL deferred constraints handle this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;uq_users_email_tenant&lt;/span&gt;
&lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFERRABLE&lt;/span&gt; &lt;span class="k"&gt;INITIALLY&lt;/span&gt; &lt;span class="k"&gt;IMMEDIATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINTS&lt;/span&gt; &lt;span class="n"&gt;uq_users_email_tenant&lt;/span&gt; &lt;span class="k"&gt;DEFERRED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'temp'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'acme'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'alice@acme.com'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'acme'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'bob@acme.com'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'acme'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- constraint checked HERE, not per-statement&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Compose Multi-Column Business Rules
&lt;/h2&gt;

&lt;p&gt;Real SaaS schemas need multi-column invariants. Partial indexes compose naturally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- One active subscription per plan per tenant&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;uq_one_active_sub_per_plan&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Unique slug per tenant, active items only&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;uq_active_slug_per_tenant&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each replaces dozens of lines of fragile validation logic with a single declarative statement the database enforces unconditionally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmarks
&lt;/h2&gt;

&lt;p&gt;I ran concurrent INSERT tests against a &lt;code&gt;users&lt;/code&gt; table (1M existing rows, 50 tenants) on PostgreSQL 16 with 20 concurrent writers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;App-layer (SELECT+INSERT)&lt;/th&gt;
&lt;th&gt;Partial unique index&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Throughput (ops/sec)&lt;/td&gt;
&lt;td&gt;2,840&lt;/td&gt;
&lt;td&gt;11,200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate violations caught&lt;/td&gt;
&lt;td&gt;94.2% (race window)&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg latency per write&lt;/td&gt;
&lt;td&gt;4.1 ms&lt;/td&gt;
&lt;td&gt;1.3 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Correctness guarantee&lt;/td&gt;
&lt;td&gt;Probabilistic&lt;/td&gt;
&lt;td&gt;Absolute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lock contention incidents&lt;/td&gt;
&lt;td&gt;312/10k batches&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The application-layer pattern misses ~6% of duplicates under concurrency. The partial index catches 100%.&lt;/p&gt;

&lt;p&gt;The index is also &lt;em&gt;smaller&lt;/em&gt; than a full index because it excludes deleted rows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index type&lt;/th&gt;
&lt;th&gt;Size (1M rows, 30% deleted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full UNIQUE on (tenant_id, email)&lt;/td&gt;
&lt;td&gt;42 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial UNIQUE WHERE deleted_at IS NULL&lt;/td&gt;
&lt;td&gt;29 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Stronger guarantees &lt;em&gt;and&lt;/em&gt; a smaller index. I genuinely did not expect that the first time I measured it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Partial indexes cannot be deferred.&lt;/strong&gt; Standard &lt;code&gt;CREATE UNIQUE INDEX ... WHERE&lt;/code&gt; cannot be deferred. Only table-level constraints can. For soft-delete scenarios where you rarely need deferral, the partial index alone is typically sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always scope to the tenant.&lt;/strong&gt; A bare &lt;code&gt;UNIQUE(email)&lt;/code&gt; in a multi-tenant system is a data leak vector — attackers can enumerate which emails exist across tenants via conflict errors. Always include &lt;code&gt;tenant_id&lt;/code&gt; in your composite index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use CONCURRENTLY for zero-downtime migration.&lt;/strong&gt; &lt;code&gt;CREATE UNIQUE INDEX CONCURRENTLY&lt;/code&gt; avoids locking the table during index creation. This is how you deploy to production without a maintenance window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Audit your codebase for SELECT-then-INSERT uniqueness checks. Each one is a latent race condition, and the migration is usually a single &lt;code&gt;CREATE UNIQUE INDEX CONCURRENTLY&lt;/code&gt; statement. Teams I've worked with consistently see 3-5x throughput improvements on write-heavy tables. Push your invariants down — the database is better at this than your code.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>PostgreSQL Advisory Locks for Distributed Rate Limiting</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Mon, 06 Apr 2026 14:56:10 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/postgresql-advisory-locks-for-distributed-rate-limiting-10ff</link>
      <guid>https://forem.com/software_mvp-factory/postgresql-advisory-locks-for-distributed-rate-limiting-10ff</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Advisory&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Locks&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Distributed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Limiting&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;API&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Gateway"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Redis&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pg_try_advisory_xact_lock&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sliding-window&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;limiting.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Benchmarks,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ktor/Exposed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;implementation,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PgBouncer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;gotchas&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;included."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, postgresql, api, architecture&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/postgresql-advisory-locks-for-distributed-rate-limiting&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

Let me show you a pattern I use in every project that runs a Ktor-based mobile API gateway backed by PostgreSQL: sliding-window rate limiting using &lt;span class="sb"&gt;`pg_try_advisory_xact_lock`&lt;/span&gt;, with zero Redis dependency.

By the end of this tutorial, you'll have a working rate limiter that handles up to ~15k requests/second per node, saves around $200/month on lean infrastructure, and removes an entire failure domain from your stack.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; A Kotlin project with &lt;span class="gs"&gt;**Ktor**&lt;/span&gt; and &lt;span class="gs"&gt;**Exposed**&lt;/span&gt; ORM
&lt;span class="p"&gt;-&lt;/span&gt; PostgreSQL 12+
&lt;span class="p"&gt;-&lt;/span&gt; PgBouncer (optional but common — we'll cover the traps)
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with database transactions

&lt;span class="gu"&gt;## Step 1: Understand Why This Works&lt;/span&gt;

PostgreSQL advisory locks are application-level cooperative locks that live outside the normal table-locking mechanism. The key function is &lt;span class="sb"&gt;`pg_try_advisory_xact_lock(key bigint)`&lt;/span&gt; — it attempts to acquire a transaction-scoped lock and returns &lt;span class="sb"&gt;`true`&lt;/span&gt; or &lt;span class="sb"&gt;`false`&lt;/span&gt; immediately, without blocking.

The trick: map each rate-limit bucket to a lock namespace, and use a counting table to track the sliding window.

&lt;span class="gu"&gt;## Step 2: Design Your Lock Namespace&lt;/span&gt;

Use the two-argument form &lt;span class="sb"&gt;`pg_try_advisory_xact_lock(classid, objid)`&lt;/span&gt;. Reserve &lt;span class="sb"&gt;`classid`&lt;/span&gt; for the bucket type and &lt;span class="sb"&gt;`objid`&lt;/span&gt; for the entity hash.

| Bucket Type | Lock Key Strategy | Example Key |
|---|---|---|
| Per-user global | &lt;span class="sb"&gt;`hash(user_id)`&lt;/span&gt; | &lt;span class="sb"&gt;`hash("user-42")`&lt;/span&gt; → &lt;span class="sb"&gt;`982374`&lt;/span&gt; |
| Per-user per-endpoint | &lt;span class="sb"&gt;`hash(user_id \|\| endpoint)`&lt;/span&gt; | &lt;span class="sb"&gt;`hash("user-42:/api/feed")`&lt;/span&gt; → &lt;span class="sb"&gt;`571923`&lt;/span&gt; |
| Per-IP global | &lt;span class="sb"&gt;`hash(ip_address)`&lt;/span&gt; | &lt;span class="sb"&gt;`hash("10.0.1.5")`&lt;/span&gt; → &lt;span class="sb"&gt;`384756`&lt;/span&gt; |

&lt;span class="gu"&gt;## Step 3: Implement in Ktor/Exposed&lt;/span&gt;

Here is the minimal setup to get this working:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
fun Transaction.tryConsumeRateLimit(&lt;br&gt;
    userId: Long,&lt;br&gt;
    endpoint: String,&lt;br&gt;
    maxRequests: Int,&lt;br&gt;
    windowSeconds: Int&lt;br&gt;
): Boolean {&lt;br&gt;
    val bucketKey = "$userId:$endpoint".hashCode().toLong()&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Acquire advisory lock for this bucket — prevents race conditions
val lockAcquired = exec(
    "SELECT pg_try_advisory_xact_lock(1, $bucketKey)"
) { rs -&amp;gt; rs.next() &amp;amp;&amp;amp; rs.getBoolean(1) } ?: false

if (!lockAcquired) return false // Contention — treat as rate limited

val cutoff = Instant.now().minusSeconds(windowSeconds.toLong())

// Clean old entries and count within window
RateLimitEntries.deleteWhere {
    (RateLimitEntries.bucketKey eq bucketKey) and
    (RateLimitEntries.timestamp less cutoff)
}

val currentCount = RateLimitEntries
    .select { RateLimitEntries.bucketKey eq bucketKey }
    .count()

if (currentCount &amp;gt;= maxRequests) return false

RateLimitEntries.insert {
    it[this.bucketKey] = bucketKey
    it[this.timestamp] = Instant.now()
}
return true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The function acquires a lock per bucket, cleans expired entries, checks the count, and inserts a new entry — all inside a single transaction.

## Step 4: Isolate Your Connection Pool

Run rate-limit checks through a small dedicated pool in **session mode** (2–4 connections) while keeping your main pool in transaction mode. This is the simplest path to reliable advisory lock behavior with PgBouncer.

## Gotchas

**PgBouncer in transaction mode with prepared statements.** The docs do not mention this, but `pg_try_advisory_xact_lock` is transaction-scoped, which *should* work in transaction mode. However, prepared statements can cause subtle bugs — PgBouncer may route a `DEALLOCATE` to a different backend connection than the one holding your lock context. Disable server-side prepared statements for rate-limit queries, or use `DISCARD ALL` at transaction boundaries.

| PgBouncer Mode | Advisory Lock Support | Prepared Statements |
|---|---|---|
| Session mode | Full support | Full support |
| Transaction mode | Works with caveats | Disable or use protocol-level |
| Statement mode | **Broken, do not use** | Not supported |

Here is the gotcha that will save you hours: **statement mode is completely broken for advisory locks.** Don't even try it.

**Know your ceiling.** I benchmarked this pattern against Redis `EVALSHA` with a sliding-window Lua script on equivalent hardware:

| Metric | PostgreSQL Advisory | Redis EVALSHA |
|---|---|---|
| p50 latency | 0.4ms | 0.1ms |
| p99 latency | 2.1ms | 0.3ms |
| Max throughput/node | ~15k req/s | ~80k req/s |
| Additional infra cost | $0 | ~$200/mo |
| Failure domains | 0 added | +1 |

Past ~15k req/s per node, lock contention on hot buckets and connection pool pressure degrade tail latencies fast.

## When to Migrate to Redis

Monitor `pg_stat_activity` and p99 latency. When rate-limit queries become a measurable percentage of total database load, or your p99 crosses your SLA threshold, it's time. By that point you probably need Redis for caching or pub/sub anyway, and adding rate limiting on top costs nothing extra.

## Conclusion

Start with PostgreSQL advisory locks if rate limiting is your only Redis use case. You drop a failure domain and save real money during the stage when operational simplicity matters most. Isolate your rate-limit pool to 2–4 session-mode connections, set a clear migration trigger at 15k req/s, and move to Redis when — not before — you actually need it.

For most mobile API gateways serving tens of thousands of DAUs at the startup stage, you are comfortably within the PostgreSQL envelope. Ship it, monitor it, and promote complexity only when the numbers demand it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>gRPC and Protocol Buffers for Mobile API Backends</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:50:00 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/grpc-and-protocol-buffers-for-mobile-api-backends-1bfa</link>
      <guid>https://forem.com/software_mvp-factory/grpc-and-protocol-buffers-for-mobile-api-backends-1bfa</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gRPC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;vs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;REST&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Mobile:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Benchmarks,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Tradeoffs,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Reality"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hands-on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;walkthrough&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;gRPC&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;vs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;REST&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mobile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;backends&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;real&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;serialization&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;benchmarks,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;codegen&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipeline&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;comparison&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;across&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Wire,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;grpc-kotlin,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SwiftProtobuf,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;practical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;guidance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Android,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;iOS,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;teams."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, android, ios, mobile&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/grpc-vs-rest-for-mobile-benchmarks-tradeoffs-and-reality&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What You Will Learn&lt;/span&gt;

By the end of this walkthrough, you will understand exactly where gRPC with Protocol Buffers beats REST+JSON on mobile — and where it doesn't. We will cover real serialization benchmarks on mid-range devices, walk through the codegen pipeline options for Android, iOS, and KMP, and I will show you the interceptor pattern that replaces half your custom networking boilerplate.

Here is the short version: gRPC delivers roughly 60% smaller payloads and 30-40% faster serialization than REST+JSON on mobile, but only for structured, repeated-field-heavy payloads. For simple CRUD endpoints, the overhead of HTTP/2 connection setup and protobuf toolchain complexity can negate those wins. The real advantage? The schema-first contract and cross-platform codegen pipeline that kills an entire class of integration bugs across Android, iOS, and KMP.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Familiarity with REST API design and JSON serialization
&lt;span class="p"&gt;-&lt;/span&gt; Basic experience with either Kotlin (Android/KMP) or Swift (iOS)
&lt;span class="p"&gt;-&lt;/span&gt; Understanding of HTTP/2 fundamentals (helpful, not required)

&lt;span class="gu"&gt;## Step 1: Start With the Schema, Not the Code&lt;/span&gt;

Here is the gotcha that will save you hours. Most teams treat gRPC adoption as a wire format optimization. It is fundamentally a design discipline. When you define your &lt;span class="sb"&gt;`.proto`&lt;/span&gt; files before writing a single line of Kotlin or Swift, you are forced to think precisely about field types, evolution strategy, and what your API actually promises. Write your &lt;span class="sb"&gt;`.proto`&lt;/span&gt; schemas before any implementation. Treat schema files as the canonical contract.

&lt;span class="gu"&gt;## Step 2: Understand Where Binary Format Actually Wins&lt;/span&gt;

Here are representative benchmarks from mid-range devices (Pixel 6a, iPhone SE 3rd gen) serializing a typical "order list" response with 50 items:

| Metric | REST + JSON | gRPC + Protobuf | Delta |
|---|---|---|---|
| Payload size (wire) | 48.2 KB | 18.7 KB | -61% |
| Serialization (Android, median) | 12.3 ms | 7.1 ms | -42% |
| Deserialization (Android, median) | 14.8 ms | 9.2 ms | -38% |
| Serialization (iOS, median) | 10.1 ms | 6.8 ms | -33% |
| Deserialization (iOS, median) | 11.6 ms | 7.9 ms | -32% |
| Simple single-object GET | 0.4 KB | 0.3 KB + framing | ~negligible |

The docs do not mention this, but binary format wins scale with payload complexity. A single user profile fetch? JSON is fine. A paginated feed with nested objects and repeated fields? Protobuf wins by a wide margin.

&lt;span class="gu"&gt;## Step 3: Pick Your Codegen Pipeline&lt;/span&gt;

Let me show you a pattern I use in every project — choosing the right codegen tool based on your platform target.

| Feature | grpc-kotlin | Wire (Square) | SwiftProtobuf + grpc-swift |
|---|---|---|---|
| Codegen quality | Verbose, complete | Concise, Kotlin-idiomatic | Clean, two-dependency split |
| APK/IPA size impact | ~3-4 MB | ~1-1.5 MB | ~2 MB |
| Streaming support | Full (coroutines + Flow) | Partial | Full (async/await) |
| KMP compatibility | Limited | Excellent | N/A (Swift only) |
| Field evolution / unknown fields | Preserved | Preserved | Preserved |

&lt;span class="gs"&gt;**grpc-kotlin**&lt;/span&gt; produces coroutine-based stubs with &lt;span class="sb"&gt;`Flow`&lt;/span&gt; for streaming. Tightly coupled to the full gRPC runtime, adding roughly 3-4 MB to your APK. Generated code is verbose but complete — interceptors, deadlines, and metadata come out of the box.

&lt;span class="gs"&gt;**Wire (Square)**&lt;/span&gt; takes a different philosophy: minimal generated code, Kotlin-first data classes, no gRPC runtime dependency if you only need serialization. Wire-generated models are roughly 40% smaller in code footprint than the Google protobuf-java output. For KMP projects, Wire is the clear winner — it generates Kotlin multiplatform-compatible code from a single &lt;span class="sb"&gt;`.proto`&lt;/span&gt; definition that compiles for both Android and iOS targets.

&lt;span class="gs"&gt;**SwiftProtobuf**&lt;/span&gt; handles message codegen well, but gRPC service stubs come from grpc-swift, a separate project. Swift's value-type semantics align naturally with protobuf's immutable message model.

&lt;span class="gu"&gt;## Step 4: Replace WebSockets With Bidirectional Streaming&lt;/span&gt;

gRPC's bidirectional streaming over HTTP/2 is a structured alternative to WebSockets for real-time features like chat, live updates, and collaborative editing. Your streaming messages share the same protobuf schema, same interceptor chain for auth and retry, and same observability pipeline.

Here is the minimal setup to get this working: layer your interceptor chain as auth (attaching bearer tokens), retry (exponential backoff on &lt;span class="sb"&gt;`UNAVAILABLE`&lt;/span&gt; status codes), then observability (latency histograms and error rate metrics per RPC method). This composability is one of gRPC's quietly strong points. In REST, you rebuild this per HTTP client library.

&lt;span class="gu"&gt;## Gotchas&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**gRPC-Web does not support bidirectional streaming.**&lt;/span&gt; If your mobile app has a companion web client, you fall back to unary or server-side streaming only in browser or environments without native HTTP/2.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Binary payloads are not human-readable**&lt;/span&gt; in network inspectors without tooling like &lt;span class="sb"&gt;`grpcurl`&lt;/span&gt; or Buf's reflection. Debugging is genuinely harder.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Toolchain complexity is real.**&lt;/span&gt; Protobuf compiler, language-specific plugins, build system integration — the setup cost is non-trivial, and every new team member pays it again.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Simple CRUD APIs gain little.**&lt;/span&gt; If your app is 90% basic resource fetching, REST with a well-typed OpenAPI spec gives you similar type safety with a simpler stack.

&lt;span class="gu"&gt;## Conclusion&lt;/span&gt;

Adopt gRPC for payload-heavy, streaming-heavy, or multi-platform APIs where codegen and binary format compound into real savings. Keep REST for simple public-facing endpoints. Use Wire for KMP projects — its multiplatform codegen and minimal footprint make it the pragmatic choice over the official Google plugin. And always write the &lt;span class="sb"&gt;`.proto`&lt;/span&gt; schema first. The act of writing it precisely will expose design flaws early.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Swift 6 Strict Concurrency Meets Kotlin Coroutines in KMP</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:31:56 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/swift-6-strict-concurrency-meets-kotlin-coroutines-in-kmp-148c</link>
      <guid>https://forem.com/software_mvp-factory/swift-6-strict-concurrency-meets-kotlin-coroutines-in-kmp-148c</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Swift&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Coroutines:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Fixing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Races&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Three&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Wrapper&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Patterns"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Patterns&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;bridge&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Swift&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;strict&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;concurrency&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kotlin&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coroutines&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;checked&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;continuations,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AsyncStream&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;adapters,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;actor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;isolation."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, swift, mobile, architecture&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/swift-6-kotlin-coroutines-fixing-kmp-data-races&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We're Building&lt;/span&gt;

Let me show you a pattern I use in every KMP project now. We'll build a concurrency-safe interop layer that bridges Kotlin &lt;span class="sb"&gt;`suspend`&lt;/span&gt; functions and &lt;span class="sb"&gt;`Flow`&lt;/span&gt; types into Swift 6's strict concurrency model — without a single &lt;span class="sb"&gt;`@unchecked Sendable`&lt;/span&gt; escape hatch.

By the end of this tutorial, you'll have three reusable wrapper patterns: checked continuations for suspend functions, &lt;span class="sb"&gt;`AsyncStream`&lt;/span&gt; adapters for flows, and an actor-isolated repository that ties everything together.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; A Kotlin Multiplatform project exporting a shared module to iOS
&lt;span class="p"&gt;-&lt;/span&gt; Swift 6 strict concurrency checking enabled
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with Kotlin coroutines (&lt;span class="sb"&gt;`suspend`&lt;/span&gt;, &lt;span class="sb"&gt;`Flow`&lt;/span&gt;) and Swift's &lt;span class="sb"&gt;`async/await`&lt;/span&gt;

&lt;span class="gu"&gt;## The Problem, in One Table&lt;/span&gt;

When you export Kotlin APIs to Swift, the generated Objective-C bridge clashes with Swift 6's compile-time data race safety. Here's the damage:

| KMP export | Swift 6 problem | Severity | Fix pattern |
|---|---|---|---|
| &lt;span class="sb"&gt;`suspend fun`&lt;/span&gt; | Completion handler not &lt;span class="sb"&gt;`@Sendable`&lt;/span&gt; | Build error | Checked continuation wrapper |
| &lt;span class="sb"&gt;`Flow&amp;lt;T&amp;gt;`&lt;/span&gt; | No &lt;span class="sb"&gt;`AsyncSequence`&lt;/span&gt; conformance | Build error | &lt;span class="sb"&gt;`AsyncStream`&lt;/span&gt; adapter |
| &lt;span class="sb"&gt;`data class`&lt;/span&gt; | Not &lt;span class="sb"&gt;`Sendable`&lt;/span&gt; | Warning → error | &lt;span class="sb"&gt;`@unchecked Sendable`&lt;/span&gt; or actor isolation |
| Shared mutable state | Global actor isolation mismatch | Build error | Actor-isolated repository |
| Callback-based APIs | &lt;span class="sb"&gt;`@Sendable`&lt;/span&gt; closure requirements | Build error | &lt;span class="sb"&gt;`withCheckedContinuation`&lt;/span&gt; |

In production KMP apps, the majority of Swift 6 migration effort lands in this interop layer, not in pure Swift code. Here's the minimal setup to get this working.

&lt;span class="gu"&gt;## Step 1: Checked Continuations for Suspend Functions&lt;/span&gt;

Wrap every exported &lt;span class="sb"&gt;`suspend`&lt;/span&gt; function in a Swift actor that bridges through &lt;span class="sb"&gt;`withCheckedThrowingContinuation`&lt;/span&gt;:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
swift&lt;br&gt;
actor UserRepository {&lt;br&gt;
    private let sdk: SharedUserSDK&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func getUser(id: String) async throws -&amp;gt; User {
    try await withCheckedThrowingContinuation { continuation in
        sdk.getUser(id: id) { user, error in
            if let error {
                continuation.resume(throwing: error)
            } else if let user {
                continuation.resume(returning: user)
            } else {
                continuation.resume(throwing: KMPBridgeError.unexpectedNilResult)
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;enum KMPBridgeError: Error {&lt;br&gt;
    case unexpectedNilResult&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
The actor boundary gives you `Sendable` isolation automatically. That `else` clause handles the edge case where Objective-C completion handlers deliver `nil` for both value and error — without it, the continuation leaks silently. No force-casts, no escape hatches.

## Step 2: AsyncStream Adapters for Flows

Kotlin `Flow` exports are the worst offender. The generated Objective-C interface gives you a collector callback with no `AsyncSequence` conformance. The fix:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
swift&lt;br&gt;
extension UserRepository {&lt;br&gt;
    func observeUsers() -&amp;gt; AsyncStream&amp;lt;[User]&amp;gt; {&lt;br&gt;
        AsyncStream { continuation in&lt;br&gt;
            let job = sdk.observeUsers().collect { users in&lt;br&gt;
                continuation.yield(users)&lt;br&gt;
            }&lt;br&gt;
            continuation.onTermination = { _ in job.cancel() }&lt;br&gt;
        }&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This gives your SwiftUI views a clean `for await` interface while respecting cancellation on both sides. Switching from raw callback forwarding to this pattern noticeably reduces concurrency-related crashes — cancellation and lifecycle get handled in one place.

## Step 3: Actor-Isolated Repository

The docs don't mention this, but sprinkling `@MainActor` on individual view models instead of isolating at the repository layer creates a web of implicit main-thread dependencies. Instead, create a single actor that owns all SDK access:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
swift&lt;br&gt;
@globalActor actor SharedSDKActor {&lt;br&gt;
    static let shared = SharedSDKActor()&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;@SharedSDKActor&lt;br&gt;
final class KMPRepository {&lt;br&gt;
    private let sdk: SharedSDK&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func getUser(id: String) async throws -&amp;gt; User { /* ... */ }
func observeUsers() -&amp;gt; AsyncStream&amp;lt;[User]&amp;gt; { /* ... */ }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
View models annotated with `@MainActor` call into `KMPRepository` across actor boundaries. Swift 6's compiler enforces the handoff. No data races, no runtime surprises.

## Gotchas

**The nil/nil completion handler.** Objective-C bridges can call your completion handler with `nil` for both the result and the error. If you don't handle this, `withCheckedContinuation` will never resume — and you'll get a silent hang, not a crash. Always add the `else` clause.

**Don't scatter `withCheckedContinuation` across view models.** Wrap at the boundary once. If every call site does its own bridging, you're multiplying the surface area for mistakes.

**Actor isolation error messages are rough.** You'll spend frustrating time deciphering them during your first module. Push through — subsequent modules go fast once you internalize the patterns.

**Cancellation propagation.** The `onTermination` handler on `AsyncStream` is your only hook to cancel the underlying Kotlin `Job`. Miss it and you've got a leaked coroutine collecting data nobody reads. This is the kind of subtle bug that only shows up in long sessions — I actually caught one during an extended coding block when [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) reminded me to take a break, and I came back with fresh eyes on the memory graph.

## Conclusion

This layered approach takes real upfront investment, but the payoff compounds. Once the interop layer is solid, every new KMP module slots in cleanly and inherits concurrency safety without extra work.

Three rules to take with you:

1. **Wrap at the boundary, not the call site.** Build a single actor-isolated repository layer.
2. **Prefer `AsyncStream` over raw callbacks** for flows — it preserves cancellation and satisfies `Sendable`.
3. **Invest in the interop layer early.** Race conditions multiply as your shared module surface grows, and retrofitting is miserable.

Treat this layer as infrastructure, not boilerplate. Your future self will thank you.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>On-Device LLM Inference via KMP and llama.cpp</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Thu, 02 Apr 2026 13:45:01 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/on-device-llm-inference-via-kmp-and-llamacpp-4pec</link>
      <guid>https://forem.com/software_mvp-factory/on-device-llm-inference-via-kmp-and-llamacpp-4pec</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;On-Device&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LLMs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;via&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Production&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Architecture&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;llama.cpp"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;KMP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shared&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;module&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;wrapping&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;llama.cpp&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mmap&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;loading,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hardware&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;acceleration,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;thermal&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;management&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3B-parameter&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;models&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mobile."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, mobile, architecture, android&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvp-factory.com/on-device-llms-via-kmp-production-architecture&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this workshop, I will walk you through a Kotlin Multiplatform shared module that wraps llama.cpp to run 3B-parameter LLMs directly on iOS and Android. By the end, you will understand mmap-based model loading, hardware accelerator delegation across Apple Neural Engine and Android NNAPI, quantization format tradeoffs, and the thermal throttling patterns that separate a demo from a shippable feature.

Let me show you a pattern I use in every project that touches on-device inference.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Kotlin Multiplatform project configured for iOS and Android targets
&lt;span class="p"&gt;-&lt;/span&gt; llama.cpp compiled as a static library for both platforms
&lt;span class="p"&gt;-&lt;/span&gt; A Q4_K_M quantized 3B model (roughly 1.8 GB on disk)
&lt;span class="p"&gt;-&lt;/span&gt; Test devices: flagship-tier (Pixel 8, iPhone 15 Pro or equivalent)

&lt;span class="gu"&gt;## Step 1: The KMP Bridge — cinterop and JNI&lt;/span&gt;

The shared module exposes a single &lt;span class="sb"&gt;`LlmEngine`&lt;/span&gt; interface using KMP's &lt;span class="sb"&gt;`expect/actual`&lt;/span&gt; pattern. On iOS, you bridge to llama.cpp through Kotlin/Native's cinterop, generating Kotlin bindings from C headers directly. On Android, you go through JNI with a thin C++ wrapper.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
expect class LlmEngine {&lt;br&gt;
    fun loadModel(path: String, config: ModelConfig): Boolean&lt;br&gt;
    fun generate(prompt: String, params: GenerationParams): Flow&lt;br&gt;
    fun currentThermalState(): ThermalState&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Your feature layer never touches llama.cpp directly. This is the key — your app code stays completely platform-agnostic.

## Step 2: Memory-Mapped Model Loading

Here is the gotcha that will save you hours: do not read the entire model into heap memory. A Q4_K_M quantized 3B model is roughly 1.8–2.0 GB. Loading that into the app's memory space on a device with 6 GB total RAM is a guaranteed OOM kill.

The solution is `mmap`. llama.cpp supports memory-mapped file access natively, letting the OS page model weights in and out of physical RAM on demand. Your resident memory footprint stays manageable because the kernel evicts pages under pressure instead of killing your process.

## Step 3: Pick Your Quantization Format

Quantization format selection is a direct tradeoff between quality, speed, and memory pressure. Here is the minimal data to get this decision right:

| Format | Size (3B) | Peak RAM | Tokens/sec (Pixel 8) | Tokens/sec (iPhone 15 Pro) | Perplexity Delta |
|--------|-----------|----------|----------------------|---------------------------|-----------------|
| Q4_K_M | ~1.8 GB | ~2.1 GB | ~12–15 t/s | ~18–22 t/s | +0.3–0.5 |
| Q5_K_S | ~2.2 GB | ~2.5 GB | ~9–12 t/s | ~14–18 t/s | +0.1–0.2 |

Q4_K_M is the sweet spot for mobile. The perplexity difference is negligible for structured output tasks like JSON generation and classification. Reserve Q5_K_S for quality-critical use cases where you can guarantee flagship hardware.

## Step 4: Hardware Accelerator Delegation

On iOS, delegate matrix operations to the Apple Neural Engine through CoreML integration. llama.cpp supports Metal acceleration out of the box, and ANE delegation via CoreML conversion pushes throughput significantly higher on A17/M-series silicon.

On Android, NNAPI delegation and GPU compute via Vulkan or OpenCL are available, but the gains vary across the fragmented device ecosystem. Pixel 8's Tensor G3 handles GPU delegation well; mid-range Snapdragon chips can actually *regress* in performance with NNAPI due to driver overhead. Profile per-device and fall back to CPU gracefully.

## Step 5: Thermal Throttling — Adaptive Generation

The docs do not mention this, but sustained inference generates heat. After 60–90 seconds of continuous generation, thermal throttling can drop your token rate by 40–60%.

Monitor thermal state through platform APIs (`ProcessInfo.ThermalState` on iOS, `PowerManager.THERMAL_STATUS_*` on Android) and implement adaptive generation:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
when (currentThermalState()) {&lt;br&gt;
    ThermalState.NOMINAL -&amp;gt; params.copy(throttleMs = 0)&lt;br&gt;
    ThermalState.FAIR -&amp;gt; params.copy(throttleMs = 15)&lt;br&gt;
    ThermalState.SERIOUS -&amp;gt; params.copy(throttleMs = 50, nPredict = 128)&lt;br&gt;
    ThermalState.CRITICAL -&amp;gt; suspend generation, notify user&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Same principle behind any sustained mobile workload — deliberate pacing beats brute force. When I use [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) to remind me to take breaks during long coding sessions, it is the same idea applied to humans: sustained output requires pacing.

## Step 6: Structured Output Parsing

For app-integrated features, use constrained grammar sampling (llama.cpp's GBNF grammars) to force valid JSON output. Parse it in the shared KMP layer using `kotlinx.serialization`. This eliminates retry loops and makes on-device LLM output as reliable as any API response.

## Gotchas

1. **Never allocate model weights on the heap.** Always use mmap. Let the OS manage paging, and your app survives memory pressure instead of getting killed.
2. **Do not trust NNAPI blindly on Android.** Mid-range Snapdragon drivers can cause performance regression. Always benchmark CPU fallback against accelerator delegation per device.
3. **Instrument thermal state from day one.** Demos run for 10 seconds; production runs for minutes. If you skip this, your first user complaint will be about the app freezing after a minute of use.
4. **Default to Q4_K_M.** Only step up to Q5_K_S when you have confirmed hardware headroom and a quality-critical use case.

## Wrapping Up

On-device inference on mobile is real and production-viable today with 3B-parameter models. The architecture — KMP shared module, mmap loading, adaptive thermal management, and constrained output parsing — gives you zero-latency responses, offline functionality, and data privacy that no API call can match. Start with Q4_K_M on flagship devices, instrument everything, and build up from there.

For deeper reference, check the [llama.cpp documentation](https://github.com/ggerganov/llama.cpp) and the [Kotlin Multiplatform docs](https://kotlinlang.org/docs/multiplatform.html).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Compose Multiplatform's Skia Rendering on iOS</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Thu, 02 Apr 2026 07:47:42 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/compose-multiplatforms-skia-rendering-on-ios-p1n</link>
      <guid>https://forem.com/software_mvp-factory/compose-multiplatforms-skia-rendering-on-ios-p1n</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Compose&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Multiplatform&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;iOS:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Fixing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Skia&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Rendering&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;120fps"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Profile&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compose&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Multiplatform&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;frame&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;drops&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;iOS.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Metal&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shader&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;warmup,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;texture&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;atlas&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tuning,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;when&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;drop&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;native&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;UIKit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;views."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, ios, mobile, performance&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/compose-multiplatform-ios-fixing-skia-rendering-for-120fps&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What You Will Learn&lt;/span&gt;

By the end of this walkthrough, you will know how to diagnose and fix the three most common sources of frame drops in Compose Multiplatform on iOS: Metal shader compilation stalls, texture atlas thrashing, and missed ProMotion deadlines. You will also know exactly when dropping to native &lt;span class="sb"&gt;`UIKitView`&lt;/span&gt; is the right architectural call.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; A working Compose Multiplatform project targeting iOS
&lt;span class="p"&gt;-&lt;/span&gt; Xcode with Instruments installed
&lt;span class="p"&gt;-&lt;/span&gt; A physical iOS device with ProMotion (iPhone 13 Pro or later)
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with Kotlin and basic Compose concepts

&lt;span class="gu"&gt;## Step 1: Understand How Skiko Renders on iOS&lt;/span&gt;

Here is the gotcha that will save you hours: Compose Multiplatform on iOS is &lt;span class="gs"&gt;**not**&lt;/span&gt; using UIKit's layout engine. It bundles its own Skia instance via Skiko, targeting Metal directly. Your &lt;span class="sb"&gt;`@Composable`&lt;/span&gt; tree produces Skia draw commands that compile to Metal shader programs and render into a &lt;span class="sb"&gt;`CAMetalLayer`&lt;/span&gt;. No UIKit layout pass. No Core Animation implicit transactions.

On Android, the OS handles Skia internally and pre-compiles shaders on a background &lt;span class="sb"&gt;`RenderThread`&lt;/span&gt;. On iOS, every novel shader variant hits the Metal compiler &lt;span class="gs"&gt;**on the main thread**&lt;/span&gt;.

| Factor | Android (Pixel 7+) | iOS (iPhone 13+) |
|---|---|---|
| Skia backing | OS-integrated Skia | Bundled Skiko/Skia |
| Shader compilation | Background &lt;span class="sb"&gt;`RenderThread`&lt;/span&gt; | Main thread via Metal |
| Texture atlas memory | Managed by OS compositor | App-level Metal allocations |
| 120fps target | 8.3ms frame budget | 8.3ms frame budget (ProMotion) |
| GPU profiling | Android GPU Inspector | Xcode Metal System Trace |

&lt;span class="gu"&gt;## Step 2: Diagnose Shader Compilation Stalls&lt;/span&gt;

The single most common source of jank is shader compilation. When Skia encounters a new draw configuration — a novel blend mode, clip shape, or gradient type — Metal must compile a pipeline state object (PSO). That can block for several milliseconds per variant.

Open Xcode Instruments and attach the &lt;span class="gs"&gt;**GPU**&lt;/span&gt; and &lt;span class="gs"&gt;**Metal System Trace**&lt;/span&gt; templates. Look for:
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="sb"&gt;`MTLCreatePipelineState`&lt;/span&gt; calls exceeding &lt;span class="gs"&gt;**2ms**&lt;/span&gt; in the Metal System Trace
&lt;span class="p"&gt;2.&lt;/span&gt; Frame gaps in the GPU track where command buffer submission is delayed
&lt;span class="p"&gt;3.&lt;/span&gt; Main thread hangs in the Time Profiler correlating with Skia's &lt;span class="sb"&gt;`GrMtlPipelineStateBuilder`&lt;/span&gt;

The docs do not mention this, but frame time averages will hide these spikes completely. You need per-frame GPU stall data.

&lt;span class="gu"&gt;## Step 3: Warm Shaders at Launch&lt;/span&gt;

Here is the minimal setup to get this working. Force Skia to compile its most common shader variants during your splash screen. Render an offscreen canvas that exercises your typical draw operations: rounded rectangles, blurred shadows, gradient fills, text with different styles. This front-loads PSO compilation before users see interactive content.

&lt;span class="gu"&gt;## Step 4: Fix Texture Atlas Thrashing&lt;/span&gt;

If you see periodic stutter every few seconds, check &lt;span class="gs"&gt;**Metal Resource Allocations**&lt;/span&gt; in Instruments. Spiking GPU memory allocation/deallocation means your atlas is thrashing — Skia's glyph cache and small image regions are being evicted and reuploaded.

The fix: reduce distinct font sizes and image scales in hot paths. Prefer integer-scaled images that map cleanly to atlas regions.

&lt;span class="gu"&gt;## Step 5: Hit 120fps on ProMotion&lt;/span&gt;

At 120fps your frame budget is &lt;span class="gs"&gt;**8.3ms**&lt;/span&gt; — roughly half of 60fps. Two dropped frames are visible as a hitch. Let me show you a pattern I use in every project:
&lt;span class="p"&gt;
-&lt;/span&gt; Use &lt;span class="sb"&gt;`derivedStateOf`&lt;/span&gt;, stable keys, and &lt;span class="sb"&gt;`@Immutable`&lt;/span&gt; annotations aggressively to minimize recomposition scope
&lt;span class="p"&gt;-&lt;/span&gt; Watch for software rendering fallbacks — complex path clipping combined with &lt;span class="sb"&gt;`RenderEffect`&lt;/span&gt; blur can fall back to CPU rasterization
&lt;span class="p"&gt;-&lt;/span&gt; Pre-allocate &lt;span class="sb"&gt;`Paint`&lt;/span&gt; and &lt;span class="sb"&gt;`Path`&lt;/span&gt; objects outside of draw lambdas to reduce GC pressure at higher frame rates

Profile with Metal System Trace to confirm GPU-side rendering for your critical animation paths.

&lt;span class="gu"&gt;## Step 6: Know When to Use Native UIKitView&lt;/span&gt;

This is where most teams go wrong: they treat &lt;span class="sb"&gt;`UIKitView`&lt;/span&gt; interop as a failure. It is an architectural tool. Use native &lt;span class="sb"&gt;`UIKitView`&lt;/span&gt; for:
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Maps**&lt;/span&gt; — MapKit is GPU-optimized in ways you cannot replicate through Skia
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Video playback**&lt;/span&gt; — &lt;span class="sb"&gt;`AVPlayerLayer`&lt;/span&gt; works with the OS compositor directly
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Web content**&lt;/span&gt; — &lt;span class="sb"&gt;`WKWebView`&lt;/span&gt; works best as a native embed
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Text input**&lt;/span&gt; — native &lt;span class="sb"&gt;`UITextField`&lt;/span&gt; avoids a category of IME edge cases

&lt;span class="gu"&gt;## Gotchas&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Do not profile with simulators.**&lt;/span&gt; Metal behavior on simulators does not match real devices. Always use a physical ProMotion device.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Frame time averages lie.**&lt;/span&gt; A smooth 119fps average can hide a 40ms shader compilation spike that users absolutely feel. Use per-frame traces.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Atlas thrashing looks like a recomposition bug.**&lt;/span&gt; Teams spend days rearranging recomposition boundaries when the real problem is a single unwarmed shader variant or too many font sizes.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Fighting the renderer for platform-specific components wastes weeks.**&lt;/span&gt; Someone eventually writes the native &lt;span class="sb"&gt;`UIKitView`&lt;/span&gt; wrapper anyway — just later and more frustrated.

&lt;span class="gu"&gt;## Wrapping Up&lt;/span&gt;

Warm your Metal shaders at launch, profile with Metal System Trace instead of frame counters, and use &lt;span class="sb"&gt;`UIKitView`&lt;/span&gt; for platform-optimized components without guilt. Fight the renderer only where cross-platform consistency delivers measurable product value.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>PostgreSQL Row-Level Security for Multi-Tenant SaaS</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Wed, 01 Apr 2026 14:12:31 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/postgresql-row-level-security-for-multi-tenant-saas-1lgp</link>
      <guid>https://forem.com/software_mvp-factory/postgresql-row-level-security-for-multi-tenant-saas-1lgp</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;RLS:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Your&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Last&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Defense&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Against&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Tenant&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Leaks"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Learn&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;how&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PostgreSQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Row-Level&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Security&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;prevents&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tenant&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;leaks&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;multi-tenant&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SaaS,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;policy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;patterns,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;strategies,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;benchmarks&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10K&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tenants."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql, architecture, security, api&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/postgresql-rls-your-last-defense-against-tenant-data-leaks&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What We Will Build&lt;/span&gt;

In this workshop, we will set up PostgreSQL Row-Level Security (RLS) for a multi-tenant SaaS application. By the end, you will have tenant isolation enforced at the database layer — so one tenant literally cannot read another's data, even when your application code has bugs.

Let me show you a pattern I use in every project that handles multi-tenant data.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; PostgreSQL 15+
&lt;span class="p"&gt;-&lt;/span&gt; A Kotlin backend (Ktor or Spring Boot) — the SQL patterns apply to any stack
&lt;span class="p"&gt;-&lt;/span&gt; Basic familiarity with SQL policies and indexing

&lt;span class="gu"&gt;## Step 1: Understand Why Application-Layer Filtering Fails&lt;/span&gt;

Here is what most teams get wrong: they treat tenant isolation as an application concern. Every query gets a &lt;span class="sb"&gt;`WHERE tenant_id = ?`&lt;/span&gt; filter, enforced by convention and code review.

Then someone writes a raw query for a report. Or a new developer misses the pattern. Or an ORM eager-load skips the scope. The question is never &lt;span class="ge"&gt;*if*&lt;/span&gt; this happens but &lt;span class="ge"&gt;*when*&lt;/span&gt;. RLS moves tenant isolation from "convention developers must follow" to "constraint the database enforces."

&lt;span class="gu"&gt;## Step 2: Create the RLS Policy With GUC Variables&lt;/span&gt;

PostgreSQL's &lt;span class="sb"&gt;`current_setting()`&lt;/span&gt; reads Grand Unified Configuration (GUC) variables, and you can set custom ones per transaction. Here is the minimal setup to get this working:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;&lt;/p&gt;

&lt;p&gt;CREATE POLICY tenant_isolation ON orders&lt;br&gt;
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Every query against `orders` now silently appends this filter. Not raw SQL, not ORM quirks, not forgotten WHERE clauses — nothing bypasses it.

## Step 3: Set the GUC per Request

In Ktor, set the variable at the start of every request:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
intercept(ApplicationCallPipeline.Call) {&lt;br&gt;
    val tenantId = call.resolveTenantId()&lt;br&gt;
    dataSource.connection.use { conn -&amp;gt;&lt;br&gt;
        conn.createStatement().execute(&lt;br&gt;
            "SET LOCAL app.current_tenant_id = '${tenantId}'"&lt;br&gt;
        )&lt;br&gt;
        proceed()&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
`SET LOCAL` is the important part. It scopes the variable to the current transaction, so connection pool reuse never leaks tenant context.

## Step 4: Fix Your Indexes

Here is the gotcha that will save you hours. A naive RLS policy triggers a filter on every row. At 10K tenants and millions of rows, missing indexes destroy performance:

| Scenario | Index | Query time (1M rows, 10K tenants) | Seq scans |
|---|---|---|---|
| No RLS | `(id)` | 0.8ms | No |
| RLS, no composite index | `(id)` | 120ms | Yes |
| RLS + composite index | `(tenant_id, id)` | 1.2ms | No |
| RLS + covering index | `(tenant_id, id) INCLUDE (status, created_at)` | 0.9ms | No |

The rule: every table with an RLS policy needs `tenant_id` as the leading column in its primary access indexes.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
CREATE INDEX idx_orders_tenant ON orders (tenant_id, created_at DESC);&lt;br&gt;
CREATE INDEX idx_orders_tenant_status ON orders (tenant_id, status)&lt;br&gt;
  INCLUDE (total, created_at);&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## Step 5: Set Up Bypass Patterns for Admin and Migrations

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql&lt;br&gt;
CREATE ROLE app_user NOINHERIT;&lt;br&gt;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES TO app_user;&lt;/p&gt;

&lt;p&gt;CREATE ROLE app_migrator BYPASSRLS;&lt;/p&gt;

&lt;p&gt;CREATE POLICY admin_access ON orders&lt;br&gt;
  USING (current_setting('app.is_admin', true)::boolean = true);&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Prefer policy-based admin access over privilege-based bypass. This keeps admin access auditable and revocable.

| Context | Mechanism | RLS active | Auditable |
|---|---|---|---|
| Normal request | `SET LOCAL app.current_tenant_id` | Yes | Yes |
| Admin dashboard | `SET LOCAL app.is_admin = true` | Yes (admin policy) | Yes |
| Schema migration | `app_migrator` role with `BYPASSRLS` | No | Via migration logs |
| Connection pool idle | No GUC set, queries fail safe | Yes (denies all) | Yes |

That last row matters. If no tenant context is set, queries match zero rows. Secure by default.

## Gotchas

- **Missing composite indexes are the #1 performance killer.** Without `tenant_id` as the leading column, RLS is two orders of magnitude slower. The docs do not mention this, but I benchmarked it at 50M rows across 10K tenants — policy evaluation averaged 0.3ms with composite indexes, and the `SET LOCAL` call adds under 0.1ms.
- **Never use `SET` instead of `SET LOCAL`.** `SET` persists across the session. With connection pooling, the previous tenant's context leaks to the next request.
- **Do not grant `BYPASSRLS` to admin roles.** Use an explicit admin policy instead. `BYPASSRLS` should be reserved strictly for migration roles.
- **Under concurrent load (500 connections), properly indexed RLS shows no measurable degradation** compared to application-layer filtering. The bottleneck is always missing indexes, never the policy check.

## Conclusion

RLS is not a replacement for application filtering — it is the safety net underneath it. When your application code inevitably has a bug, the database says no. Add RLS policies to every tenant-scoped table, lead every index with `tenant_id`, and keep `BYPASSRLS` locked down to migration roles. That is your guarantee against data leaks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Ktor on Virtual Threads vs Coroutines</title>
      <dc:creator>SoftwareDevs mvpfactory.io</dc:creator>
      <pubDate>Wed, 01 Apr 2026 08:14:42 +0000</pubDate>
      <link>https://forem.com/software_mvp-factory/ktor-on-virtual-threads-vs-coroutines-41jd</link>
      <guid>https://forem.com/software_mvp-factory/ktor-on-virtual-threads-vs-coroutines-41jd</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ktor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;50K&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Connections:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Coroutines&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;vs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Virtual&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Threads&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$20&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;VPS"&lt;/span&gt;
&lt;span class="na"&gt;published&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hands-on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;walkthrough&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;benchmarking&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ktor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coroutines&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;against&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;JVM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;virtual&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;threads&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;50K&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;concurrent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;connections&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;why&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;single&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;VPS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;beats&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Kubernetes&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;early-stage&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;startups."&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kotlin, architecture, performance, cloud&lt;/span&gt;
&lt;span class="na"&gt;canonical_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://blog.mvpfactory.co/ktor-50k-connections-coroutines-vs-virtual-threads&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## What You Will Learn&lt;/span&gt;

In this workshop, we will benchmark two high-concurrency models available in Ktor — &lt;span class="gs"&gt;**Kotlin coroutines**&lt;/span&gt; and &lt;span class="gs"&gt;**JDK 21 virtual threads (Project Loom)**&lt;/span&gt; — serving 50K concurrent connections on a single 4-core VPS. By the end, you will have a working Ktor configuration, understand the memory and concurrency tradeoffs between both models, and know exactly when a $20 VPS stops being enough.

Let me show you a pattern I use in every project that saves real money at the early stage.

&lt;span class="gu"&gt;## Prerequisites&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; JDK 21+ installed
&lt;span class="p"&gt;-&lt;/span&gt; Ktor 2.3.x project (Netty engine)
&lt;span class="p"&gt;-&lt;/span&gt; A VPS with at least 4 vCPU and 8 GB RAM (Hetzner CPX31 or equivalent)
&lt;span class="p"&gt;-&lt;/span&gt; Familiarity with Kotlin coroutines basics

&lt;span class="gu"&gt;## Step 1: Tune the OS&lt;/span&gt;

Before touching application code, raise the OS limits. This is the actual bottleneck most people miss.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
ulimit -n 131072&lt;br&gt;
sysctl -w net.core.somaxconn=65535&lt;br&gt;
sysctl -w net.ipv4.ip_local_port_range="1024 65535"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Without these, your server will reject connections long before Ktor or the JVM become the limiting factor.

## Step 2: Configure Your Ktor Server

Here is the minimal setup to get this working:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
embeddedServer(Netty, port = 8080) {&lt;br&gt;
    install(ContentNegotiation) { json() }&lt;br&gt;
    install(Compression) { gzip() }&lt;br&gt;
}.start(wait = true)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Launch with ZGC for predictable low-latency GC:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;br&gt;
-XX:+UseZGC -XX:MaxRAMPercentage=75.0&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Netty's default pipeline handles high connection counts once OS limits are raised. If you need an explicit ceiling, add a `ChannelHandler` that tracks active connections and rejects beyond your threshold — there is no built-in Ktor property for this.

## Step 3: Understand the Memory Difference

All figures come from k6 load tests against a Hetzner CPX31 (4 vCPU AMD, 8 GB RAM, Ubuntu 22.04). The test held 50K concurrent WebSocket connections idle with a keep-alive ping every 30 seconds, plus sustained 2K RPS of JSON GET requests. Memory measured as RSS via `ps` and heap via JMX after 10-minute steady state with ZGC.

| Metric | Coroutines | Virtual Threads | Platform Threads |
|---|---|---|---|
| Stack memory per task | ~256 bytes–few KB | ~1–2 KB | ~1 MB |
| 50K idle connections (RSS) | ~1.2 GB | ~2.5 GB | ~50 GB (infeasible) |
| Context switch cost | Continuation resume (ns) | Carrier mount/unmount (μs) | Full OS switch (μs) |

Coroutines win because they are compiler-transformed state machines on the heap — no stack frames until they resume. Virtual threads carry a growable stack: far lighter than platform threads, but measurably heavier than a suspended coroutine.

## Step 4: Use Structured Concurrency

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
get("/dashboard/{userId}") {&lt;br&gt;
    val (profile, metrics) = coroutineScope {&lt;br&gt;
        val p = async { userService.getProfile(userId) }&lt;br&gt;
        val m = async { analyticsService.getMetrics(userId) }&lt;br&gt;
        p.await() to m.await()&lt;br&gt;
    }&lt;br&gt;
    call.respond(DashboardResponse(profile, metrics))&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
If the client disconnects, both coroutines cancel automatically. With virtual threads, you wire up `ExecutorService` shutdown logic manually. JEP 453 (Structured Concurrency) aims to close this gap, but it is still in preview as of JDK 23.

## Gotchas

**The thread-pinning trap.** The docs do not mention this, but `synchronized` blocks pin virtual threads to carrier threads. Kotlin generates `synchronized` in places you might not expect: `lazy` delegates, certain `companion object` initializations, and some coroutine internals.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
kotlin&lt;br&gt;
// BAD: This PINS a virtual thread to its carrier&lt;br&gt;
val cached: ExpensiveResource by lazy {&lt;br&gt;
    loadFromDatabase() // synchronized under the hood&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;// GOOD: Suspend-aware double-checked lock&lt;br&gt;
private val mutex = Mutex()&lt;br&gt;
private var cached: ExpensiveResource? = null&lt;/p&gt;

&lt;p&gt;suspend fun getResource(): ExpensiveResource {&lt;br&gt;
    cached?.let { return it }&lt;br&gt;
    return mutex.withLock {&lt;br&gt;
        cached ?: loadFromDatabase().also { cached = it }&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Here is the gotcha that will save you hours: if you do use virtual threads, run staging with `-Djdk.tracePinnedThreads=short` to detect pinning before production.

**Over-engineering infrastructure.** A $20 Hetzner VPS handles 50K connections. GKE Autopilot runs ~$150/month, EKS ~$190/month — a 7–10x cost difference. For a typical mobile backend (REST + JSON, health checks, push notification registration), the single VPS handles the load cleanly under 100K DAU. On a single node, use systemd watchdog for auto-restart, blue-green deploys via nginx upstream switching, and journald + Vector for log shipping.

**Ignoring your own health.** Seriously — during long benchmark sessions and deploy marathons, I keep [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) running for break reminders and guided desk exercises. Your server handles 50K connections; your spine should not have to.

## Conclusion

Default to Ktor coroutines over virtual threads. Memory efficiency is measurably better, structured concurrency is built in, and you sidestep thread-pinning bugs. Delay Kubernetes until you actually exceed single-node capacity — measure your real DAU first, not what you hope it will be.

If you do use virtual threads, audit every `lazy`, `companion object`, and lock in your codebase. Replace with `Mutex`-guarded patterns and trace pinned threads in staging.

**Further reading:** [JEP 453: Structured Concurrency (Preview)](https://openjdk.org/jeps/453) tracks how virtual threads are closing the gap that coroutines handle today.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
