<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jacob Mei</title>
    <description>The latest articles on Forem by Jacob Mei (@notoriouslab).</description>
    <link>https://forem.com/notoriouslab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891574%2Ff101d8bc-2e90-4428-af6d-b464d146d800.jpg</url>
      <title>Forem: Jacob Mei</title>
      <link>https://forem.com/notoriouslab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/notoriouslab"/>
    <language>en</language>
    <item>
      <title>"AI as Retrieval, Not Generation: Why I Stopped Letting AI Into My Vault (And the Plugin That Came Out of It)"</title>
      <dc:creator>Jacob Mei</dc:creator>
      <pubDate>Mon, 27 Apr 2026 03:38:40 +0000</pubDate>
      <link>https://forem.com/notoriouslab/ai-as-retrieval-not-generation-why-i-stopped-letting-ai-into-my-vault-and-the-plugin-that-came-5fhc</link>
      <guid>https://forem.com/notoriouslab/ai-as-retrieval-not-generation-why-i-stopped-letting-ai-into-my-vault-and-the-plugin-that-came-5fhc</guid>
      <description>&lt;p&gt;Six months into using AI heavily with my notes, I caught myself staring at an Obsidian entry I couldn't remember writing.&lt;/p&gt;

&lt;p&gt;Not in the "wow, I forgot I wrote that" sense. In the practical sense: I genuinely couldn't tell whether the paragraph in front of me was a thought I'd had, or an AI summary I'd accepted, or a paraphrase I'd nodded along with somewhere between the two.&lt;/p&gt;

&lt;p&gt;The skill was intact. I could still write, still think, still synthesize. What had quietly dissolved was the boundary between &lt;em&gt;what I think&lt;/em&gt; and &lt;em&gt;what I read and accepted&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This post is about why that happens, why "AI should elevate, not replace your thinking" needs a more practical translation, and the three-layer rebuild — plus an open-source Obsidian plugin — that fixed it for me.&lt;/p&gt;

&lt;h2&gt;The Failure Mode Isn't Skill Atrophy. It's Authorship Loss.&lt;/h2&gt;

&lt;p&gt;The mainstream AI critique frames the danger as skill atrophy: lean on the LLM too much, your underlying ability rots. There's something to that, but it wasn't my failure mode.&lt;/p&gt;

&lt;p&gt;My failure mode was authorship loss — and authorship loss is sneakier, because it doesn't show up in your ability to perform tasks. It shows up months later, when you're trying to retrieve a position you held and realize you can't reconstruct &lt;em&gt;whose&lt;/em&gt; position it actually was.&lt;/p&gt;

&lt;p&gt;If you outsource thinking to AI, you can rebuild the skill. If you let AI ghostwrite into the same container as your own thinking, the contamination is permanent. There's no clean way to subtract back.&lt;/p&gt;

&lt;h2&gt;What Went Wrong: A Single Vault for Everything&lt;/h2&gt;

&lt;p&gt;My original setup was elegant on paper. One Obsidian vault. AI-generated summaries flowed in (article distillations, meeting notes, reading reflections). My own writing flowed in. Both were tagged, linked, semantic-searched.&lt;/p&gt;

&lt;p&gt;The problem: AI output and human authorship have wildly different epistemic statuses, but in plain markdown they look identical. Six months of accumulation later, I had thousands of notes where the answer to &lt;em&gt;"did I think this, or did the AI summarize this?"&lt;/em&gt; was effectively unrecoverable.&lt;/p&gt;

&lt;p&gt;The "subtle wrongness" wasn't immediate. It crept in via dozens of small moments where I'd skim an AI summary, internally agree, and let it sit in my vault. Later, the agreement got remembered as a position. The position got referenced. The reference got built upon.&lt;/p&gt;

&lt;h2&gt;The Three-Layer Rebuild&lt;/h2&gt;

&lt;p&gt;I rebuilt with explicit boundaries:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: AI for fast processing.&lt;/strong&gt;&lt;br&gt;
LLM tools handle ingestion — long article summaries, transcript condensation, dense PDF extraction. High-throughput, low-judgment work. The output stays in a clearly-labeled scratch directory, &lt;em&gt;not&lt;/em&gt; the main vault.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: A slow reflection layer where I rewrite in my own words.&lt;/strong&gt;&lt;br&gt;
This is the step everyone skips. Nothing crosses from Layer 1 to my main vault without me rewriting it. Not paraphrasing — rewriting. Different sentence structure, different emphasis, my own framing of why it matters.&lt;/p&gt;

&lt;p&gt;The friction of rewriting turned out to be the point. &lt;em&gt;That's where the elevation actually happens.&lt;/em&gt; Without that step, AI output passes through me without leaving fingerprints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Plain markdown as the source of truth.&lt;/strong&gt;&lt;br&gt;
The main vault is plain &lt;code&gt;.md&lt;/code&gt; files in Git. No AI-generated prose lives here unless I've rewritten it. If GitHub disappeared tomorrow, I'd still have my thinking on disk.&lt;/p&gt;

&lt;h2&gt;The Plugin: AI as Retrieval, Not Generation&lt;/h2&gt;

&lt;p&gt;The rebuild left one open question: &lt;em&gt;can AI play any role inside the main vault, or must it be exiled entirely?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It can — but only as retrieval, not generation. AI helps me find things I already wrote, without ever generating new content into the vault.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fca7orql6p6al12w9wevu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fca7orql6p6al12w9wevu.png" alt=" " width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/notoriouslab/vault-search" rel="noopener noreferrer"&gt;Vault Search&lt;/a&gt;, an Obsidian plugin that does local semantic search across your notes using embeddings. The use case is the one keyword search keeps failing: I remember thinking about money but my note is titled &lt;code&gt;預算規劃&lt;/code&gt; (budget planning). Keyword search returns nothing. Semantic search returns the right note immediately.&lt;/p&gt;
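
&lt;p&gt;As a rough illustration of why embedding search finds that note, here is a minimal sketch of ranking by cosine similarity. It is not Vault Search's actual code: the &lt;code&gt;IndexedNote&lt;/code&gt; type, the function names, and the toy three-dimensional vectors are all assumptions made for the example, and real embeddings have far more dimensions.&lt;/p&gt;

```typescript
// Hypothetical shape: one embedding vector per note, computed ahead of time.
type IndexedNote = { path: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, i) => {
    dot += value * b[i];
    normA += value * value;
    normB += b[i] * b[i];
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank notes by similarity to the query vector, best match first.
function rankNotes(query: number[], notes: IndexedNote[], limit: number): IndexedNote[] {
  return notes
    .map((note) => ({ note, score: cosineSimilarity(query, note.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.note);
}

// Toy vectors stand in for real embeddings of each note and of the query "money".
const notes: IndexedNote[] = [
  { path: "預算規劃.md", vector: [0.9, 0.1, 0.0] },
  { path: "hiking-log.md", vector: [0.0, 0.2, 0.9] },
];
const moneyQuery = [0.8, 0.2, 0.1];
console.log(rankNotes(moneyQuery, notes, 1)[0].path); // 預算規劃.md
```

&lt;p&gt;The query and the note title share no characters, which is why keyword search comes back empty; the ranking works only because the two vectors point in nearly the same direction.&lt;/p&gt;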

&lt;p&gt;Three design constraints made it suitable for the three-layer architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. It only retrieves, never generates.&lt;/strong&gt;&lt;br&gt;
The plugin surfaces notes you wrote. It will not write new ones. There is no "AI assistant" that synthesizes a response. Synthesis stays your job, with the right material in front of you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. It runs locally.&lt;/strong&gt;&lt;br&gt;
Embeddings run via Ollama, and 8 GB of RAM is enough. No cloud API key, no upload, no provider lock-in. If Anthropic / OpenAI / whoever shuts down tomorrow, the index keeps working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. AI-assisted indexing, not AI-authored content.&lt;/strong&gt;&lt;br&gt;
The recommended workflow uses AI to generate 50-100 character semantic descriptions in each note's frontmatter, then indexes those. The descriptions are AI-written but they live in metadata, not in the body. Your prose stays yours.&lt;/p&gt;

&lt;p&gt;There's a hot/cold layering on top — recently-edited and well-linked notes get prioritized in search results, so the plugin nudges you toward what you're actively thinking about, not what's been gathering dust for three years.&lt;/p&gt;
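
&lt;p&gt;One way such a hot/cold boost could be layered on top of a similarity score is sketched below. This is a guess at the general shape, not the plugin's real formula: the &lt;code&gt;Candidate&lt;/code&gt; type, the half-life, and both weights are hypothetical values picked for illustration.&lt;/p&gt;

```typescript
// Hypothetical inputs: a similarity score plus two "heat" signals per note.
type Candidate = {
  path: string;
  similarity: number;    // 0..1, from the embedding search
  daysSinceEdit: number; // how long since the note was last modified
  linkCount: number;     // inbound plus outbound links
};

// Assumed tuning constants, chosen only for illustration.
const RECENCY_HALF_LIFE_DAYS = 30;
const RECENCY_WEIGHT = 0.2;
const LINK_WEIGHT = 0.1;

function heatAdjustedScore(c: Candidate): number {
  // Exponential decay: 1.0 for a note edited today, 0.5 after one half-life.
  const recency = Math.pow(0.5, c.daysSinceEdit / RECENCY_HALF_LIFE_DAYS);
  // The logarithm keeps hub notes from dominating; the boost is capped at 1.
  const links = Math.min(1, Math.log2(1 + c.linkCount) / 5);
  return c.similarity + RECENCY_WEIGHT * recency + LINK_WEIGHT * links;
}

const fresh: Candidate = { path: "current-project.md", similarity: 0.7, daysSinceEdit: 2, linkCount: 8 };
const dusty: Candidate = { path: "old-idea.md", similarity: 0.7, daysSinceEdit: 1000, linkCount: 0 };
const ranked = [dusty, fresh].sort((a, b) => heatAdjustedScore(b) - heatAdjustedScore(a));
console.log(ranked[0].path); // current-project.md
```

&lt;p&gt;Keeping the boost additive and small means a dusty note with a much better semantic match can still outrank a fresh one; the heat signals only break near-ties.&lt;/p&gt;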

&lt;h2&gt;The Three Rules I Now Follow&lt;/h2&gt;

&lt;p&gt;The structural translation of "elevate, not replace" comes down to three rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep AI output in a different container from your own thinking.&lt;/strong&gt; Different folder, different vault, different file convention — whatever lets you tell at a glance which is which.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Force a rewriting step before anything crosses over.&lt;/strong&gt; Not paraphrasing — full rewriting. The friction is the elevation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Constrain AI's role inside your corpus to retrieval.&lt;/strong&gt; If AI lives in your vault at all, it should help you find what you wrote, never generate new content into the body.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first two are about preventing contamination. The third is about choosing what role AI gets to play &lt;em&gt;after&lt;/em&gt; the boundary is in place.&lt;/p&gt;

&lt;h2&gt;Closing: Who's Standing at the Shelf&lt;/h2&gt;

&lt;p&gt;The library metaphor I keep returning to: AI can fetch any book in the library, instantly, in any language. That's a real superpower and I'm not giving it up.&lt;/p&gt;

&lt;p&gt;But the person standing in front of the shelves, picking up a book, deciding what it means to them — that role doesn't transfer. The whole point of a personal knowledge base is that someone is doing that standing-at-the-shelf work, and the someone is you.&lt;/p&gt;

&lt;p&gt;If the AI quietly takes over the standing-at-the-shelf, you don't have a knowledge base anymore. You have a stranger's library labeled with your name.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Vault Search is open-source on &lt;a href="https://github.com/notoriouslab/vault-search" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. The full origin story: &lt;a href="https://jacobmei.com/blog/2026/0405-3zsgf5/" rel="noopener noreferrer"&gt;why I separated AI from my thinking&lt;/a&gt; | &lt;a href="https://jacobmei.com/blog/2026/0404-n6nst4/" rel="noopener noreferrer"&gt;building retrieval-only AI for Obsidian&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>obsidian</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Rescuing EXIF GPS from iPhone 17 HEIC, in a browser-only app</title>
      <dc:creator>Jacob Mei</dc:creator>
      <pubDate>Wed, 22 Apr 2026 02:35:29 +0000</pubDate>
      <link>https://forem.com/notoriouslab/rescuing-exif-gps-from-iphone-17-heic-in-a-browser-only-app-36lj</link>
      <guid>https://forem.com/notoriouslab/rescuing-exif-gps-from-iphone-17-heic-in-a-browser-only-app-36lj</guid>
      <description>&lt;p&gt;My wife teaches outdoor-nature classes in Taipei and drops photos into &lt;a href="https://trailpaint.org/app/" rel="noopener noreferrer"&gt;TrailPaint&lt;/a&gt;, a small browser-only tool I built so she could make trail maps for her class handouts without wrestling with Canva. Drag GPX in, drop photos on the map, export a PNG. The whole thing runs in the tab.&lt;/p&gt;

&lt;p&gt;One of the features she relies on is auto-placement: drop twenty photos and each one lands on its own spot because the browser reads the EXIF GPS tag and drops a card there. It's the part of the product that feels most like magic when it works.&lt;/p&gt;

&lt;p&gt;Then she upgraded to an iPhone 17, and every photo she dropped piled up at the map center.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg77huai6douapkq20n2m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg77huai6douapkq20n2m.jpg" alt="White-stone lake trail — five spots auto-placed from EXIF GPS"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The diagnosis&lt;/h2&gt;

&lt;p&gt;First suspicion: exifr. We use &lt;a href="https://github.com/MikeKovarik/exifr" rel="noopener noreferrer"&gt;exifr&lt;/a&gt; as the primary EXIF reader — small, fast, tree-shakes to ~18 KB. It was returning an object with everything &lt;em&gt;except&lt;/em&gt; latitude and longitude. No throw, no warning, just &lt;code&gt;undefined&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I diffed two HEICs side by side — a 2026 photo from my wife's iPhone 15 Pro (iOS 18, which still parses fine) against a photo from her iPhone 17 (iOS 26.3, which breaks). The raw hex tells the story.&lt;/p&gt;

&lt;p&gt;HEIC is ISOBMFF, the same container format as MP4. The very first box is &lt;code&gt;ftyp&lt;/code&gt;, which declares the major brand plus a list of compatible brands. The first four bytes of the box give its total size in bytes, big-endian:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iPhone 15 Pro / iOS 18:  ftyp size=44  major brand heic, compatible brands (7):
    mif1 MiHB MiHE MiPr miaf heic tmap

iPhone 17      / iOS 26: ftyp size=52  major brand heic, compatible brands (9):
    mif1 MiHB MiHA heix MiHE MiPr heic miaf tmap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that both files already include &lt;code&gt;tmap&lt;/code&gt;, Apple's marker for tone-mapped adaptive HDR. That isn't what broke things — the iPhone 15 Pro file has it too, and it parses. What broke is the two additional compatible brands in the iPhone 17 output: &lt;code&gt;MiHA&lt;/code&gt; and &lt;code&gt;heix&lt;/code&gt;. Each compatible brand takes four bytes, so two new ones bump the &lt;code&gt;ftyp&lt;/code&gt; box from 44 to 52 bytes.&lt;/p&gt;
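
&lt;p&gt;The byte arithmetic can be replayed with a small sketch. It builds synthetic &lt;code&gt;ftyp&lt;/code&gt; boxes from the two brand lists (taking &lt;code&gt;heic&lt;/code&gt; as the major brand and the rest as compatible brands) and reads the size back; it does not touch real photo files, and the helper names are made up for the example.&lt;/p&gt;

```typescript
// Write a 4-character ASCII code (box type or brand) at a byte offset.
function writeFourCc(bytes: Uint8Array, at: number, code: string): void {
  for (let i = 0; i !== 4; i += 1) bytes[at + i] = code.charCodeAt(i);
}

// Read a 4-character ASCII code back out.
function readFourCc(bytes: Uint8Array, at: number): string {
  return String.fromCharCode(bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]);
}

// Synthetic ftyp box: size (4) + "ftyp" (4) + major brand (4)
// + minor version (4, left as zero) + 4 bytes per compatible brand.
function buildFtyp(major: string, compatible: string[]): Uint8Array {
  const size = 16 + 4 * compatible.length;
  const bytes = new Uint8Array(size);
  new DataView(bytes.buffer).setUint32(0, size); // DataView is big-endian by default
  writeFourCc(bytes, 4, "ftyp");
  writeFourCc(bytes, 8, major);
  compatible.forEach((brand, i) => writeFourCc(bytes, 16 + 4 * i, brand));
  return bytes;
}

// Parse the box again: total size first, then the brand list.
function readFtyp(bytes: Uint8Array): { size: number; major: string; compatible: string[] } {
  const size = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(0);
  const compatible: string[] = [];
  for (let at = 16; size >= at + 4; at += 4) compatible.push(readFourCc(bytes, at));
  return { size, major: readFourCc(bytes, 8), compatible };
}

const iphone15 = buildFtyp("heic", ["mif1", "MiHB", "MiHE", "MiPr", "miaf", "heic", "tmap"]);
const iphone17 = buildFtyp("heic", ["mif1", "MiHB", "MiHA", "heix", "MiHE", "MiPr", "heic", "miaf", "tmap"]);
console.log(readFtyp(iphone15).size, readFtyp(iphone17).size); // 44 52
```

&lt;p&gt;The fixed part of the box is 16 bytes, so seven compatible brands give 16 + 28 = 44 and nine give 16 + 36 = 52.&lt;/p&gt;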

&lt;p&gt;Which matters, because exifr has a hard-coded &lt;code&gt;ftyp&lt;/code&gt; length sanity check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/file-parsers/heif.mjs&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;ftypLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ftypLength&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;iPhone 15 Pro's 44-byte &lt;code&gt;ftyp&lt;/code&gt; passes. iPhone 17's 52-byte &lt;code&gt;ftyp&lt;/code&gt; doesn't. &lt;code&gt;HeifFileParser.canHandle()&lt;/code&gt; returns false, exifr falls through every other file parser, and the caller sees &lt;code&gt;"Unknown file format"&lt;/code&gt; — even though the EXIF payload inside the file is entirely intact. The parser just never got to it.&lt;/p&gt;
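
&lt;p&gt;To see the threshold trip in isolation, here is a minimal re-creation of that check using a plain &lt;code&gt;DataView&lt;/code&gt;. It mirrors the two quoted lines rather than calling exifr itself, and &lt;code&gt;passesFtypLengthCheck&lt;/code&gt; and &lt;code&gt;ftypHeader&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Mirror of the quoted check: read 16 bits at offset 2, reject if above 50.
function passesFtypLengthCheck(fileStart: Uint8Array): boolean {
  const view = new DataView(fileStart.buffer, fileStart.byteOffset, fileStart.byteLength);
  const ftypLength = view.getUint16(2); // low half of the big-endian 32-bit box size
  return !(ftypLength > 50);
}

// Hypothetical helper: the first 8 bytes of a file whose ftyp box has this size.
function ftypHeader(size: number): Uint8Array {
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setUint32(0, size);
  return bytes;
}

console.log(passesFtypLengthCheck(ftypHeader(44))); // true
console.log(passesFtypLengthCheck(ftypHeader(52))); // false
```

&lt;p&gt;Nothing about the 52-byte box is malformed; it is simply two bytes over a fixed ceiling.&lt;/p&gt;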

&lt;h2&gt;The fix that doesn't bloat the bundle&lt;/h2&gt;

&lt;p&gt;Three options went through my head. An upstream patch to exifr is the right long-term move, but my wife was sitting on broken photos that weekend. Swapping the whole parser to &lt;a href="https://github.com/mattiasw/ExifReader" rel="noopener noreferrer"&gt;ExifReader&lt;/a&gt; — which does handle these files — would fix it, except ExifReader is roughly five times bigger, and I'd be shipping that weight to every user regardless of whether they're on one of the few HEIC variants that actually triggers the bug. The third option, and the one I went with, is a fallback chain: keep exifr as primary, dynamically import ExifReader only when exifr gives up on a HEIC. (I've also filed this upstream at &lt;a href="https://github.com/MikeKovarik/exifr/issues/138" rel="noopener noreferrer"&gt;exifr#138&lt;/a&gt; with the byte-level repro — the fix is a one-line ceiling bump that would help everyone on exifr.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;gps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;tryExifr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gps&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;gps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// exifr gave up on the file entirely. Try the heavier parser.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ExifReader&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exifreader&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isBmffSafeToParse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HEIC iloc structure rejected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;parseWithExifReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ExifReader&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things I like about this. First, the main bundle doesn't move — ExifReader lives in its own ~34 KB gzip chunk that only downloads when needed. A user on an older iPhone, on Android, or on a desktop dragging JPEGs pays zero cost for a problem they don't have. Second, the fallback is bounded: it only fires when exifr produced &lt;em&gt;nothing&lt;/em&gt; — no GPS, no metadata — which is the specific failure mode we're trying to rescue. A normal photo that just happens to lack GPS still short-circuits after a single pass.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vg9wg3nnldluju5gkgc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vg9wg3nnldluju5gkgc.jpg" alt="London museum trip — spots auto-placed from HEIC EXIF across multiple venues"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Four hardening doors&lt;/h2&gt;

&lt;p&gt;Adding a second parser meant adding a second attack surface. TrailPaint's threat model is genuinely small — users drop their own photos into their own browser, the bytes never leave — but "small" isn't "zero", and ExifReader's BMFF code has known rough edges. Four guards, cheapest first:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Null Island rejection.&lt;/strong&gt; Coordinates &lt;code&gt;(0, 0)&lt;/code&gt; sit in the Gulf of Guinea. Almost no one took a photo there. Broken EXIF parsers, on the other hand, return &lt;code&gt;(0, 0)&lt;/code&gt; all the time. We reject it. The cost is that the three people actually photographing buoys near 0°N 0°E have to drag their spots manually; the benefit is that a parser regression can't quietly scatter your photos into the Atlantic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;DateTimeOriginal&lt;/code&gt; regex anchor.&lt;/strong&gt; The obvious regex is &lt;code&gt;/^\d{4}:\d{2}:\d{2}/&lt;/code&gt;, but an earlier draft of mine was &lt;code&gt;/\d{4}:\d{2}:\d{2}/&lt;/code&gt;: no &lt;code&gt;^&lt;/code&gt;, no &lt;code&gt;$&lt;/code&gt;. That one happily matches a date-shaped run in the middle of an arbitrary string. Not a security bug, but it was silently accepting garbage dates. Anchor both ends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;MAX_PHOTO_BYTES&lt;/code&gt; guard.&lt;/strong&gt; The wrapper rejects files larger than 10 MB before any bytes reach the parser. Camera JPEGs run 2–5 MB, recent iPhone adaptive-HDR HEIC with a gain map sometimes pushes 8 MB, and Live Photos can nudge higher. 10 MB is generous for a single-frame photo without inviting someone to feed us 50 MB and watch the tab OOM. If a legitimate photo trips it (rare but possible with multi-frame composites), the user's options today are to downscale it in their Photos app or to file an issue, and I'll raise the cap; I'd rather start strict and loosen later than the reverse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. &lt;code&gt;isBmffSafeToParse&lt;/code&gt; pre-scan.&lt;/strong&gt; This is the interesting one. ExifReader walks the ISOBMFF &lt;code&gt;iloc&lt;/code&gt; box to enumerate metadata extents, and it trusts the fields in that box. Two known attack signatures live there. The first is an &lt;code&gt;iloc&lt;/code&gt; where both &lt;code&gt;offset_size&lt;/code&gt; and &lt;code&gt;length_size&lt;/code&gt; are zero; that packs the extent iteration into a &lt;code&gt;65535 × 65535&lt;/code&gt; nested loop whose inner step advances by zero bytes, which pegs the main thread indefinitely. The second is an &lt;code&gt;item_count&lt;/code&gt; inflated to millions or billions — ExifReader dutifully iterates. Real iPhone HEICs carry around 2–15 items (primary image, thumbnail, depth, HDR gain map, EXIF, XMP); anything past that is almost certainly malicious.&lt;/p&gt;

&lt;p&gt;The pre-scan walks the top-level boxes, descends into &lt;code&gt;meta&lt;/code&gt; (which is a FullBox, so its sub-boxes start 4 bytes after the header), finds &lt;code&gt;iloc&lt;/code&gt;, and rejects the two signatures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Walk top-level boxes, find `meta`&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 64-bit / open-ended — bail, let parser try&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;boxType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;meta&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// meta is a FullBox: skip 4 bytes of version+flags&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;boxType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;iloc&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ilocVersion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sizeByte&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;offsetSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sizeByte&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x0f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lengthSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sizeByte&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x0f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// Attack 1: non-terminating extent loop&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;offsetSize&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;lengthSize&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// Attack 2: absurd item_count&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;itemCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ilocVersion&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
          &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;itemCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;ILOC_ITEM_CAP&lt;/span&gt; &lt;span class="cm"&gt;/* 1000 */&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
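      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;innerSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getUint32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;innerSize&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// zero or 64-bit size would stall this loop; bail, let parser try&lt;/span&gt;
      &lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;innerSize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;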
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details worth noting. The scan only looks at the first 256 KB of the buffer — metadata boxes sit at the start of an ISOBMFF file, and capping the scan window prevents the pre-scan itself from being a DoS vector on large inputs. If the file doesn't start with &lt;code&gt;ftyp&lt;/code&gt;, the scan returns true and lets the parser decide (we only care about genuine HEIC). The &lt;code&gt;ILOC_ITEM_CAP = 1000&lt;/code&gt; is two orders of magnitude above real files and still cheap to enforce. The scan is roughly 40 lines in the actual source (&lt;code&gt;exifParser.ts&lt;/code&gt;).&lt;/p&gt;
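
&lt;p&gt;The first three guards are simpler than the pre-scan and can be sketched together. This is an illustration of the ideas, not the code in &lt;code&gt;exifParser.ts&lt;/code&gt;: the function names are hypothetical, the latitude/longitude range check and the reading of "10 MB" as 10 × 1024 × 1024 bytes are my assumptions, and the regex assumes the standard EXIF &lt;code&gt;YYYY:MM:DD HH:MM:SS&lt;/code&gt; layout.&lt;/p&gt;

```typescript
const MAX_PHOTO_BYTES = 10 * 1024 * 1024; // reading "10 MB" as MiB is an assumption

// Guard 1: reject Null Island, plus non-finite or out-of-range values.
function isPlausibleCoordinate(lat: number, lon: number): boolean {
  if (!Number.isFinite(lat) || !Number.isFinite(lon)) return false;
  if (Math.abs(lat) > 90 || Math.abs(lon) > 180) return false;
  return lat !== 0 || lon !== 0;
}

// Guard 2: EXIF dates look like "YYYY:MM:DD HH:MM:SS"; anchor both ends.
const EXIF_DATETIME = /^\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}$/;
function isExifDateTime(value: string): boolean {
  return EXIF_DATETIME.test(value);
}

// Guard 3: check the size before any bytes reach a parser.
function isWithinSizeCap(byteLength: number): boolean {
  return MAX_PHOTO_BYTES >= byteLength;
}

console.log(isPlausibleCoordinate(0, 0));                     // false
console.log(isPlausibleCoordinate(25.03, 121.56));            // true
console.log(isExifDateTime("2026:04:18 09:30:00"));           // true
console.log(isExifDateTime("junk 2026:04:18 09:30:00 junk")); // false
console.log(isWithinSizeCap(50 * 1024 * 1024));               // false
```

&lt;p&gt;Each guard is a pure function of one or two values, so all three can run before the dynamic import is even requested.&lt;/p&gt;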

&lt;h2&gt;Why not a Web Worker&lt;/h2&gt;

&lt;p&gt;The textbook answer to "untrusted parser on user-supplied binary" is "run it in a Worker." I considered it and walked away. The attack surface here is genuinely tiny — users drop their own photos into their own browser, nothing goes to a server, there's no shared link that carries a photo. Against that, a Worker adds a second chunk, buffer transfer via &lt;code&gt;postMessage&lt;/code&gt;, serialization overhead, and async orchestration around what's currently one &lt;code&gt;await&lt;/code&gt;. The &lt;code&gt;iloc&lt;/code&gt; pre-scan is forty synchronous lines that block exactly the class of file I was worried about, at the boundary, before the parser runs. Ceremony without a matching risk is its own cost. If the threat model changes — say I add a server-side path that accepts shared photos — the Worker comes back on the table.&lt;/p&gt;

&lt;h2&gt;Closing notes&lt;/h2&gt;

&lt;p&gt;The thing I'd take away from this isn't the patch itself but the shape of the problem. BMFF is a moving target: Apple added &lt;code&gt;tmap&lt;/code&gt; with iOS 18, then tacked on &lt;code&gt;MiHA&lt;/code&gt; and &lt;code&gt;heix&lt;/code&gt; somewhere between iOS 18 and iOS 26, and they'll add something else next year. Any parser that reads container metadata is going to need a fallback story rather than a hope, and shipping that fallback via dynamic import is a nice way to keep the happy path cheap for the vast majority of users who never trigger it.&lt;/p&gt;

&lt;p&gt;The other useful shift was matching guards to reality instead of textbook. Hardening a browser tool where the user is both attacker and victim is a different problem than hardening a public upload endpoint. The correct guards here turned out to be smaller than the security-textbook answer would suggest — a cap on file size, two specific &lt;code&gt;iloc&lt;/code&gt; attack signatures, and a couple of value-range checks. No Worker, no WASM sandbox, no quarantine queue. Just the bits that actually matched the risk.&lt;/p&gt;

&lt;p&gt;Source is on GitHub under GPL-3.0: &lt;a href="https://github.com/notoriouslab/trailpaint" rel="noopener noreferrer"&gt;github.com/notoriouslab/trailpaint&lt;/a&gt;. The EXIF pipeline and the four guards live in &lt;code&gt;online/src/core/utils/exifParser.ts&lt;/code&gt;; the size caps are in &lt;code&gt;exifToGeojson.ts&lt;/code&gt;. Upstream issue at &lt;a href="https://github.com/MikeKovarik/exifr/issues/138" rel="noopener noreferrer"&gt;exifr#138&lt;/a&gt; if anyone wants to chime in. Happy to hear what I got wrong — BMFF always has another booby trap.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>ios</category>
      <category>heic</category>
    </item>
  </channel>
</rss>
