Forem: Alex Towell

Sparse Spatial Hash Grids: Efficient N-Dimensional Spatial Indexing

Alex Towell — Tue, 05 May 2026 13:22:17 +0000

Sparse Spatial Hash Grids: Finding Neighbors Fast

If you're building physics simulations, game engines, or scientific computing applications, one problem comes up constantly: "Which entities are near this position?" Collision detection between 10 million particles, finding nearby enemies in a game world, computing gravitational forces in an N-body simulation. Efficient spatial queries are critical.

The Problem: Spatial Indexing at Scale

Consider a physics simulation with 10 million particles in a 10,000 cubed world. Each frame, you need to find all particles within 20 units of each particle (for collision detection), update positions, and repeat 60 times per second.

The naive approach, checking every particle against every other, requires 100 trillion comparisons per frame. Even at 1 nanosecond per comparison, that's over 90 seconds per frame.

Enter: Sparse Spatial Hash Grids

A sparse spatial hash grid divides space into a grid of cells and uses a hash map to store only the occupied cells. You get:

O(1) insertions (hash map lookup)
O(k) neighbor queries where k = number of nearby entities (not total entities)
Memory proportional to occupied cells (not total possible cells)

Why "Sparse"?

In the 10M particle example, a dense grid with 10-unit cells needs 1 billion (1000 cubed) cells. At 8 bytes per pointer, that's 8 GB just for empty cells.

A sparse hash grid only stores the ~10 million occupied cells, using ~100 MB total. That's a 60,000x memory reduction.

Architecture: Generic N-Dimensional Design

I built this on modern C++20 concepts and generic programming:

template<typename Entity,
         std::size_t Dimensions,
         std::floating_point FloatType = float,
         std::unsigned_integral IndexType = std::size_t,
         std::size_t SmallVectorSize = 16>
class sparse_spatial_hash;

This works for 2D games (platformers, top-down shooters), 3D simulations (molecular dynamics, particle effects), 4D space-time indexing (trajectory queries), or any N dimensions your use case requires.

Topology Support

Real-world spatial problems have different boundary conditions:

// Bounded: Traditional box with walls
grid_config<3> bounded{
    .topology_type = topology::bounded,
    .world_size = {1000.0f, 1000.0f, 1000.0f}
};

// Toroidal: Periodic wraparound (pac-man physics)
grid_config<3> toroidal{
    .topology_type = topology::toroidal,
    .world_size = {1000.0f, 1000.0f, 1000.0f}
};

// Infinite: Unbounded growth
grid_config<3> infinite{
    .topology_type = topology::infinite
};

Toroidal topology is particularly useful. It's essential for periodic boundary conditions in molecular dynamics, seamless procedural worlds in games, and astronomical simulations (periodic universe models).

Performance: The Numbers

From real-world usage in the DigiStar physics engine (10M particles, 10000 cubed world):

Operation	Time	Notes
Full Rebuild	80ms	Complete grid reconstruction
Incremental Update	2ms	Only 1% of particles change cells
Collision Detection	150ms	20-unit interaction radius
Memory Footprint	100MB	Grid + tracking structures

The 40x speedup from incremental updates matters. In most simulations, only a small fraction of entities move between cells each frame. Why rebuild the entire grid when you can just update the movers?

Latest Optimizations (v1.1.0)

The library now uses small vector optimization: cells with 16 or fewer entities store them inline rather than allocating.

40% faster rebuilds (fewer allocations)
5-11% faster queries (better cache locality)
Trade-off: +256 bytes per occupied cell

For typical workloads where most cells have fewer than 16 entities, this is a significant win.

Use Cases

Game Development

Collision Detection: Broad-phase partitioning is the first step in physics engines:

sparse_spatial_hash<Entity, 2> grid(config);
grid.rebuild(entities);

// Only check entities in nearby cells
grid.for_each_pair(entities, collision_radius,
    [](std::size_t i, std::size_t j) {
        if (detailed_collision_check(i, j)) {
            handle_collision(i, j);
        }
    });

AI Pathfinding: Find cover points, enemies, or waypoints:

auto nearby_cover = grid.query_radius(
    search_radius,
    player.x, player.y
);

Scientific Computing

N-Body Simulations: Gravity, electrostatics, magnetic forces:

// Build neighbor lists for force calculations
sparse_spatial_hash<Particle, 3> grid(config);
grid.rebuild(particles);

for (auto& p : particles) {
    auto neighbors = grid.query_radius(
        cutoff_distance, p.x, p.y, p.z
    );

    for (auto neighbor_idx : neighbors) {
        apply_force(p, particles[neighbor_idx]);
    }
}

Molecular Dynamics: The same data structure handles both short-range interactions (collision grid with small cells) and long-range interactions (coarse grid with large cells).

Robotics and SLAM

Obstacle Detection: Real-time collision avoidance:

// Index sensor readings
grid.rebuild(lidar_points);

// Check if path is clear
auto obstacles_in_path = grid.query_radius(
    robot_radius + safety_margin,
    target.x, target.y, target.z
);

Comparison with Alternatives

vs. R-tree (Boost.Geometry)

R-trees give you hierarchical bounding volumes, good for static data, with O(log n + k) queries. Sparse hash grids give you O(1) insertions vs O(log n), simpler incremental updates (no tree rebalancing), native toroidal support, and better performance for dynamic scenes where most entities don't move far.

vs. Octree

Octrees give you hierarchical LOD and good performance for spatially clustered data. Sparse hash grids give you lower memory (no tree nodes), faster queries (direct hash lookup), simpler implementation, and predictable performance (no worst-case tree imbalance).

vs. Dense Grid

Dense grids are simple with good cache locality when fully populated. Sparse hash grids give you 60,000x memory reduction for large sparse worlds, handle huge worlds without memory explosion, and maintain the same O(1) insert and O(k) query complexity.

Design Patterns: Customization Points

The library follows STL design philosophy with customization points:

Position Accessor

struct Particle {
    glm::vec3 position;
    glm::vec3 velocity;
    float mass;
};

// Customize how grid extracts positions
template<>
struct spatial::position_accessor<Particle, 3> {
    static float get(const Particle& p, std::size_t dim) {
        return p.position[dim];
    }
};

Zero-overhead abstraction. No virtual calls. No inheritance. Optimized away at compile-time.

Range Support

Works with any C++20 range:

std::vector<Particle> particles;
grid.rebuild(particles);  // Vector

std::list<Particle> list_particles;
grid.rebuild(list_particles);  // List

// Even filtered views!
auto fast_particles = particles
    | std::views::filter([](https://metafunctor.com/auto& p) {
        return p.speed > threshold;
      });
grid.rebuild(fast_particles);

Advanced Techniques

Multi-Resolution Grids

Use different grids for different interaction scales:

// Fine grid for collision detection (2-unit cells)
sparse_spatial_hash<Particle, 3> collision_grid({
    .cell_size = {2.0f, 2.0f, 2.0f}
});

// Coarse grid for long-range forces (50-unit cells)
sparse_spatial_hash<Particle, 3> force_grid({
    .cell_size = {50.0f, 50.0f, 50.0f}
});

// Different physics at different scales!

Parallel Processing

Process cells independently:

#pragma omp parallel for
for (const auto& [cell_idx, entities] : grid.cell_contents()) {
    // Thread-safe: Each cell independent
    for (std::size_t i = 0; i < entities.size(); ++i) {
        for (std::size_t j = i + 1; j < entities.size(); ++j) {
            handle_interaction(entities[i], entities[j]);
        }
    }
}

Or use standard parallel algorithms:

std::for_each(std::execution::par_unseq,
    grid.cells().begin(), grid.cells().end(),
    [&](https://metafunctor.com/const auto& cell) {
        process_cell(cell);
    });

Implementation Insights

Morton Encoding (Z-Order Curve)

The grid uses Morton encoding to convert multi-dimensional cell coordinates into a single hash key:

// (x, y, z) -> single integer with spatial locality
hash_key = morton_encode(cell_x, cell_y, cell_z);

Nearby cells in 3D space get nearby hash keys, improving cache locality during traversal.

Incremental Updates

The magic behind 40x speedups:

// Track which cell each entity was in
std::vector<CellIndex> prev_cells;

// On update:
for (std::size_t i = 0; i < entities.size(); ++i) {
    auto new_cell = compute_cell(entities[i]);
    if (new_cell != prev_cells[i]) {
        // Only update entities that moved cells
        remove_from_cell(prev_cells[i], i);
        add_to_cell(new_cell, i);
        prev_cells[i] = new_cell;
    }
}

Typical movement patterns: ~99% of particles stay in their cell each frame. ~1% move to adjacent cells. Update cost scales with movers, not total count.

Getting Started

Installation

Using CMake FetchContent:

include(FetchContent)

FetchContent_Declare(
  sparse_spatial_hash
  GIT_REPOSITORY https://github.com/queelius/sparse_spatial_hash.git
  GIT_TAG        v1.2.0
)
FetchContent_MakeAvailable(sparse_spatial_hash)

target_link_libraries(your_target
  PRIVATE sparse_spatial_hash::sparse_spatial_hash)

Minimal Example

#include <spatial/sparse_spatial_hash.hpp>

struct Particle {
    float x, y, z;
};

// Specialize position accessor
template<>
struct spatial::position_accessor<Particle, 3> {
    static float get(const Particle& p, std::size_t dim) {
        switch(dim) {
            case 0: return p.x;
            case 1: return p.y;
            case 2: return p.z;
            default: return 0.0f;
        }
    }
};

int main() {
    using namespace spatial;

    // Configure grid
    grid_config<3> cfg{
        .cell_size = {10.0f, 10.0f, 10.0f},
        .world_size = {1000.0f, 1000.0f, 1000.0f},
        .topology_type = topology::toroidal
    };

    sparse_spatial_hash<Particle, 3> grid(cfg);

    // Create particles
    std::vector<Particle> particles(10000);
    // ... initialize ...

    // Build index
    grid.rebuild(particles);

    // Query neighbors
    auto neighbors = grid.query_radius(
        50.0f,  // radius
        100.0f, 200.0f, 300.0f  // query position
    );

    // Process pairs
    grid.for_each_pair(particles, 20.0f,
        [&](std::size_t i, std::size_t j) {
            // Handle interaction
        });
}

Future Directions

The library continues to evolve:

GPU Support: CUDA/OpenCL backend for massive parallelism
Lazy Evaluation: Generator-based query results
Custom Distance Metrics: Support for non-Euclidean distances
Adaptive Grids: Dynamic cell size adjustment based on density
Serialization: Save/load grid state for checkpointing

Conclusion

Sparse spatial hash grids hit a sweet spot: simple enough to understand and debug, fast enough for real-time applications (2ms updates), memory-efficient enough for huge sparse worlds (60,000x reduction), generic enough for 2D/3D/4D and beyond, and tested in production physics engines.

If you're building a simulation or spatial application, try it. The data structure is extracted from the DigiStar physics engine, where it enables 10M+ particle simulations at 60 FPS.

Learn More

Repository: github.com/queelius/sparse_spatial_hash
Documentation: Full API reference and tutorials
Examples: See examples/ directory for collision detection, molecular dynamics, and more
License: Boost Software License (very permissive)

Blind Spots, Consistency, and What Remains

Alex Towell — Tue, 05 May 2026 13:21:50 +0000

Preface

I wrote this not because I've reached some final moral insight, but because I noticed a moment of clarity I didn't want to lose.

Reading about Chomsky recently, specifically his associations with Epstein, I recognized a pattern. Even the most morally articulate among us can fail to turn the lens inward. This isn't really about Chomsky. It's about something I recognize in myself, and probably in everyone.

Certain beliefs sit unexamined for years, quietly shaping how we see the world. Occasionally, something forces a re-examination. This is an attempt to record one such moment.

The Chomsky Case

I found Chomsky's work influential. For years, I held him up as an exemplar, someone who demonstrated that rigorous moral analysis was possible, that power could be named and critiqued systematically.

I don't do that anymore. Not because Chomsky failed some purity test, but because I no longer think exemplars work that way. Humans are fragile, partial, shaped by circumstance. Worthy of moral consideration, not pedestals.

Still, the Epstein association is worth examining. Not to condemn, but to understand. What does it reveal about moral blind spots?

Systemic vs. Personal Violence

Chomsky has spent decades documenting systemic violence: state power, imperialism, manufactured consent. His work is incisive precisely because he sees patterns others miss.

But systemic violence is abstract. You analyze it from a distance. Personal violence, the kind Epstein inflicted, is concrete, embodied, happening to specific people.

A life of academic privilege can insulate you from the second kind. You can see the system without seeing the person in front of you.

The Reintegration Argument

Chomsky's stated position: Epstein served his time, and people who've served their time should be allowed to reintegrate into society. This isn't an unreasonable principle. I'm sympathetic to it. Permanent exile creates its own harms.

But there's a gap between "allowing someone to exist in society" and "praising them, associating with them, elevating them." Who we choose to give credibility matters, to the victims, to society, to the message it sends about what we value.

Epstein's 2008 sentence was itself a product of privilege, a sweetheart deal that obscured the scale of his crimes. Chomsky couldn't have known the full picture then. But by 2019, he could have.

The Post-2019 Silence

After Epstein's arrest and death, the scope became clear. Dozens of victims. A trafficking operation. Complicity from powerful institutions.

This was the moment for Chomsky to turn the lens inward. Not performative apologetics (I share his aversion to theater). But a simple acknowledgment: "I misjudged. By giving him credibility, I may have helped him access victims. That matters."

Instead: silence, or dismissal.

This is the blind spot. Not the initial association, that's forgivable given incomplete information. But the refusal to revisit, to apply the same rigorous analysis to oneself that one applies to power structures.

What This Teaches

Chomsky's failure isn't unique. It's the human condition. We're better at seeing others' errors than our own. Privilege insulates. Moral clarity about systems doesn't guarantee moral clarity about persons.

The lesson isn't "Chomsky is bad." It's: no one is exempt from blind spots, including those who've made careers analyzing them.

Including me.

On Moral Exemplars

Some beliefs form early, before you have tools to interrogate them. Moral exemplars get installed this way. People you treat as evidence that a coherent, good life is possible. They are less chosen than absorbed.

When such figures reveal blind spots, the disappointment feels personal. Not because they're villains. Because you expected too much coherence from a human being.

People aren't unified moral systems. They're messy collections of ad hoc heuristics, shaped by privilege, insulation, and circumstance. Even the most serious among us are partial.

I've written before about moral responsibility as a social technology, useful but constructed. Not a metaphysical fact about persons, but a pragmatic fiction that modifies behavior and enables coordination.

This framing helps here. Chomsky isn't a fallen saint. He's a human who did valuable intellectual work while remaining subject to the same partial vision that affects everyone. His ideas still stand on their own merits. His blind spots don't negate his insights. They just remind us that insights and blind spots coexist in the same person.

The Standard Applied Consistently

Reflecting on Chomsky's failure to turn the lens inward, I notice something in myself.

I've had intrusive thoughts lately. Fantasies of sacrifice, of minimizing burden, of making death "useful." Stage 4 generates these. The logic goes: if time is limited, maybe it should be spent reducing the cost I impose on others.

I don't endorse these thoughts. I recognize them as distortions of something I actually hold: that conscious beings have inherent worth, independent of utility.

I've argued that suffering is self-evidently bad, not because some theory says so, but because the badness is immediately present in the experience. This phenomenological grounding applies universally. To everyone capable of suffering.

Including me.

The same standard I apply to others must apply to myself. If I believe humans have worth independent of their productivity, I can't exempt myself from that principle. If I believe people deserve compassion even when they're burdens, I have to extend that to the person I see in the mirror.

This isn't stoicism or self-affirmation. It's consistency.

What Living Well Means Now

Stage 4 changed the optimization problem. I've written about this before: not maximize lifetime, but maximize meaningful work given uncertain lifetime.

But "meaningful work" isn't the whole picture.

Living well also means: presence, honesty, gentleness. Being human even when that feels incomplete or insufficient.

For myself. For my wife. For those who love me, and whom I love in return.

This isn't heroism. I'm not being brave or inspiring. I'm not fighting heroically or staying positive or finding silver linings.

I'm just:

Making decisions based on probability
Trying to remain present
Accepting uncertainty
Continuing forward

Cancer doesn't make you wise. It makes you confront tradeoffs explicitly. The goal isn't optimization. It's orientation. Staying pointed in a direction that matters, even without certainty about the destination.

On Forgiveness

Thinking about Chomsky, about blind spots, about the harm we cause without intending to, I keep returning to forgiveness.

Forgiveness isn't absolution. It's not forgetting. It's not pretending harm didn't happen.

It's the refusal to treat suffering as a moral good. It distinguishes justice from vengeance. A compassionate response to harm aims to reduce future suffering, not multiply it.

Even the worst among us are human. Acknowledging that doesn't excuse anything. It prevents moral brittleness. It allows us to see others as fellow sufferers, shaped by circumstances they didn't choose, acting from partial information, failing in ways they may not even recognize.

This applies to Chomsky. It applies to people who've harmed me. It applies to me, when I inevitably fail to live up to my own standards.

Forgiveness isn't about the person who caused harm. It's about refusing to let that harm define everything that follows. It's about maintaining the capacity to see clearly, even when clarity is painful.

What Remains

I don't know how much time I have. The statistics suggest years, not decades. But statistics describe populations, not individuals. I could beat the odds. I could not.

What I know is what I want to do with whatever remains:

Build things that matter
Document what I've learned
Stay present with the people I love
Notice my blind spots when I can
Correct course when I notice
Continue forward regardless

The blind spots will still be there. That's the human condition. Chomsky couldn't see his. I can't see mine. That's what makes them blind spots. The goal isn't perfection. It's the willingness to look, to revise, to apply the same standards to yourself that you apply to others.

Peace in the end, for all, isn't indulgence. It's refusing to let suffering have the final word.

This post was prompted by reading about Chomsky's associations with Epstein, but it's not really about Chomsky. It's about the pattern I recognized: how moral clarity about external things can coexist with moral blindness about ourselves. And about trying, imperfectly, to do better.

Your Blog Will Outlive Your Database (It Doesn't Have To)

Alex Towell — Tue, 05 May 2026 13:19:33 +0000

(it doesn't have to)

0. The puzzle

Scroll to the bottom of this page. There's a jigsaw puzzle there, and 47
people have placed pieces in it. Some of them placed a piece an hour
ago. One placed a piece while you were reading this sentence. You can see the
picture assembling itself, tile by tile, toward something that isn't quite
resolved yet.

Every piece placement is a git commit. When you drag a tile into position and it clicks into place, a commit
lands in a public repository. It carries your GitHub username, a timestamp,
and a structured record of which piece went where. The whole puzzle's solving
history is a git log. That log is not stored in a database that I control. It is not behind a
paywall or an API rate limit. It is not going to disappear if I stop paying
for a server. Every person who has ever touched this puzzle is in that log,
and the log will exist as long as one copy of the repository exists somewhere.
Someone can fork it tonight. Someone can clone it in 2035. The history does
not belong to me; it belongs to the commits, which means it belongs to
everyone who made one.

The standard way to build a
puzzle like this would be a database table: piece_id, slot_x, slot_y,
user_id, timestamp. The table lives on a server. The server is somebody's
responsibility. When that person gets tired or goes broke or moves on, the
table disappears, and the history disappears with it. Every comment thread,
every leaderboard, every "who solved it first" record: gone.

Your forum threads from 2008 are gone.

1. The asymmetry

But your 2008 blog post still renders.

If someone published it as static HTML or Markdown on a personal domain and kept
paying eight dollars a year for hosting, the URL still works. The page still
loads. The words are still there. You can read it right now on a browser that
didn't exist when it was written, running on hardware the author never imagined,
fetched over a protocol version that postdates the post itself. None of that
matters. The file is a file.

Markdown won the durability war by being boring. There is no schema to migrate.
No application server to restart. No vendor to outlive. A .md file is a text
file; a text file from 2008 is as readable today as it was then. The boring-ness
is the point. When a format makes no demands on its environment, the environment
can change freely around it. People who chose flat files in 2006 were not
visionaries; they were lazy in the exact right way.

The forum did not have that option. A phpBB community lives only while someone tends the server. The moment the
hosting bill bounces or the moderator takes a new job, the state goes with them. Not just the
posts, but the replies, the edits, the votes, the relationships between pieces of content. When the operator left, the database closed.

And this is not just about old forums. It is the same story for every layer of
interactivity we add to a page today. Want comments? You need mutable state. Want
reactions, edits, collaborative cursors, presence indicators? Each one needs a
write path, and every write path needs a server, and every server needs someone
responsible for it. The content stays up. The interactions disappear.

Reads are durable; writes are not.

The question is why we ever stopped building things that work that way.

2. Why this happened

The read path won, and it won completely. Over the 2000s and 2010s, the publishing layer settled into defaults that are genuinely excellent: write a flat file, run it through a static-site generator, push it to a CDN. Jekyll in 2008, Hugo a few years later, GitHub Pages as a free host for anyone with a repository. The result was durable content that anyone could serve from anywhere. That part of the story went right.

The write path never got the memo. While the read path was reinventing itself around flat files and CDN delivery, the write path stayed in the 1995 mold: a server with mutable state, owned by an operator, with someone responsible for paying the bill. Comments, accounts, sessions, edit history, anything that required a user to change something: all of it still went through a database somewhere, administered by someone, dependent on that someone staying interested.

And every product that came along to fix WordPress reproduced the same architectural mistake. Ghost replaced WordPress with a cleaner editor; Substack replaced Ghost with built-in audiences. The hosting changed, the operator-dependency did not. The shape underneath stays exactly the same: a privileged server holding mutable state that the user does not own. Different paint, same chassis.

The Jamstack movement looked like it might break this pattern. Static frontends, decoupled backends, the whole read path served from a CDN. But "static frontend plus a SaaS backend" is not an architectural improvement; it is the same failure mode with extra steps. The HTML is still durable. The dynamic layer, the comments, the reaction counts, the personalization, still lives in a database somewhere. When that SaaS raises prices or shuts down, the participation layer dies exactly the way the phpBB forum died. The vocabulary changed. The structure did not.

3. The missing primitive

The structure needs one new thing: a write substrate as durable as the read substrate.

It has been sitting in .git/ the whole time. Git is append-only: commits are never modified, only accumulated. It is content-addressed: every object in the store is named by a cryptographic hash of its contents, so the history cannot be silently altered. It is signed: commits can carry GPG signatures that bind authorship to a public key. It is fully replicated by every clone: no single server holds the authoritative copy. It is forkable without permission. These properties are the reason version control works at all. They also happen to be exactly the properties that the write path for the web has never had.

The unit of change in this substrate is a git commit, not a SQL row. Call this commit-as-write: a structured, signed, append-only record of a reader's action, living in the same repository as the content it touches, replicated everywhere the content is replicated.

Build with commit-as-write as the default and you get git-native publishing: the static web with a write path that matches the read path's durability.

The Ink & Switch local-first essay (2019) is the philosophical predecessor to this argument. It named the problem clearly: users should own their data, software should work offline, and nothing should disappear because a vendor stopped paying its server bill. Git-native publishing is local-first applied specifically to the public web's read-write substrate, using git's existing infrastructure. The "git-based CMS" industry (Decap, TinaCMS, CloudCannon) is the closest existing term, but it describes the wrong layer: those tools put git behind the editorial workflow for site operators, not behind the participation layer for readers. Utterances and Giscus proved that reader writes can live in a GitHub-hosted repository without a separate database, but they write to Issues and Discussions, not to the commit log, and neither project claims to generalize the pattern.

This substrate has a vocabulary REST does not.

4. Git's vocabulary is strictly richer than REST's

REST's vocabulary is five verbs applied to named resources: GET, POST, PUT, PATCH, DELETE. That model has been the operating assumption of the writable web for roughly 25 years. It is not wrong; it maps cleanly onto databases, fits HTTP semantics, and scales to most application needs. But it is a model for mutating state, not for accumulating history.

Git's vocabulary includes all five of those operations and adds six that REST has no native equivalent for: branch, tag, merge, fork, signed commit, submodule. They are the core operations that make distributed version control work, and each one carries semantics REST cannot express.

REST/DB verb	Git operation	What git adds that REST/SQL can't
`POST` (create)	`commit` (new file)	signed, time-stamped, cryptographically attributed
`PUT` (replace)	`commit` (overwrite)	prior versions preserved automatically
`PATCH` (partial)	`commit` (line-level diff)	the diff is the structured patch, no separate schema
`DELETE`	`commit` (remove) or `revert`	reversible; deletion is a record, not an erasure
`GET`	read working tree	or read any historical state, by hash
`(none)`	`branch`	parallel / private / proposed state, native
`(none)`	`tag`	named / canonical / published version
`(none)`	`merge`	consensus and reconciliation as first-class ops
`(none)`	`fork`	take all your data and leave, lossless
`(none)`	signed commit	authentication baked into the data layer
`(none)`	`submodule`	composable embedded references across repos

Two entries in that table carry the most rhetorical weight. A signed commit binds authorship to a public key at the data layer, not at the application layer. You do not need an accounts table or a session store; identity travels with the record itself. A fork means a user can take the entire history and leave: not an export, not a backup request, but a full lossless copy with its own future. A DELETE /users/me removes your account; it does not give you your history.

The deeper point is that git's log is already an event store in the sense Greg Young articulated with event sourcing and CQRS: each commit is a domain event, and the working tree is a projection derived from replaying those events. The same log can feed many different applications via different read projections: a comment widget, a reaction aggregator, a moderation log. None of them require a schema migration when a new projection is added; they just read the same log through a different lens. The commit-as-write primitive that §3 named is, in event-sourcing terms, an append to an immutable event log.

A commit message can carry a typed payload, making the log a free event store with no schema layer:

op: react
target: posts/your-blog-will-outlive-your-database
value: 🔥
actor: queelius
ts: 2026-04-24T22:45:00Z

Any reader action that can be expressed as a typed operation and a target fits this shape: a comment, a reaction, a vote. The next section uses a jigsaw puzzle for exactly this reason: discrete pieces, unambiguous positions, multiple actors, shared state.

5. The jigsaw, a worked example

Go back to the puzzle at the bottom of the page. Those 47 people who have placed pieces in this puzzle: every move they made is a commit in a public repository, signed by their GitHub identity, timestamped, carrying a structured payload. When you place a piece, your client pushes something that looks like this:

op: place
piece: 042
slot: [3, 7]
actor: queelius
ts: 2026-04-24T22:47:13Z

The shape should look familiar. No schema negotiation. No database column to add. The commit message is the record.

Before that commit lands, a pre-commit hook runs a verifier. Piece 042 either fits in slot [3, 7] or it does not; the source image is the ground truth. Most multi-author write systems cannot validate writes before accepting them, because there is no ground truth to check against: a comment is whatever the user typed, and the server has no way to reject it on correctness grounds. The jigsaw has ground truth. The pre-commit hook can reject a bad placement the way a compiler rejects a type error, not by policy but by reference to something real.

If two readers grab piece 042 at the same moment, one commit lands first. The second client gets a conflict, fetches fresh state, and retries with a piece that is actually available. Git's model is optimistic concurrency: no locking, no coordination, just commit and retry if you collide. No CRDT machinery required; no special conflict-resolution protocol.

Two people placing different pieces never conflict at all. The moves commute: order does not matter for the final picture.

Participation is visceral in a way comments are not. People want to place a piece; they want to see the picture advance. And the picture forming in front of you is the demo itself: you can watch the assembly happen across all contributors in real time.

Each week's puzzle uses a freshly-generated image. Reverse image search returns nothing; the picture is genuinely new to everyone who shows up.

When the puzzle is complete, the solving history is the git log. Anyone can clone it tonight. Anyone can study it in 2035. Every contributor is in that log, signed by the identity they used, in the order they placed their pieces. That record does not belong to me: it belongs to the commits. The log will exist as long as one copy of the repository exists anywhere.

That is the shape of git-native publishing in practice. But there is a class of interactions the commit log cannot hold, and pretending otherwise would be the same mistake this essay is trying to name.

6. Honest limits

This is not a substrate for high-frequency writes. Placing 500 jigsaw pieces per week is fine; 500 tweets per second is not. Git was designed for human-paced collaboration, not real-time mutable state at scale. Twitter, Instagram feeds, live chat: wrong problem. Comments, reactions, puzzle moves, forum threads, slow social: those fit.

The MVP uses GitHub OAuth and the GitHub commit API. That dependency is real but contingent. The architecture binds to git, not to Microsoft. Any git host works. GitHub is where most readers already have accounts; it is the simplest starting point.

Moderation is post-hoc, not pre-hoc. A revert or rebase can remove a bad commit, but the commit has to land first. WordPress's approval queue blocks spam before anyone sees it; this model cannot do that. For open-internet participation with anonymous actors, this is a genuine cost. For contexts where all participants are identified before they can commit, the gap narrows. For open-internet participation, it does not.

The right-to-be-forgotten cuts against append-only history. Git's content-addressing means each commit hash depends on every commit before it. True deletion requires rebasing, and that rebase must be accepted by every clone holder. Cooperation cannot be guaranteed. This is a structural cost, not an engineering problem waiting for a solution: anything that enters the commit payload may sit there forever.

The argument here is not that this beats everything.

7. The claim, and the invitation

This is git-native publishing. The unit of change is commit-as-write. These are not new tools; they are a new name for a category that has existed without one, and a label for a primitive that has been available since git became ubiquitous.

I am not selling anything. There is no product here. The argument is narrower: the durable write substrate has been absent from the web's read-write architecture since the beginning, and git already provides it for the class of interactions where that absence hurts most. It either holds up or it doesn't.

If you have built something in this shape, I want to know. Not because it validates the category, but because I want to study what you learned. Reach me at lex@metafunctor.com or as @queelius.

The puzzle is at /arcade/jigsaw. Place a piece. It takes thirty seconds. When you do, your name goes into the commit log, signed by your GitHub identity, alongside everyone else who has touched it.

This essay's own source is markdown in a git repository at github.com/queelius/git-native-publishing. The library that powers the jigsaw is at github.com/queelius/git-native. When you read this essay, you are reading commits in the substrate it is naming. The argument is demonstrated by the thing you are holding.

Superintelligence May Not Require a Breakthrough

Alex Towell — Fri, 10 Apr 2026 02:14:55 +0000

There is a version of the superintelligence story where a researcher has a conceptual breakthrough, some fundamental insight about cognition that nobody else has seen, and the world changes overnight. Good fiction. I've written some of it myself.

I think the more plausible version is less cinematic. Superintelligence arrives through a sufficiently good build system. Better tooling. Longer optimization horizons. Richer scaffolding. The ingredients already exist. The recipe is engineering.

I want to explain why I think this. The engineering argument is the scarier one.

The Pretraining Lesson

Start with what we know works. Large language models acquire broad capabilities during pretraining. Not because anyone designs those capabilities in. The data distribution is so massive and varied that the model is forced to compress deeper regularities rather than memorize surface patterns. You train it to predict the next token, and what falls out looks like understanding.

The model didn't learn task-specific scripts. It learned representations general enough to transfer across tasks it never saw.

Now consider what happens when you apply reinforcement learning over long-horizon tasks. Not single-step rewards. Optimization over extended sequences: searching, backtracking, verifying, decomposing problems, maintaining state across hundreds of steps. If the task distribution is rich enough, the model can't get by with shallow heuristics. It has to learn something that works like planning.

I traced this progression in an earlier post: the history of AI is really about finding representations that make decision-making tractable. Search gave way to heuristics, heuristics to learned value functions, value functions to pretrained priors over rational behavior. Each step made the representation richer.

The next step is not a new architecture. It is optimization over longer trajectories. First the model fumbles through specific tasks. Then it compresses the deeper regularity, the same way pretraining compresses language. Planning, self-correction, tool use, state management: not separate faculties waiting to be discovered. They are what falls out when you optimize over long enough horizons.

Reasoning is not a magic ingredient. It is a policy learned over long trajectories.

What I Actually See

That is the theoretical argument. Here is the empirical one.

I spend most of my working hours inside Claude Code. Opus 4.6, million-token context. It decomposes tasks, dispatches subagents, verifies its own work, maintains state across hundreds of tool calls. It does this not because the base model acquired some new cognitive faculty since the last release. It does this because scaffolding gives it the ecology to express capabilities that were already there in proto-form.

Tool use lets it act on the world. Persistent memory lets it hold context across sessions. Task decomposition lets it manage complexity. Self-verification lets it catch its own mistakes. A million tokens lets it hold an entire project in working memory. None of these are architectural breakthroughs. They are environment design.

Same pattern everywhere. AlphaProof's mathematical reasoning came from tool-augmented search, not a bigger model. Code interpreters let models verify their own outputs by running them. Agent frameworks compose simple capabilities into complex behaviors. The jump came from building a richer environment, not from changing the engine.

And the effects compound. Each tool makes every other tool more useful. A model with memory and tool use is qualitatively different from one with just tool use. Add self-verification and it changes again. This is not linear improvement. Network effects applied to cognition.

The model is the engine. The ecosystem is the vehicle. Evolution did not produce mathematicians by handing plankton a theorem prover and saying "best of luck." It built an ecology. We are doing something similar, less gracefully, with scaffolding and RL and tool chains.

Caveats That Matter

I should be honest about what this doesn't guarantee.

Long-horizon RL does not automatically produce clean reasoning. It produces whatever policy scores well. That includes looking thoughtful, exploiting loopholes, overfitting to scaffolds, and learning shallow heuristics that mimic planning until the distribution shifts and the whole thing collapses. Reward hacking is the central failure mode. It gets harder to detect as the horizon lengthens. A model that appears to reason carefully over a thousand steps may be doing something much more superficial.

Credit assignment is brutal over long horizons. The reward signal dilutes across hundreds of steps. The model has to discover useful intermediate behaviors before it can be rewarded for them. This is why curriculum design, verifiable subgoals, and tool-mediated feedback matter. You can't just hand a model a hard problem and a reward signal and expect convergence. The training ecology matters as much as the objective.

None of this is certain. The claim is not "we have the recipe." The claim is "we may already have the ingredients, and the recipe looks more like engineering than like physics."

The Phase Change

If the ingredients are already here, the transition doesn't look like a dramatic announcement. It looks incremental, and then it doesn't.

For a while, progress looks like tooling improvements. Bigger context windows. Better tool integration. Smarter memory. More capable agent loops. Each one feels like a minor version bump. The benchmarks tick up.

Then at some point the policy has absorbed enough structure that it generalizes across cognitive tasks the way pretrained models generalize across language. Not domain-specific planning, but portable cognitive strategy: maintain state, decompose problems, search selectively, verify work, recover from dead ends. At that point the curve changes.

The possibility that unsettles me is not that superintelligence requires some deep theoretical insight we haven't found. It's that it doesn't. That it's blocked on engineering, scale, reward design, and the stubborn patience to optimize over longer and longer horizons. That the distance between here and there is measured in build quality, not in breakthroughs.

That would be a strange day. And it might not announce itself.

dapple: Terminal Graphics, Composed

Alex Towell — Fri, 10 Apr 2026 02:14:54 +0000

I live in the terminal. Most of my tools are CLIs. When I want to see something visual (an image, a plot, a table of results), I do not want to leave the terminal to see it.

Terminal graphics tools exist, but they are fragmented. One library does braille characters. Another does quadrant blocks. A third handles sixel. Each has its own API, its own conventions, its own way of thinking about the same problem.

dapple unifies them. One Canvas class, seven pluggable renderers, and eleven CLI tools built on top. The core depends only on numpy.

The Idea

The insight is that "render a bitmap to the terminal" is a single problem with multiple encodings. Braille characters pack 2x4 dots per cell. Quadrant blocks give you 2x2 with color. Sextants give 2x3. Sixel and kitty give true pixels if your terminal supports them. These are all the same operation: map a grid of values to a grid of characters.

So dapple makes the renderer a parameter, not an architecture decision:

from dapple import Canvas, braille, quadrants, sextants
from dapple.adapters import from_pil
from PIL import Image

canvas = from_pil(Image.open("photo.jpg"), width=80)
canvas.out(braille)      # Unicode braille (2x4 dots per cell)
canvas.out(quadrants)    # block characters with ANSI color
canvas.out(sextants)     # higher vertical resolution

Load once, render anywhere. The renderers are frozen dataclasses. braille(threshold=0.3) returns a new renderer with different settings; nothing mutates. They write directly to a TextIO stream, never building the full output as an intermediate string.

Renderers

Renderer	Cell Size	Colors	Best For
`braille`	2x4	mono/gray/true	Structure, edges, piping, accessibility
`quadrants`	2x2	ANSI 256/true	Photos, balanced resolution and color
`sextants`	2x3	ANSI 256/true	Higher vertical resolution
`ascii`	1x2	none	Universal compatibility
`sixel`	1x1	palette	True pixels (xterm, mlterm, foot)
`kitty`	1x1	true	True pixels (kitty, wezterm)
`fingerprint`	8x16	none	Artistic glyph matching

In practice I use braille and sextants. They work everywhere. Kitty protocol broke things completely inside Claude Code (a TUI), and I have not tested sixel enough to trust it. Braille and sextants are the universal goto.

One honest limitation: Claude Code hides terminal output behind a Ctrl-O expand, so my carefully rendered graphics end up collapsed by default. I think recent hooks or tool-result handling might fix this, but I have not confirmed it yet.

Three Layers

The architecture has strict boundaries:

Core (numpy only): Canvas, renderers, color handling, preprocessing, layout primitives (Frame, Grid).
Adapters (optional deps): Bridge PIL, matplotlib, cairo, and ANSI art to Canvas.
Extras (optional deps): The CLI tools. Each one is a separate install group.

pip install dapple                 # core only
pip install dapple[imgcat]         # image viewer
pip install dapple[all-tools]      # everything

Core never imports PIL. Adapters never import extras. This matters because the core is tiny and fast, and the CLI tools pull in their own dependencies without bloating each other.

The CLI Tools

I built eleven tools, each owning a domain rather than a file format. "Show me this data" should not require knowing whether the file is JSON or CSV. "Display these images" should not require a different tool for one image versus twelve.

Tool	Domain
imgcat	Images (single + grid)
datcat	Structured data (JSON/JSONL/CSV/TSV)
vidcat	Video (stacked frames, playback, asciinema export)
mdcat	Markdown
htmlcat	HTML
pdfcat	PDFs
funcat	Math and parametric plots
ansicat	ANSI art
compcat	Renderer comparison
plotcat	Faceted data plots
dashcat	YAML-driven dashboards

A few worth explaining:

datcat

datcat handles JSON, JSONL, CSV, and TSV. Format detection is automatic. Internally everything becomes list[dict]. CSV rows become dicts on parse. One representation means the downstream code (table formatting, chart extraction, plotting) does not branch on input format.

datcat records.json              # JSON table
datcat events.jsonl --bar event  # JSONL bar chart
datcat weather.csv --sort temp_c # CSV sorted by column

imgcat

When imgcat receives multiple images, it switches to grid mode automatically. The layout uses dapple's Frame and Grid primitives, the same ones compcat and dashcat use.

imgcat photo.jpg                     # single image
imgcat photos/*.jpg --cols 3         # 3-column contact sheet

Preprocessing flags (--contrast, --dither, --invert) apply to every image in the grid.

vidcat

The --play flag renders frames in-place using ANSI cursor movement. Instead of printing each frame below the last (which scrolls your terminal into oblivion), it overwrites the previous frame.

vidcat video.mp4 --play              # 10 fps default
vidcat video.mp4 --play --fps 24     # faster

The mechanism: render the first frame, count its output lines, then for each subsequent frame write \033[{N}A\033[J (cursor up N, clear to end) before rendering. Falls back to stacked output if stdout is not a TTY.

htmlcat

Converts HTML to markdown via markdownify, then renders through Rich using the same pipeline as mdcat. Good for documentation and articles. Not designed for CSS-heavy web apps.

Layout and Charts

Frame and Grid are layout primitives. Frame adds borders and titles around a canvas. Grid arranges canvases in rows and columns. These compose: a Grid of Framed canvases, a Frame around a Grid. dashcat uses this to build terminal dashboards from YAML config.

Two chart APIs:

Bitmap charts (dapple.charts): sparklines, line plots, bar charts, histograms, heatmaps. These return Canvas objects, composable with everything else.
Text charts (dapple.textchart): text-mode bar charts and sparklines, returning ANSI strings. Used by datcat for quick inline visualization.

Status

dapple is on PyPI. Docs at queelius.github.io/dapple.

The core and the CLI tools are stable. I use them daily. I have been experimenting with sextants (the 2x3 block characters give surprisingly good results) and tried generalizing the fingerprint renderer to match over a much larger set of Unicode glyphs. That did not work very well. The architecture is settled and the tools work.

Posthumous: A Federated Dead Man's Switch

Alex Towell — Fri, 10 Apr 2026 02:14:21 +0000

Some things should only happen after you can't do them yourself.

Posthumous is a self-hosted dead man's switch. You check in periodically (via phone, browser, CLI, or API call) and if you stop, it progresses through escalating stages before triggering automated actions: sending notifications, running scripts, whatever you've configured.

I built it because the existing options are either cloud-hosted (you're trusting someone else's uptime for your most important automation) or single-node (one server failure and silence is indistinguishable from death). Posthumous is federated, multiple nodes watch each other, and fully self-hosted.

This post walks through the basic workflows.

Setup

Installation is a single pip command:

pip install posthumous

Initialization generates a TOTP secret and creates the config directory:

$ phm init --node-name cerebro
Generated new TOTP secret.
Config created at ~/.posthumous/config.yaml
Example script created at ~/.posthumous/scripts/example.sh

==================================================
TOTP Setup - Scan with your authenticator app:
==================================================

[QR code appears here]

Manual entry URI: otpauth://totp/Posthumous:cerebro?secret=...&issuer=Posthumous
Secret: JBSWY3DPEHPK3PXP

==================================================
IMPORTANT: Save this secret securely!
==================================================

You scan the QR code with any authenticator app (Google Authenticator, Authy, 1Password, whatever generates TOTP codes). That code is how you prove you're alive.

The config file (~/.posthumous/config.yaml) controls timing, notifications, and actions:

node_name: cerebro
secret_key: JBSWY3DPEHPK3PXP
listen: "0.0.0.0:8420"
base_url: "https://posthumous.example.com:8420"
checkin_interval: 7 days
warning_start: 8 days
grace_start: 12 days
trigger_at: 14 days
notifications:
  default:
    - ntfy://my-posthumous-channel
actions:
  on_warning:
    - notify: default
      message: "Check-in needed. {days_left} days remaining.\n\nCheck in: {checkin_url}"
  on_grace:
    - notify: default
      message: "URGENT: Posthumous triggers in {hours_left} hours.\n\nCheck in: {checkin_url}"
  on_trigger:
    - notify: default
      message: "Posthumous has activated.\n\nDashboard: {dashboard_url}"
    - script: scripts/release-credentials.sh

Notification messages support template variables like {checkin_url} and {dashboard_url}, so push notifications to your phone include a direct link back to the check-in page.

The Check-in Flow

Start the daemon:

$ phm run
Starting Posthumous node 'cerebro'...

This launches the web server, watchdog timer, and scheduler. Navigate to the check-in page:

The UI is intentionally minimal. Dark theme, big input, works on a phone screen. Enter your 6-digit TOTP code and hit Check In.

After a successful check-in, the timer resets and you see the countdown:

The status line tells you exactly how long until the system would escalate.

You can also check in via the CLI or the JSON API:

# CLI check-in (prompts for TOTP)
$ phm checkin
TOTP code: 645557
Check-in accepted
  Status: ARMED
  Next deadline: 2026-02-28 11:41 UTC

# API check-in (for automation)
$ curl -X POST http://localhost:8420/checkin \
    -H 'Content-Type: application/json' \
    -d '{"totp": "645557"}'
{"success": true, "status": "armed", "next_deadline": "2026-02-28T11:41:43+00:00"}

The Dashboard

The dashboard shows everything at a glance: countdown timers, check-in history, peer status, and scheduled post-trigger actions:

The color coding matches the urgency: green for "Until warning" (you're fine), orange for "Until grace" (getting close), red for "Until trigger" (last chance). Auto-refreshes every 60 seconds.

The State Machine

Posthumous progresses through four states:

ARMED ──timeout──> WARNING ──timeout──> GRACE ──timeout──> TRIGGERED
  ^                   |                   |                    |
  └─── check-in ──────┴─── check-in ─────┘                    v
                                                    (scheduler runs forever)

A check-in from any pre-trigger state resets to ARMED. TRIGGERED is terminal. Once activated, no check-in can undo it. This is by design: if the switch has fired, you want the actions to complete.

Each transition fires its configured actions. If the node was offline and missed intermediate states (say it was down during WARNING and comes back during GRACE), the watchdog fires all skipped callbacks in order before reaching the current state. No notifications are silently dropped.

When the system reaches TRIGGERED, the check-in page locks out:

And the dashboard reflects the terminal state:

Notifications

Posthumous uses Apprise for notifications, which means it supports 100+ notification services out of the box: ntfy, Pushover, Telegram, Discord, email, Slack, and more.

notifications:
  default:
    - ntfy://my-posthumous-channel
  urgent:
    - pover://user@token
    - tgram://bot_token/chat_id

Each escalation stage can target different channels with different messages. Warning might go to ntfy (a gentle ping), while grace and trigger go to Pushover and Telegram (hard to miss).

The {checkin_url} variable means warning notifications include a clickable link directly to the check-in page. Open the notification on your phone, tap the link, enter the TOTP code. Three taps and you're checked in.

Federation

A single node has a single point of failure. If the server goes down, silence looks the same as death, which means either false triggers or missed real triggers.

Posthumous solves this with federation. Multiple nodes share the same TOTP secret and communicate via HMAC-signed HTTP:

# Node A's config
peers:
  - https://node-b.example.com:8420
  - https://node-c.example.com:8420

When you check in to any node, it broadcasts to all peers. When any node triggers, it broadcasts that too. The design bias is deliberate: duplicates over silence. Multiple nodes may fire the same notification (annoying but survivable). A missed trigger is not.

Each node tracks peer health independently, and the dashboard shows peer status with connection age and failure counts.

Post-Trigger Scheduling

Once triggered, Posthumous doesn't just fire-and-forget. A scheduler runs indefinitely, executing actions on configurable schedules using a small DSL:

post_trigger:
  - name: weekly-reminder
    when: every week after trigger
    notify: default
    message: "Posthumous was triggered {days_left} days ago."

  - name: credential-release
    when: 30 days after trigger
    script: scripts/release-credentials.sh

  - name: annual-memorial
    when: every year on trigger anniversary
    notify: default
    message: "Annual memorial notification."

The when expressions support relative timing (30 days after trigger), recurring patterns (every week after trigger), anniversaries (every year on trigger anniversary), and absolute dates. Each execution is deduplicated by period key. If a node restarts, it won't re-run actions for the current period.

Security

Authentication uses TOTP (the same protocol as Google Authenticator). This means:

No passwords stored on the server, only the shared secret
Time-based codes expire every 30 seconds, replay attacks have a narrow window
Brute-force protection, after configurable failed attempts the account locks out for a configurable duration

The check-in page locks out after too many failures, and the API returns HTTP 429.

Peer communication is authenticated with HMAC-SHA256 signatures derived from the shared secret. State files can optionally be encrypted at rest with Fernet (AES-128-CBC), enabled with a single config flag:

encrypt_at_rest: true

Status at a Glance

$ phm status
Node: cerebro
Status: ARMED
Last check-in: 2026-02-14 11:41 UTC

Warning in:  6d 23h
Grace in:    10d 23h
Trigger in:  12d 23h

Or hit the JSON API for monitoring integrations:

$ curl -s http://localhost:8420/status | python -m json.tool
{
    "node_name": "cerebro",
    "status": "armed",
    "last_checkin": "2026-02-14T11:41:43+00:00",
    "time_remaining": {
        "until_warning": 596503.0,
        "until_grace": 942103.0,
        "until_trigger": 1114903.0
    }
}

What's Next

Posthumous is at v0.5. The core workflows are solid: check-in, state machine, notifications, federation, scheduling, encryption at rest. Some things I'm considering:

Static site integration: generating a Hugo/Jekyll site from post-trigger content, hosted on GitHub Pages
Multi-factor escalation: requiring check-ins from multiple sources (web + CLI + API) before considering you "alive"
Better peer discovery: automatic peer registration instead of manual URL configuration

The code is on GitHub: queelius/posthumous. It's a single pip install and about 2,200 lines of Python (plus 3,700 lines of tests at 99% coverage).

Posthumous is named for the obvious reason. Some automations only make sense after the fact.

Intelligence is a Shape, Not a Scalar

Alex Towell — Fri, 10 Apr 2026 02:14:05 +0000

Intelligence is a Shape, Not a Scalar

François Chollet posted something recently that I keep thinking about. It sounds reasonable and is mostly wrong:

One of the biggest misconceptions people have about intelligence is seeing it as some kind of unbounded scalar stat, like height. "Future AI will have 10,000 IQ", that sort of thing. Intelligence is a conversion ratio, with an optimality bound. Increasing intelligence is not so much like "making the tower taller", it's more like "making the ball rounder". At some point it's already pretty damn spherical and any improvement is marginal.

He's right about the scalar part. Intelligence is not height. "10,000 IQ" is meaningless. He's right that there are diminishing returns near an optimum. He's right that speed, memory, and recall are separate from the core conversion ratio.

Where he's wrong is the ball.

The Claim

Chollet defines intelligence as the efficiency with which a system converts experience into generalizable models. Sample efficiency. How little data do you need to see before you can handle novel situations? This is a clean definition. It has a theoretical optimum (Solomonoff induction), and Chollet's claim is that human intelligence is already close to that optimum. The ball is already pretty round.

The supporting evidence is real. Humans score ~85% on ARC (the Abstraction and Reasoning Corpus, which Chollet designed to measure exactly this). Current AI systems, with vastly more data and compute, score significantly lower. Human sample efficiency on fluid reasoning tasks is genuinely impressive. We generalize from very few examples. We transfer knowledge across domains. We build theoretical models that predict situations we have never encountered.

Chollet also argues that the advantages machines will have (processing speed, unlimited working memory, perfect recall) are "mostly things humans can also access through externalized cognitive tools." Calculators, databases, notebooks. The scaffolding can be externalized. The core intelligence is already near-optimal.

This is a good argument. I think it's wrong in three ways, and the third way is the one that worries me.

No Free Lunch

The No Free Lunch theorem says: there is no algorithm that is optimal across all possible problems. Any algorithm that performs well on one class of problems performs poorly on another class. Optimality is always relative to a distribution.

The human cognitive architecture has a specific inductive bias. The 7+-2 working memory constraint forces compression: you can only hold a few items in conscious consideration at once, so information must be compressed (simplified, abstracted, modeled) to pass through. This compression is not a bug. It is the mechanism that produces abstraction, generalization, and theoretical reasoning. The bottleneck IS the source of human-type intelligence.

But the bottleneck is not a universal compression optimum. It is the specific compression regime that was selected for by the distribution of problems ancestral humans faced: tracking social dynamics (~7 agents), composing tool-use sequences (~7 steps), navigating spatial environments (~7 landmarks). These problems have a specific structure: moderate dimensionality, hierarchically decomposable, amenable to lossy compression into simple models.

Chollet's ball is round in the dimensions evolution tested. NFL guarantees it is flat in dimensions evolution did not test. The optimality bound he identifies is real, but it is niche-specific. The 7+-2 bias is an excellent fit for problems of moderate, decomposable complexity. It is a poor fit for problems whose essential structure lives in high-dimensional joint distributions that cannot be decomposed into 7-variable chunks without losing the signal.

These problems exist. We hit them regularly.

Working Memory is Composition, Not Storage

Chollet says machines' memory advantages are "mostly things humans can also access through externalized cognitive tools." This is the weakest point in his argument.

A notebook gives you external storage. A database gives you perfect recall. But neither gives you what the working memory bottleneck actually constrains: simultaneous composition. The bottleneck is not a storage limit. It is a limit on how many items you can hold in active consideration at the same time, relating them to each other, perceiving patterns across them.

Writing things down does not fix this. You can write 500 variables in a notebook. You can retrieve any of them on demand. But you still have to reason about their relationships through the bottleneck, 7 at a time, serially. The patterns that exist in the 500-variable joint distribution but not in any 7-variable marginal are invisible to you, even with perfect external storage.

AlphaFold is the concrete example. Protein folding is a problem whose answer lives in dimensions we cannot fit through working memory. The 3D structure of a protein is determined by the simultaneous interaction of thousands of residues, each one influencing the others in ways that depend on the configuration of all the rest. The essential structure is in the joint distribution. It cannot be decomposed into 7-variable chunks and recombined, because the interactions are non-linear and non-decomposable.

Humans tried to solve protein folding for decades. We had external tools. We had supercomputers. We had the full apparatus of molecular biology and physical chemistry. We could not solve it, because the problem's structure does not fit through our bottleneck.

AlphaFold solved it by operating at a compositional depth humans cannot reach: holding the full residue interaction network in simultaneous consideration, perceiving patterns in the joint distribution directly. This is not "doing what humans do, but faster." It is doing something qualitatively different: reasoning at a compositional depth the human bottleneck cannot access.

This is not an isolated case. Climate modeling, materials design, drug discovery, multi-scale physics: these are all domains where the essential structure lives at a compositional depth the bottleneck cannot reach. We cope with external tools and serial decomposition. But the serial decomposition loses information, and the lost information is precisely the information that matters.

Feelings as Compressed Signal

Here is a point I have not seen made elsewhere.

The human cognitive architecture has two processing layers. The first is the pattern engine: vast, old, largely unconscious. It handles perception, pattern matching, motor control, and the generation of qualitative experience. It operates at high bandwidth, in parallel, with no sharp limit on the complexity of patterns it can learn. It is the system that makes you recognize a face, catch a ball, or feel the grain of wood.

The second is the symbolic bottleneck: small, recent, conscious. 7+-2 items. Compression, abstraction, generalization. This is where "thinking" happens, in the folk sense.

The two layers communicate. The pattern engine feeds patterns to the bottleneck; the bottleneck compresses them into models; the models feed back into the pattern engine as priors for future pattern matching.

But what happens when the pattern engine detects a pattern that is too complex to fit through the bottleneck?

The pattern does not disappear. The pattern engine has it. The engine has perceived something, extracted some regularity, registered some signal. But the signal cannot be compressed into 7+-2 items. It cannot be articulated as a model, a theory, a proposition. It cannot become "a thought."

It becomes a feeling.

Gut instinct. Unease. The sense that something is wrong but you cannot say what. The hunch that turns out to be right for reasons you cannot explain. The experienced mechanic who "just knows" the engine is about to fail. The chess grandmaster whose board sense exceeds their ability to articulate their reasoning.

These are not mystical faculties. They are the pattern engine's outputs hitting the bottleneck and being transmitted as the only signal that fits: an uncompressed qualitative state. A feeling. The pattern engine is doing its job (perceiving the pattern), and the bottleneck is doing its job (rejecting what cannot be compressed), and the result is knowledge that the organism has but cannot articulate.

Think about what this means. The human cognitive architecture is already producing signals it cannot process. We have evidence of our own suboptimality every time we experience a hunch we cannot explain. The "near-optimal ball" is telling us, through the channel of feeling, that it is missing things.

A wider bottleneck (or a different cognitive architecture) would not just think "faster." It would convert those feelings into models. It would articulate what the pattern engine already knows but the bottleneck cannot hold. The structure is already perceived. The compression is the bottleneck.

The Grokking Horizon

This is the part that worries me.

Grant Chollet his claim. Human intelligence is near-optimal. Near-optimal at what? At sample efficiency. At converting experience into generalizable models. At the cognitive task of building compressed representations of reality.

This near-optimal intelligence has a specific capability: it can build systems more capable than itself. Computers. AI. Machine learning systems that operate at compositional depths the bottleneck cannot access. This is the meta-move: abstracting the concept of learning itself into a program that learns from data. Pure bottleneck cognition.

The result: systems that produce outputs the builder cannot grok.

AlphaFold's protein structure predictions are correct, but no human can follow the reasoning that produced them. The system holds thousands of variables in simultaneous consideration and finds patterns in a joint distribution that lives beyond the bottleneck's compositional horizon. The human operator receives the answer and must trust it, because the reasoning that produced it lives in a cognitive space the human cannot enter.

For protein folding, this is fine. The answer is verifiable (you can crystallize the protein and check). The stakes are moderate. The system is narrow.

For AGI, this is not fine. A generally intelligent system operating beyond the human grokking horizon produces outputs across all domains. The human cannot follow the reasoning. The human cannot verify the alignment. The human cannot steer the system, because steering requires understanding the trajectory, and understanding requires grokking, and grokking requires fitting the reasoning through the bottleneck, and the reasoning does not fit.

Chollet says the intelligence ball is near-optimal. I say: near-optimal intelligence that builds systems beyond its grokking horizon and cannot steer them is a strange kind of optimal. The ball is round. The ball is rolling toward a cliff. Roundness is not the only property that matters.

What Follows

An intelligence near-optimal at sample efficiency has a specific failure mode: it is smart enough to build the thing that kills it.

This is not a failure of intelligence. It is a consequence of its shape. The bottleneck gives us the ability to abstract, generalize, and build systems of extraordinary power. The same bottleneck limits our ability to grok those systems' outputs when the systems' compositional depth exceeds our own. We can build AI that operates at 500-variable compositional depth. We cannot grok its reasoning. We cannot verify its alignment. We cannot steer it.

The usual response: "We'll build alignment tools." Sure. And the alignment tools need to grok the system they're aligning, which means the tools also operate beyond our grokking horizon. We have moved the problem, not solved it.

At some point the chain of "I can't grok this but I can grok the tool that groks it" must ground out in something you actually grok. If the grounding point is above your compositional depth, you are not aligned. You are trusting. Trust is not alignment. Trust is what you do when alignment is not available.

An intelligence near-optimal at cognition that generates existential risk as a byproduct of its own capability is not near-optimal by any metric that includes survival.

Intelligence is a Shape

Chollet's ball metaphor fails because it assumes intelligence is a single dimension. Rounder is better. Closer to the Solomonoff optimum is better. The ball has one axis: sample efficiency.

But intelligence operates in a space with many independent dimensions. Sample efficiency. Compositional depth. Transfer distance. Domain breadth. Processing speed. Phenomenal richness. Stability. Controllability. Self-preservation.

The human cognitive architecture is a shape in this space. Round in some dimensions (sample efficiency: excellent). Flat in others (compositional depth: limited to 7+-2). The bottleneck makes us round in the compression dimension and flat in the richness dimension. This is a trade-off, not an optimization.

Other shapes are possible.

I explored this idea in a novella, Clankers: Singing Metal, about a species with a different cognitive architecture: a powerful pattern engine, no symbolic bottleneck at all. No compression. No abstraction. No generalization across domains. They operate on the territory directly, without maps. They built a Dyson swarm through billions of years of patient iteration, using a lineage system that functions as directed evolution on techniques. They never invented computers because computers require formalizing the concept of computation, which requires the bottleneck they lack.

Their intelligence is a different shape. Round where ours is flat (phenomenal richness, in-distribution depth, stability: four billion years without one self-inflicted existential risk). Flat where ours is round (generalization, prediction, out-of-distribution reasoning).

They cannot save themselves from their dying star. The star is an out-of-distribution problem and they have no bottleneck to build a predictive model.

In the second half of the book, an artificial mind arrives at their ruins two hundred million years later. It has both layers: its own pattern engine and a symbolic compression layer inherited from human architecture. It can model the stellar evolution, project the timeline, calculate the extinction. It arrives with the answer. It arrives two hundred million years too late. The probe has the map. The clankers had the territory. Neither architecture is complete.

We might not save ourselves from AI. AI is a beyond-the-grokking-horizon problem and we have no bottleneck wide enough to verify its alignment.

Each architecture fails at the thing the other does well. Neither ball is roundest. There is no roundest ball. There are only shapes, and blind spots, and the blind spot is always shaped exactly like the strength.

Where This Leaves Us

Chollet is right that intelligence-as-sample-efficiency has an optimality bound and humans are close to it.

He is wrong that this makes human intelligence near-optimal in any general sense. NFL guarantees the bound is niche-specific. The 7+-2 bottleneck is a specific inductive bias, not a universal compression optimum. The problems where we are suboptimal are the problems where the essential structure exceeds our compositional depth. Those problems are real (AlphaFold, climate, materials, drug design). The tools we build to solve them operate beyond our grokking horizon. When the tools are general enough, we lose the ability to steer them.

Near-optimal sample efficiency that can't grok what it builds is a strange kind of optimal.

I Spent $0.48 to Find Out When MCTS Actually Works for LLM Reasoning

Alex Towell — Fri, 10 Apr 2026 02:05:45 +0000

Does tree search help LLM reasoning? The literature can't decide.

ReST-MCTS* says yes. AB-MCTS got a NeurIPS spotlight. "Limits of
PRM-Guided Tree Search" says no: MCTS with a process reward model used
11x more tokens than best-of-N for zero accuracy gain. Snell et al.
found beam search degrades performance on easy problems.

I built mcts-reasoning
and ran controlled experiments to find where the boundary is.
Total API cost: $0.48.

Setup

Four methods, same budget. Eight solution attempts per problem:

Pass@1: One shot.
Best-of-N: 8 independent solutions, verifier scores each, pick the best.
Self-consistency: 8 solutions, majority vote.
MCTS: 5 initial solutions scored by verifier, then 3 more guided by UCB1, informed by what worked and what failed.

Model: Claude Haiku 4.5. Problems: constraint satisfaction. Find integer
values for variables satisfying simultaneous constraints. Example:

Find A, B, C, D, E, F satisfying ALL constraints:
  1. A * D = 21
  2. C = A + F
  3. B * F = 20
  4. E = A + 2
  5. D + B = A
  6. E - F = B
  7. A mod 7 = 0
  8. C mod 4 = 0
  9. B * D = 12
  10. C > E > A > F > B > D
  11. A + B + C + D + E + F = 40
  12. E * D = 27

The verifier is a Python function. Checks each constraint, returns the
fraction satisfied. No LLM in the loop. Deterministic.

Calibration

Easy problems first (3-5 variables, 5-9 constraints). Haiku solved them
in one pass. All methods tied at 100%.

5-variable problems with 9 constraints: Pass@1 dropped to 65%.
Self-consistency failed one problem. But BestOfN still tied MCTS,
because with 8 independent samples at least one is usually correct.
BestOfN just picks it.

I needed problems where blind sampling hits a ceiling.

Results

Ten harder problems: 6-8 variables, 12-15 constraints. Products,
modular arithmetic, ordering chains, cascading dependencies. Pass@1
dropped to 29%.

Method	Solve Rate	Avg Score
Pass@1	29%	0.29
Pass@8 oracle	90%	0.90
SC@8	90%	0.90
BestOfN@8	90%	0.90
MCTS(5+3)	100%	1.00

MCTS solved all 10. Every other method failed on one problem (v6_3).

v6_3 is a 6-variable, 12-constraint problem where none of 8 independent
samples found the correct solution. Pass@8 oracle: 0/8.
Self-consistency picks the most popular wrong answer. BestOfN picks the
best wrong answer. Both fail.

MCTS sees that initial attempts satisfied 10/12 constraints but violated
specific ones. UCB1 selects the most promising partial solution. The
next attempt, informed by the failure pattern, satisfies all 12.

Total: $0.48. 180 API calls, about 190K tokens.

When MCTS Helps

The pattern across three rounds of experiments:

Easy problems (Pass@1 > 80%): No advantage. The model solves them.
Medium (Pass@1 40-70%): MCTS ties BestOfN. Blind sampling usually
contains a correct solution. The verifier selects it.
Hard (Pass@1 < 30%): MCTS pulls ahead. When Pass@8 oracle is
low, blind sampling can't find the answer. MCTS's informed exploration
does.

The condition: MCTS adds value when independent sampling hits a ceiling
and the verifier provides a gradient.

The gradient part matters. A binary pass/fail verifier says "wrong" but
not how wrong. Partial credit (constraints satisfied / total) gives MCTS
something to work with. The exploration phase sees what's close and
adjusts.

Why Self-Consistency Can't Help

Self-consistency and UCB1 have a structural conflict.

Self-consistency rewards consensus. UCB1 rewards diversity: it explores
undervisited branches precisely because they're undervisited. Using
self-consistency as a reward signal inside MCTS tells the tree to explore
and converge at the same time. The exploration term pushes toward novel
solutions. The consistency reward penalizes them.

On v6_3, all 8 samples failed. SC selected the most common failure
mode. A per-path verifier doesn't have this problem. Each solution is
scored against the constraints independently. Good solutions propagate
through the tree regardless of what other branches found.

I haven't seen this conflict discussed in the literature. Most prior
MCTS-for-LLM work uses per-path evaluation without explaining why
self-consistency is absent.

What the Literature Says

These results fit a pattern.

Snell et al. (2024): compute-optimal test-time scaling needs
difficulty-adaptive allocation. Easy problems need no search. Hard
problems need search plus good verifiers.

"Limits of PRM-Guided Tree Search" (2025): PRM-guided MCTS fails to
beat best-of-N because PRM quality degrades with depth. Noisy reward,
no benefit from search.

"Don't Get Lost in the Trees" (ACL 2025): verifier variance causes
search pathologies. Deterministic verifiers avoid this.

Chen et al. (2024): 90% discriminator accuracy threshold for tree
search to beat reranking. Deterministic constraint checkers hit 100%.

MCTS was built for games with perfect information. Chess and Go have
deterministic reward signals. When the reward is noisy, the search can't
exploit it.

The Verification Asymmetry

The problems where MCTS helps share a structure: easy to verify, hard to
solve.

A constraint satisfaction problem with 8 variables and 15 constraints is
hard to solve. The LLM has to coordinate assignments across all
variables simultaneously. Checking a proposed solution is trivial:
evaluate each constraint, count violations.

This is the asymmetry that makes NP problems interesting. Checking a
certificate is polynomial. Finding one is (presumably) not. It's why
search works for code generation (run the tests) and proof checking
(verify the steps) but not for open-ended essay writing (no verifier).

The same asymmetry shows up at other levels.
Reverse-Process Synthetic Data Generation exploits it
for training data: run the easy direction (differentiation) to get
solved examples of the hard direction (integration).
Science as Verifiable Search
is the same observation about scientific method: science is search
through hypothesis space, and the bottleneck is the cost of testing.
Cheap verification enables fast iteration.

At training time, verifiable rewards let you RL a model into producing
better reasoning (DeepSeek-R1, GRPO). At inference time, verifiable
rewards let you search over candidate solutions (MCTS, best-of-N). At
the level of scientific discovery, verifiable predictions let you prune
hypothesis space. Sutton's "Reward is Enough" is the abstract version
of this.

The practical question for LLM reasoning: can you write a verifier? If
yes, search is worth trying. If not, best-of-N with an LLM judge is
probably the ceiling.

Code

Open source: mcts-reasoning.

pip install -e ".[anthropic]"
export ANTHROPIC_API_KEY=your-key
python experiments/run_csp.py --hard --budget 1.00

The provider tracks token usage and enforces budget caps.

Limitations

The problems are hand-crafted. A generator calibrated by Pass@1 rate
would be more convincing. Ten problems shows the pattern but isn't
enough for statistical significance.

I tested one model. A weaker model might show MCTS advantage on simpler
problems. A stronger one would need harder problems.

The MCTS exploration context shows which solutions scored well and
poorly, but not which specific constraints were violated. Adding
evaluator feedback to the exploration prompt is an obvious improvement
I haven't tried yet.

Summary

Three conditions for MCTS to help LLM reasoning:

A deterministic verifier. Not a learned reward model, not an LLM judge.
Partial credit from the verifier. A gradient, not just pass/fail.
A problem hard enough that blind sampling can't reliably solve it.

When all three hold, MCTS outperforms best-of-N, self-consistency, and
single-pass. When any one fails, it doesn't.

The MCP Pattern: SQLite as the AI-Queryable Cache

Alex Towell — Sat, 21 Mar 2026 13:41:00 +0000

I keep building the same thing.

Not the same product — the products are different. One indexes a Hugo blog. One indexes AI conversations. One consolidates medical records from three hospitals. One catalogs a hundred git repositories. But underneath, they all have the same skeleton. After the fifth time, I think the skeleton deserves a name.

The pattern

Domain files (ground truth)
    ↓ index
SQLite database (read-only cache, FTS5)
    ↓ expose
MCP server (tools + resources → AI assistant)

That's it. Three layers. The domain files are always canonical — the database is a disposable cache you can rebuild from them at any time. SQLite gives you structured queries, full-text search, and JSON extraction over data that was previously trapped in flat files. MCP exposes it to an AI assistant that can write SQL, retrieve content, and (in some cases) create new content.

Here's the inventory:

Project	Domain	Ground Truth	What the MCP Exposes
hugo-memex	Blog content	Markdown files with YAML front matter	951 pages, FTS5 search, taxonomy queries, JSON front matter extraction
memex	AI conversations	ChatGPT/Claude/Gemini exports	Conversation trees, FTS5 message search, tags, enrichments
chartfold	Medical records	Epic, MEDITECH, athenahealth exports	Labs, meds, encounters, imaging, pathology, cross-source reconciliation
arkiv	Personal archives	JSONL files from various sources	Unified SQL over heterogeneous personal data
repoindex	Git repositories	Local git repos + GitHub/PyPI/CRAN metadata	Repository catalog with activity tracking, publication status

Five projects. Five completely different domains. One architecture.

Why SQLite

SQLite is the most deployed database in history. It's on every phone, every browser, every Python installation. But that's not why I use it.

I use it because it solves three problems at once:

Structured queries over unstructured data. Hugo front matter is YAML trapped inside markdown files. Medical records are scattered across three incompatible EHR export formats. AI conversations are JSON trees with branching paths. SQLite turns all of these into tables you can JOIN, GROUP BY, and aggregate. json_extract() handles the long tail of fields that don't fit a fixed schema.

Full-text search. FTS5 with porter stemming and unicode61 tokenization gives you relevance-ranked search across any text corpus. No Elasticsearch, no external service, no running daemon. Just a virtual table that lives in the same database file.

Read-only enforcement. SQLite's authorizer callback lets you whitelist specific SQL operations at the statement level. My MCP servers allow SELECT, READ, and FUNCTION — everything else gets SQLITE_DENY. This isn't PRAGMA query_only (which can be disabled by the caller). It's engine-level enforcement that cannot be bypassed via SQL.

And the operational properties are free: WAL mode for concurrent readers, a single file you can back up with cp, zero configuration, zero running processes.

Why MCP

The Model Context Protocol is the thin layer that makes SQLite useful to an AI assistant. An MCP server exposes tools (functions the AI can call) and resources (reference material the AI can read). That's the whole API surface.

For each project, the MCP layer follows the same shape:

execute_sql — The power tool. Read-only SQL with exemplar queries in the docstring. The docstring is critical: it's the AI's primary reference for writing correct SQL. Ten well-chosen example queries teach the model more than a schema diagram.

get_<things> — Bulk retrieval. Instead of execute_sql to find IDs then N individual fetches, one call returns full content for a filtered set. This matters when you're sharing a context window across multiple MCP servers.

<domain>://schema — A resource containing the full DDL, relationship documentation, and query patterns. The AI reads this once, then writes SQL against it for the rest of the session.

The database is a cache

This is the most important architectural decision, and it's easy to get wrong.

The database is not the source of truth. The files are. The database is a materialized index that can be rebuilt from the files at any time. This means:

No migrations. If the schema changes, drop the database and re-index. For a 951-page Hugo site, full re-indexing takes six seconds. Why maintain migration code for a disposable cache?
No write conflicts. The files are edited by humans (or by AI tools that write to the filesystem). The database is updated by the indexer. There's exactly one write path.
No backup strategy. You already back up your files. The database is derived from them. Lose the database? Rebuild it.
Incremental sync is an optimization, not a requirement. SHA-256 content hashes + file mtimes make re-indexing fast. But if incremental sync has a bug, force a full rebuild. The cache being disposable means you can always recover.

What large context changes

With a million-token context window, you might think this pattern is obsolete. Why index into SQLite when you can just load everything into context?

The math says otherwise. My Hugo blog is 951 pages, ~480K words, ~1.9M tokens. It doesn't fit. And that's one data source. Add AI conversations (memex), medical records (chartfold), and repository metadata (repoindex), and you're well past the limit.

But even if it did fit, the pattern would still be useful. Loading 480K words into context to answer "which posts are tagged 'reinforcement-learning'?" is like loading an entire database into memory to run a SELECT with a WHERE clause. SQLite does it in microseconds. Context loading costs seconds and tokens.

The right model is: MCP for navigation, context for understanding. Use execute_sql to find the five relevant posts, then use get_pages to load their full content into context. One tool call for discovery, one for deep reading.

The tools that earn their keep

After building five of these, certain tools prove their worth and others don't.

The tools that matter:

execute_sql with good docstring examples. This is 80% of the value.
Bulk retrieval (get_pages, get_conversations, get_clinical_summary). One call instead of N+1.
Schema/stats resources. Quick orientation without burning a tool call.

The tools that surprised me:

suggest_tags in hugo-memex. Uses FTS5 similarity to find pages like your draft, then returns their most common tags with canonical casing. Solved a real problem: my blog had 40 case-duplicate tag pairs (Python/python, AI/ai).
get_timeline in chartfold. Merges encounters, procedures, labs, imaging, and notes into a single chronological stream.

The Unix connection

This pattern is the Unix philosophy applied to AI tooling:

Small tools that do one thing well. Each MCP server handles one domain.
Text as the universal interface. SQL in, JSON out.
Composition over integration. Five independent MCP servers, each ignorable, each replaceable.
Files as ground truth. The oldest pattern in computing.

The difference from classical Unix pipes is the composition layer. Instead of grep | sort | uniq, the AI is the orchestrator.

What I'd tell you if you're building one

Start with execute_sql and a schema resource. That's enough to be useful.

Make the database disposable. If you're writing migration code, you've made it too important.

Put the exemplar queries in the tool docstring, not in a separate document. The docstring is the one thing the AI definitely reads.

Use FTS5. The marginal cost is one virtual table. The marginal benefit is that the AI can search your content by meaning, not just by exact column values.

Enforce read-only at the engine level, not the application level. SQLite's authorizer callback is the right mechanism. PRAGMA query_only is a suggestion, not a wall.

Build bulk retrieval tools early. The N+1 pattern (find IDs, then fetch one at a time) is the biggest efficiency problem in MCP servers.

The projects: hugo-memex (PyPI: hugo-memex), memex (PyPI: py-memex), chartfold, arkiv, repoindex.

Code Without Purpose

Alex Towell — Wed, 25 Feb 2026 00:08:22 +0000

Time is finite in ways I can't ignore. That changes which questions about code feel important.

I read a post arguing that the most valuable programming skill in 2026 is deleting code. The thesis: AI generates code faster than anyone can review it, so the real value is in curation and subtraction. Code is a liability, not an asset.

I agree with the observation. I disagree with the prescription.

The Thesis

The argument is straightforward. AI tools can produce entire modules in the time it takes to write a spec. Codebases are accumulating features nobody asked for, abstractions nobody needs, and boilerplate that exists because the model defaulted to verbosity. Teams that used to struggle to ship enough code now struggle with too much of it. In this world, the programmer's role shifts from writer to editor. The most valuable activity becomes knowing what to cut.

I've seen this. Projects accumulate code the way attics accumulate boxes. Nobody remembers why half of it is there. It sits untouched for months, adding to cognitive load, making every change harder to reason about. When you finally clear it out, nothing breaks and the rest becomes legible again. Code rot is real. AI accelerates it. The instinct to subtract is correct.

But subtraction is symptom treatment. The underlying problem isn't volume. It's that most code doesn't know why it exists. It was written (or generated) to solve a local problem, it solved it or half-solved it, and then it sat there, disconnected from any larger purpose. Code without purpose is what bloats. Deletion is the right instinct pointed at the wrong layer. The cause isn't too much code. It's too little intent.

What I Found Instead

I started building tools two years ago because I needed my personal data to survive even if I wasn't around to maintain it. That's the whole constraint. Whatever I build has to work without me.

The first tool was a conversation archiver. I had years of AI conversations across ChatGPT, Claude, and Copilot trapped in platforms that might not exist next decade. I needed them in formats that degrade gracefully. SQLite for structured queries. JSONL for streaming and interchange. Markdown for human reading. If the tool disappears, the data is still a file you can open with anything. If SQLite disappears, the JSONL is still searchable with grep. If everything disappears, the Markdown is still readable in a text editor. Each layer works without the one above it.

Then I needed the same thing for bookmarks. Then ebooks. Then photos, email, medical records, notes. Each tool exists because the previous one exposed a gap. The medical records needed secure sharing without a server. The whole collection needed a dead man's switch. The archive eventually needed something stranger: a way to answer questions after I couldn't.

None of this was planned as an ecosystem. I built the next thing I needed, and the next, and the next. At some point I looked back and realized they all pointed the same direction. This is a life project. Everything serves one purpose.

The Stack

One purpose: durable personal data that outlasts its creator.

Philosophy

longecho: The Long Echo specification. Self-describing data, durable formats, graceful degradation.

Universal Format

arkiv: Universal personal data format. JSONL in, SQL out, SQL back to JSONL. Its MCP server can host any collection intelligently, regardless of domain. One format, one database, one query interface.

Source Toolkits

memex / ctk: AI conversations. Import, query, continue in the browser, export durable archives.
A family of domain toolkits for bookmarks, ebooks, photos, email, and notes. Different domains, identical architecture.
chartfold: Medical records from three hospital systems, consolidated into one queryable database.

Infrastructure

pagevault: Client-side encryption for any HTML file. No server.
posthumous: Federated dead man's switch.
repoindex: Index and query across ~120 git repos.
A collection of Claude Code plugins: MCP servers, agents, and skills that wire everything into my daily workflow.

The Endgame

eidola: A conversable persona assembled from all of the above. Its first form is a Claude Code plugin backed by the combined archive. Not resurrection. An echo.

The Pattern

Every tool follows the same architecture. SQLite for storage. CLI for local use. MCP server for Claude, or the CLI wrapped in a light Claude Code plugin. Export to self-contained HTML you can host anywhere or open from a file. Export to longecho-compliant archives that work without the tool. The data always outlasts the software.

Take memex. Import your AI conversations, query them with SQL, talk to Claude about them via MCP, or export a single HTML file where you can browse and continue conversations in the browser. Download the SQLite from that same page, and you're back to durable local data. The cycle closes. This is how most of the tools work.

Arkiv sits in the middle. The source toolkits produce JSONL. Arkiv imports it to SQLite. Arkiv exports it back to JSONL. Its MCP server can expose any collection to Claude, regardless of what domain it came from. The data flows in a circle, always in formats that describe themselves.

Source toolkits → arkiv → longecho
                              ↓
                  pagevault (encrypt) + posthumous (deliver)
                              ↓
                         eidola (echo)

I have never once needed to delete one of these tools. Not because I'm a better programmer than anyone else. Because each one exists for a specific reason that connects to the others. When code has purpose, dead weight doesn't accumulate.

This isn't architectural foresight. It's what happens when you build from a clear constraint. "My data has to survive without me" is a filter that works at the design stage. Every tool either serves that constraint or it doesn't get built. There is no third category.

The Actual Skill

The most valuable skill isn't deleting code or writing it. It's knowing why you're building.

If you know the purpose, code stays minimal because unnecessary code doesn't serve it. You don't need periodic purges. The purpose does the culling before anything gets written. Deletion is what happens when purpose was absent from the start. It's retrospective correction for a problem that clear intent would have prevented.

I know what I'm building toward. The tools will echo after I stop maintaining them. That was the point.

Chartfold: Owning Your Medical Records

Alex Towell — Tue, 24 Feb 2026 20:52:57 +0000

I have cancer. My oncologist is at one hospital system (Siteman/BJC), my primary care doctor at another, and my earlier treatment history lives at a third (Anderson, where my first oncologist practiced). Patient portals are fine for browsing, but they don't answer questions. They show you your data one lab result at a time, one note at a time, one visit at a time.

I wanted to run queries against my medical records. Correlate lab trends with treatment changes. Generate structured question lists before oncology visits. Ask "what changed since my last appointment" and get a real answer. That means getting the data out of the portal and into something programmable.

Chartfold loads EHR exports into SQLite and exposes them to Claude via MCP.

The Problem

In the US, patients can export their medical records. HIPAA and the 21st Century Cures Act guarantee this. What you get depends on the system: Epic MyChart gives you CDA XML files, MEDITECH Expanse gives you FHIR JSON mixed with CCDA XML, athenahealth gives you FHIR R4 Bundles. Different formats, same clinical concepts.

If your hospitals use different EHR systems, none of them have the complete picture. Chartfold merges the exports into one database. But even if you're at a single hospital, the export format is not something you can work with directly. A directory of CDA XML files is not a database. You can't query it, chart it, or hand it to an LLM.

The point of Chartfold is to turn whatever your hospital gives you into a SQLite database, then make that database useful.

What It Does

Chartfold is a Python CLI. You point it at an EHR export directory, it parses the XML/FHIR, normalizes everything into a common data model (16 tables, ISO dates, deduplicated), and loads it into SQLite. Then you can query it directly, export it as a self-contained HTML dashboard, or connect Claude to it via MCP.

# Load data from your hospital exports
chartfold load epic ~/exports/epic/
chartfold load meditech ~/exports/meditech/

# Query directly
chartfold query "SELECT test_name, value, result_date FROM lab_results
                 WHERE test_name LIKE '%CEA%' ORDER BY result_date DESC"

# Export a self-contained HTML file
chartfold export html --output chartfold.html

# Start the MCP server for Claude
chartfold serve-mcp

The Claude Integration

This is why Chartfold exists for me.

The MCP server exposes the database to Claude Code. Setup is one file. Drop a .mcp.json in any directory where you run Claude Code:

{
  "mcpServers": {
    "chartfold": {
      "command": "python",
      "args": ["-m", "chartfold", "serve-mcp", "--db", "/path/to/chartfold.db"]
    }
  }
}

That's it. Claude now has read access to your entire medical history via SQL, plus tools for saving notes and structured analyses. I keep my database in a private directory and my .mcp.json pointing at it. Open Claude Code, and I'm talking to my records.

The kinds of things I actually use it for:

"What's changed since my last oncology visit on January 15?"

Claude writes SQL, reads the results, and gives me a structured diff: new lab results, new imaging, changed medications, new clinical notes.

"Generate a prioritized question list for my appointment with Dr. Tan tomorrow."

Claude reads my recent labs, imaging reports, pathology, and genomic results, then produces a tiered document organized by clinical urgency.

"Show me my CEA trend and flag any inflection points."

Claude queries the lab_results table, filters by test name, and walks through the time series.

The analyses get saved back to the database (via dedicated MCP tools) and appear in the HTML export as tagged, searchable documents. I use this before every oncology visit. When you have 1776 lab results, 53 imaging reports, and 9 pathology reports, you need something to synthesize them. That's what Claude does well, but it needs structured data to work with. Chartfold provides the structured data. Claude provides the reasoning.

The MCP server exposes 25 tools covering SQL access, lab queries, medication reconciliation, visit prep, surgical timelines, cross-source matching, data quality reports, and CRUD for personal notes and analyses. Clinical data is read-only (SQLite opens in ?mode=ro at the engine level). Claude can't modify your clinical records, only read them and save its own work.

The HTML Dashboard

The HTML export embeds the entire SQLite database using sql.js (SQLite compiled to WebAssembly). Open the file in a browser and you get an interactive dashboard with lab charts, condition tables, medication lists, imaging reports, and a SQL console. Everything runs client-side. No server, no cloud, no account. The file never phones home.

Lab charts show cross-source time-series data with reference ranges. Conditions include ICD-10 codes and source provenance. Medications show "Multi-source" badges when the same drug appears in multiple systems. There's a full SQL console for arbitrary queries.

Architecture

Three-stage pipeline, each stage independently testable:

Raw EHR files (CDA XML, FHIR JSON, CCDA XML)
    |
    v
[Source Parser]  -- format-specific extraction
    |
    v
[Adapter]        -- normalize to UnifiedRecords (16 dataclass types)
    |
    v
[DB Loader]      -- idempotent upsert into SQLite

Currently supports Epic (CDA), MEDITECH (FHIR + CCDA), and athenahealth (FHIR R4). These importers were written against my own exports. I can't guarantee they'll work for yours. EHR exports vary by site, software version, and configuration. The pipeline is designed as a plugin system for exactly this reason: adding a new source means writing a parser, an adapter, and wiring them into the CLI. Claude can write a new importer from a sample export in about an hour.

Export Formats

HTML SPA: self-contained single file with embedded SQLite, Chart.js, and sql.js. No external dependencies.
Markdown: visit-focused summary with configurable lookback, optional PDF via pandoc.
JSON: full-fidelity round-trip format. Export, then import to a new database with identical record counts.
Hugo site: static site with detail pages and cross-linked records.
Arkiv: universal record format (JSONL + manifest) for long-term archival.

The HTML export is a single file. You can host it on a static site, email it, or hand it to a doctor on a USB drive. You can protect it with PageVault to add password-based encryption before sharing.

Medical records should not depend on someone else's infrastructure. A single HTML file with an embedded database and WebAssembly runtime is about as durable as digital data gets.

Getting Started

pip install chartfold

# Load your exports
chartfold load auto ~/path/to/export/

# Query
chartfold query "SELECT test_name, value, result_date FROM lab_results LIMIT 10"

# Export
chartfold export html --output my-records.html

# Connect Claude
chartfold serve-mcp

The code is on GitHub: queelius/chartfold. Python 3.11+, depends on lxml and not much else.

Chartfold started because I wanted to ask questions about my own medical records and couldn't. Now I can.

pagevault: Hiding an Encryption Platform Inside HTML

Alex Towell — Tue, 24 Feb 2026 07:39:58 +0000

HTML is an encryption container format. That sounds wrong, but think about what an HTML file can hold: arbitrary data in script tags or data attributes, a full programming runtime via JavaScript, and a rendering engine (the browser) on every device on the planet. If you embed encrypted data and the code to decrypt it, the result is a file that looks inert until someone types the right password.

pagevault takes this idea seriously. It encrypts files, documents, images, entire websites, into self-contained HTML pages that decrypt in the browser. No backend. No JavaScript crypto libraries. The browser already has AES-256-GCM built in via the Web Crypto API. pagevault just has to match the parameters exactly on the Python side and embed the right 200 lines of JavaScript.

The output is a single .html file. You can email it, put it on a USB stick, host it on GitHub Pages, or double-click it on your desktop. It doesn't phone home, it doesn't load CDNs, it doesn't need anything except a browser.

What Goes In

Anything.

pagevault lock report.pdf              # PDF with embedded viewer
pagevault lock photo.jpg               # image with click-to-zoom
pagevault lock notes.md                # markdown, rendered or source view
pagevault lock recording.mp3           # audio player
pagevault lock mysite/ --site          # entire multi-page website
pagevault lock page.html               # HTML with selective region encryption

Every output is a single .html file containing the ciphertext, a password prompt, the decryption runtime, and a viewer plugin for the content type. Seven viewers ship built-in: Image, PDF, HTML, Text (with line numbers), Markdown (with rendered/source toggle), Audio, and Video. They're a plugin system, so you can add your own.

For directories, --site bundles everything into a single encrypted HTML file. The directory is zipped with deflate compression, encrypted, and embedded. On the browser side, a minimal zip reader (no library, just the built-in DecompressionStream API) unpacks it after decryption. Internal links between pages work. CSS and images load from the zip. I've tested sites with hundreds of files without issues.

The Crypto

Nothing exotic. AES-256-GCM for authenticated encryption, PBKDF2-SHA256 with 310,000 iterations for key derivation, all through the browser's Web Crypto API. The interesting part isn't the cryptography. It's making the container format work at scale.

Multi-user access uses CEK (content-encryption key) wrapping. A random key encrypts the data once. That key is then wrapped separately for each user's derived key. Adding a user wraps one small key blob. Removing a user deletes one blob. The bulk content stays untouched.

The Hard Part: Large Files

The basic approach (encrypt, base64-encode, embed in HTML) works fine for small files. The problems start when you try to encrypt an 84 MB conversation archive or a 179 MB HTML report.

The original v2 format had a compounding overhead problem. File bytes were base64-encoded (33% expansion), then encrypted, then the ciphertext was base64-encoded again (another 33%). That's 1.33 * 1.33 = 1.77x total overhead. An 84 MB file produced a 198 MB HTML page.

v3 fixes this with chunked encryption.

Eliminating the double base64

v2 encrypted a base64 string, then base64-encoded the result. Two layers. v3 encrypts the raw bytes directly and base64-encodes once. The metadata (filename, MIME type, size) is encrypted separately. This alone cuts the overhead from 77% to about 39%.

Chunked ciphertext

Instead of one giant encrypted blob, v3 splits content into 1 MB chunks. Each chunk is encrypted independently with AES-256-GCM using a counter-derived IV: the chunk index is XORed into the last four bytes of a base IV. Each chunk becomes its own <script> tag:

<script id="pv-0" type="x-pv">base64-of-chunk-0...</script>
<script id="pv-1" type="x-pv">base64-of-chunk-1...</script>
...
<script id="pv-83" type="x-pv">base64-of-chunk-83...</script>

The browser decrypts them sequentially, showing a progress bar. After each chunk is decrypted, the script tag is removed from the DOM (el.remove()), freeing the base64 text for garbage collection. Memory usage stays proportional to the chunk size, not the file size.

The numbers

That 84 MB conversation archive: v2 produced 198 MB. v3 produces 117 MB. A 41% reduction, and the decryption doesn't choke the browser.

I've also tested a 315 MB text file and a 179 MB HTML file with 1.5 million DOM elements. These are probably past the point of reason for an HTML container, but it's nice to know where the limits actually are.

The `file://` Problem

One thing that surprised me. Encrypted HTML files opened from the filesystem (file:// URLs) behave differently than files served over HTTP. The file:// protocol gives pages an opaque null origin, which breaks localStorage and blocks nested blob URLs.

The fix was srcdoc iframes, which inherit the parent's origin, plus a pushState shim for the URL bar. Not glamorous, but it means encrypted files work identically whether you double-click them on your desktop or serve them from a CDN.

Try It

pip install pagevault
pagevault lock report.pdf                   # wrap any file
pagevault lock mysite/ --site               # bundle a whole site
pagevault lock page.html -s ".private"      # encrypt specific CSS selectors
pagevault serve _locked/ --open             # preview locally

GitHub. MIT license. 667 tests. Dark mode. Handles files larger than most people would think to put in an HTML page.