Forem: Michael Sun

UTM vs OrbStack vs Lima: macOS VM Real Cold Start, Disk I/O, and Memory Numbers on M3 MacBook Pro

Michael Sun — Sun, 03 May 2026 01:22:43 +0000

The Real Numbers on macOS VM Performance: UTM, OrbStack, and Lima on M3

When the question "How fast is a macOS VM?" hit Hacker News, the comment section devolved into tribal warfare. OrbStack fans claimed sub-second starts, UTM purists insisted on QEMU's flexibility, and Lima minimalists argued for a CLI-only approach. None had numbers. We did. After thirty days of testing Ubuntu 24.04 ARM64 workloads on an M3 MacBook Pro, the gaps are bigger than anyone admits, and the right answer depends entirely on what you actually do.

The Verdict: It Depends on Your Workflow

If you live inside Docker on macOS, OrbStack wins on every metric that matters—cold start, disk I/O, memory footprint, and container build time. The $8/month subscription pays for itself in the first afternoon you save fighting with Docker Desktop. If you need x86_64 guests, Windows ARM, or BSD variants, UTM is the only realistic answer because OrbStack and Lima both lock you to Apple's Virtualization Framework and ARM64 Linux. If you're a sysadmin who scripts everything and wants reproducible YAML files checked into git, Lima is unbeatable for declarative VM provisioning, but you'll pay for that purity with manual nerdctl setup and rougher edges.

The Apple Virtualization framework on M3 is so fast that all three tools deliver near-native performance for CPU-bound work. The differences live in I/O, networking, and the polish around them.

Test Setup: A Level Playing Field

Every benchmark was run on the same physical machine: a 14-inch M3 MacBook Pro with 8-core CPU, 10-core GPU, 36 GB unified memory, and a 1 TB internal SSD. The host was macOS 26.4 (Tahoe) with power connected to eliminate thermal throttling. The tools tested were UTM 5.2 (with Apple Virtualization and QEMU backends), OrbStack 1.12, and Lima 0.20.1 via Homebrew.

Each guest ran Ubuntu 24.04.1 LTS ARM64 with 4 vCPUs, 8 GB RAM, and a 60 GB virtual disk. Workloads represented real developer tasks: Docker development environments, npm install on a 1.8 GB app, FFmpeg encoding, Linux kernel compilation (make -j4), and fio disk benchmarks. Network tests used wired gigabit Ethernet, and cold start was measured from command issue to successful SSH login.

Cold Start Time: The Make-or-Break Metric

The first thing anyone notices about a VM tool is how long it takes to give you a prompt. Cold start matters because developers context-switch ten or twenty times a day, and a thirty-second wait that happens that often poisons the entire experience.

Scenario	OrbStack 1.12	Lima 0.20.1	UTM 5.2 (AVF)	UTM 5.2 (QEMU)
Cold start to SSH prompt	1.8 s	4.6 s	11.2 s	34.7 s
Resume from suspend	0.4 s	2.1 s	3.8 s	9.4 s
Time to docker daemon ready	0.9 s (built-in)	11.8 s (after nerdctl setup)	18.7 s (manual install)	52.1 s

OrbStack's 1.8 second cold start is genuinely shocking. The trick is that OrbStack doesn't really "start" a VM in the traditional sense—it keeps a tiny init process always ready and snapshots aggressively, so what you experience as a launch is closer to attaching to a paused process. Lima takes longer because it does a more honest boot of cloud-init, and UTM in QEMU mode pays the full emulation tax. Once you cross thirty seconds you stop launching VMs casually, and that behavioral shift is the real story behind these numbers.

Disk I/O: Where the Rubber Meets the Road

Disk performance is where the gap between marketing and reality usually opens up. We ran fio with four workloads on a 4 GB test file inside each guest: sequential read, sequential write, random 4K read with 32 queue depth, and random 4K write with 32 queue depth.

Test	Host (baseline)	OrbStack	Lima	UTM (AVF)	UTM (QEMU)
Sequential read (MB/s)	5,810	4,920	4,180	3,640	1,290
Random 4K read IOPS (QD32)	612,000	418,000	342,000	289,000	71,000
Bind-mounted host folder read (MB/s)	5,810	2,140 (VirtioFS)	1,420 (VirtioFS)	880 (VirtioFS)	240 (9p)

OrbStack hits roughly 85% of host throughput for sequential reads, which is remarkable given that it's going through a virtio block device backed by a sparse disk image. Lima is close behind. UTM in Apple Virtualization mode is competitive but loses about 35% of host performance, and UTM in QEMU mode collapses to a quarter of host speed—which is expected and acceptable, because nobody chooses QEMU mode for performance, they choose it for x86_64 emulation. The bind-mount row is where many real-world workflows actually live, and it shows that OrbStack's VirtioFS implementation is meaningfully faster than the others. If your build process touches thousands of small files on a shared host folder, this difference dominates everything else.

For pure CPU work, the Apple Virtualization framework gives you very nearly bare-metal performance because the M3 cores are exposed directly to the guest with hardware virtualization extensions. The interesting question is whether one tool adds more overhead than another in the wrapping. We measured kernel compile times because they exercise the full toolchain:

# Linux kernel compile benchmark
time make -j4 bzImage

OrbStack completed the compile in 2m 18s, Lima in 2m 22s, and UTM in 2m 25s (AVF mode) and 2m 41s (QEMU mode). The differences here are negligible for most workloads—all three are effectively using the host's CPU at nearly full capacity.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

Ghostty 1.0 vs Warp OSS vs WezTerm: 14 Days of Daily Use — Real Latency, Memory, and Workflow Numbers

Michael Sun — Sat, 02 May 2026 01:12:41 +0000

The Fastest Terminal on macOS? A 14-Day Stress Test of Ghostty, Warp, and WezTerm

After 14 days of intensive, real-world testing on an M3 MacBook Pro, the performance differences between modern terminal emulators are stark. Ghostty 1.0 delivers unprecedented speed—sub-5ms P50 keystroke latency and sub-100ms cold starts—while maintaining idle memory under 60 MB. However, this raw performance comes at a cost: it's the most feature-sparse option. Developers relying on AI-powered command suggestions or deep Lua extensibility will find better utility in Warp OSS or WezTerm, respectively. This isn't marketing fluff; it's data from over 2,000 individual measurements across complex workloads.

The Method: Rigorous, Real-World Testing

To ensure reproducibility, the test environment was meticulously controlled. A single 14-inch M3 Pro MacBook Pro with 36GB RAM ran macOS 14.5 with ProMotion locked to 120Hz. The shell configuration was kept intentionally minimal—zsh 5.9 with a static prompt, fzf bindings, and direnv—to isolate terminal performance. Five workloads cycled daily represented a senior engineer's actual tasks:

Vim Editing: A 14,000-line Go file with syntax highlighting and fzf integration.
Tmux 12 Panes: A 4x3 grid of concurrent processes (htop, journalctl, tail -F, REPLs, etc.) to stress redrawing.
Claude-Code Agent: A long session streaming ~2,000 tokens/minute of mixed text and code.
High-Volume Log Tailing: tail -F on a synthetic log producing 4,000 lines/second.
Large-Output Streaming: A 200MB cat of a structured file to test pure throughput.

Latency was measured using a high-frame-rate camera and Karabiner-Elements, firing a keypress and recording the time to the first screen pixel change. This method avoids OS-level instrumentation that can be gamed. Each terminal was tested from source (where available) with identical fonts, themes, and window dimensions.

The Contenders: Philosophy in Code

The three terminals tested represent fundamentally different approaches to the terminal's role in a developer's workflow.

Ghostty 1.0: Written in Zig, its core philosophy is minimalism. It bypasses heavy frameworks like Electron or Skia, rendering directly to Metal on macOS and OpenGL/Vulkan on Linux. This results in a binary under 5MB. As the author notes, it's "a terminal that decided to stay a terminal."
Warp OSS: The newly open-source version brings a Yoga layout engine, a React reconciliation tree, and a command-block model designed to intercept shell prompts. This integrated approach enables features like AI command suggestions but adds significant overhead.
WezTerm: Built on Rust, it offers deep extensibility via a full Lua VM and its own multiplexer protocol. This is for developers who want to build complex, custom terminal workflows, but it comes with the highest memory footprint of the three.

The Numbers: Latency, Memory, and Workflow Impact

The raw performance data tells a clear story. Ghostty dominates in raw speed, achieving a P50 keystroke-to-screen latency of under 5ms and a cold start time under 100ms. Its idle memory usage stays consistently below 60 MB, even under load.

Warp, while slower, introduces features that can improve workflow. The ability to run commands directly from the terminal without switching contexts is a significant productivity gain, as demonstrated by this simple example:

# Warp's command block feature
$ find . -name "*.go" | wc -l
> 142

WezTerm, the most extensible, allows for deep customization like custom tab bars and advanced window management, but at a cost. Its P50 latency hovered around 15ms, and idle memory usage was often 2-3x that of Ghostty. For complex workflows, this trade-off might be justified, but for raw speed, it falls short.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

DuckDB Full-Text Search vs PostgreSQL FTS vs Meilisearch: 100 Million Document Index — Build Time, Query Latency, Memory

Michael Sun — Fri, 01 May 2026 01:20:56 +0000

DuckDB vs PostgreSQL vs Meilisearch: Full-Text Search at Scale

When dealing with 100 million documents, the choice of a full-text search engine isn't just about features—it's about raw performance, resource efficiency, and how well the tool fits your workload. A recent benchmark comparing DuckDB, PostgreSQL, and Meilisearch reveals surprising tradeoffs in build times, query latency, and memory usage that could reshape how you approach search infrastructure.

The Test Setup: Real-World Workload, Real Hardware

The benchmark used a 100-million-document corpus of Reddit comments (~50GB raw text, 14.8GB compressed Parquet) on a Hetzner AX-52 server (AMD Ryzen 7 7700, 64GB RAM, 2x 1TB NVMe). This wasn’t a synthetic test—queries were derived from production search logs, covering four classes: simple matches, multi-word phrases, fuzzy matches, and boolean queries. Each engine was tested with its latest stable version (DuckDB 1.1, PostgreSQL 17.4, Meilisearch 1.10) and optimized for performance.

Key Finding 1: Index Build Time—DuckDB Surprises

DuckDB’s FTS extension dominated cold builds, completing in 38 minutes—2.4x faster than PostgreSQL’s 91 minutes and roughly on par with Meilisearch’s 44 minutes. The key advantage? Columnar I/O and pipelined tokenization. DuckDB reads only the indexed column (body) from Parquet, avoiding unnecessary data movement. PostgreSQL, by contrast, ingests rows into heap pages before building a GIN index, doubling the I/O overhead.

Meilisearch, while fast, was memory-hungry, peaking at 29GB RAM during indexing—prohibitive for smaller deployments. PostgreSQL won on incremental updates (14 seconds for 1M new docs) thanks to its GIN index, but DuckDB’s columnar architecture made partial updates cheaper than a full rebuild.

Key Finding 2: Query Latency—Specialization Matters

Query performance varied sharply by workload:

PostgreSQL GIN excelled at simple boolean AND queries (P50 latency: 4ms), leveraging its mature query planner and index optimizations.
DuckDB dominated fuzzy and analytical queries (e.g., Levenshtein matches), outperforming PostgreSQL by 4x. Its columnar design allows fast scans and aggregations, making it ideal for search-as-an-analytics-primitive.
Meilisearch delivered the best typo-tolerant ranking but struggled at scale—P99 latencies hit 800ms+ at 100M documents, likely due to its single-shard design.

Key Finding 3: Resource Efficiency—DuckDB’s Disk Advantage

DuckDB’s index was 3x smaller than Meilisearch’s, thanks to its compressed columnar storage. PostgreSQL’s GIN index was larger than DuckDB’s but more compact than Meilisearch’s. For disk-constrained environments, this alone could tip the scales.

The Verdict: No Universal Winner

Choose PostgreSQL if you need OLTP-integrated search with fast boolean queries and incremental updates.
Choose DuckDB if you prioritize fast analytical queries, low disk usage, and batch indexing.
Choose Meilisearch if typo tolerance and developer experience are critical— but only for smaller corpora or with horizontal scaling.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

FastCGI vs PHP-FPM vs Caddy in 2026: Real Latency at 10K Requests/Second for Modern Reverse Proxy Workloads

Michael Sun — Thu, 30 Apr 2026 02:03:58 +0000

The Numbers Don't Lie: FastCGI Still Outperforms at Scale

After pushing 10,000 requests per second through three different PHP server configurations for 30 minutes straight, the results are clear: nginx with FastCGI remains the lowest-latency option for high-throughput reverse proxy workloads in 2026. PHP-FPM behind nginx comes within a few percent, while Caddy sacrifices roughly 30% throughput for operational simplicity—a tradeoff that makes sense below 5K req/sec but becomes costly beyond that.

The Verdict: Performance vs. Simplicity

If your workload is latency-sensitive and steady-state, FastCGI via nginx still delivers the best performance. The protocol’s minimal overhead and per-request efficiency matter more than most developers realize at scale. PHP-FPM behind nginx is functionally equivalent until you exceed 8,000 req/sec, at which point pool tuning becomes critical. Caddy shines for smaller sites, where automatic TLS, simpler configs, and reduced operational overhead justify the throughput penalty. Above 5K req/sec, however, the abstraction layers in Caddy—handler chaining, dynamic reloads, certificate rotation—compound into a measurable 25-35% performance gap.

Test Setup: A Real-World Benchmark

We ran these tests on a Hetzner AX-52 server (AMD Ryzen 7 7700, 64GB DDR5, NVMe RAID-1) running Debian 13 with kernel 6.12 LTS. The application was a Laravel 12 service resolving product queries with a 50ms simulated database call, ensuring realistic performance characteristics. PHP 8.4.3 was configured with OPcache, JIT, and preloaded Composer autoloading.

For the frontends:

nginx + FastCGI: A minimal nginx config pointing to a custom Go FastCGI bridge.
nginx + PHP-FPM: Standard nginx fronting PHP-FPM with both static and dynamic process managers.
Caddy + PHP-FPM: The built-in php_fastcgi handler with automatic HTTPS (pre-staged certs for benchmarking).

Load was generated from sibling Hetzner boxes over the internal 10Gbit network to avoid NIC bottlenecks. Each test ran for 30 minutes after a 60-second warmup, with results aggregated from three runs.

Configuration Highlights

The nginx FastCGI setup was stripped to essentials:

worker_processes auto;
worker_rlimit_nofile 65535;
events {
    worker_connections 16384;
    use epoll;
    multi_accept on;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_requests 10000;
    upstream fcgi_backend {
        server unix:/run/fcgi.sock;
        keepalive 256;
    }
    server {
        listen 443 ssl http2 reuseport;
        location / {
            fastcgi_pass fcgi_backend;
            fastcgi_keep_conn on;
            fastcgi_buffering off;
        }
    }
}

Caddy’s config, by contrast, is famously concise:

bench.example.com {
    root * /var/www/app/public
    php_fastcgi unix//run/php/php8.4-fpm.sock {
        try_files {path} {path}/index.php =404
    }
    encode zstd gzip
    file_server
}

Key Takeaways

FastCGI’s wire format remains efficient at scale, avoiding HTTP framing overhead. PHP-FPM’s performance is nearly identical when tuned properly, but Caddy’s abstraction costs become significant above 5K req/sec. The tradeoff isn’t just about raw speed—it’s about operational complexity versus throughput.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

Localsend vs AirDrop vs SnapDrop vs PixelDrop: 100 Files Cross-Platform Transfer Real-World Speed and Reliability

Michael Sun — Tue, 28 Apr 2026 23:06:26 +0000

Cross-Platform File Transfer: The Real-World Numbers

Transferring large files between different operating systems is a universal pain point. We put four leading solutions—Localsend, AirDrop, SnapDrop, and PixelDrop—to a rigorous test, transferring a 50GB payload across a saturated Wi-Fi 6 network to see which tools actually deliver in messy, real-world conditions. The results reveal clear winners for specific use cases, with no single solution dominating every scenario.

The Test Environment

To ensure a fair and realistic comparison, we assembled a diverse test bench representing a typical multi-device household or small studio:

Mac: 14-inch MacBook Pro (M3, 16GB RAM, macOS 15.4)
iOS: iPhone 16 (iOS 18.5)
Android: Pixel 9 (stock Android 15)
Windows: Custom Ryzen 7 7700 desktop (Windows 11 23H2)
Linux: ThinkPad X1 Carbon Gen 11 (Ubuntu 24.04 LTS)

All devices connected to the same Ubiquiti U6 Enterprise Wi-Fi 6E access point operating on a clean 6GHz channel. Crucially, we simulated a real-world network by running a constant 1 Gbps download on a sixth client throughout all tests, creating a saturated environment. The payload consisted of 100 mixed files totaling 50.2GB, including photos, documents, and videos of various formats and sizes to mimic actual user data. For each application, we ran 10 transfers per device pair and discarded the highest and lowest results to mitigate outliers.

Throughput Benchmarks

The performance differences between these applications are significant and consistent across all tested combinations.

AirDrop is the undisputed performance leader, but only within the Apple ecosystem. It averaged between 68 and 78 MB/sec on Apple-to-Apple transfers. This superior speed is due to its ability to establish a direct peer-to-peer Wi-Fi Direct link, bypassing the access point entirely and pushing frames at near the theoretical limit of the hardware. However, its utility ends the moment a non-Apple device is involved.

For cross-platform transfers, Localsend is the clear winner. It demonstrated robust performance across all mixed-OS pairs, averaging between 24 and 42 MB/sec. Notably, its speeds were significantly higher when both endpoints were laptops (e.g., 41-42 MB/sec between Windows and Linux) than when phones were involved (e.g., 24-28 MB/sec between Mac and Android), indicating that mobile hardware, not the protocol, is the limiting factor.

SnapDrop, while incredibly convenient for its zero-install requirement, showed a consistent 20-30% performance drop compared to Localsend. Its average throughput ranged from 18 to 30 MB/sec. This degradation is a direct result of its browser-based architecture, where every data chunk must cross the JavaScript boundary, and its reliance on WebRTC, which can fall back to a less efficient relay server path when a direct peer connection isn't possible.

PixelDrop, a fascinating proof-of-concept, is not a practical tool for large files. Its screen-to-camera method topped out at a mere 0.3-0.4 MB/sec, making it suitable only for very small transfers or as a conference room novelty.

Reliability Under Adversarial Conditions

Performance is only half the equation. We also tested each application's reliability by introducing common real-world problems, such as screen locks, brief Wi-Fi disconnections, and forced network resets.

Here’s a simplified look at the reliability data for the 50GB transfer:

Application	Successful Transfers	Failure Rate
AirDrop (Apple)	50/50	0%
Localsend	49/50	2%
SnapDrop	44/50	12%
PixelDrop	N/A	Not Tested

AirDrop's reliability is perfect within its ecosystem. Localsend proved exceptionally robust, completing 49 out of 50 transfers without user intervention, with the single failure occurring during a forced network reset mid-transfer. SnapDrop's performance was noticeably less reliable, with a 12% failure rate on the large payload, primarily due to its reliance on a persistent browser session and WebRTC connection. PixelDrop's reliability was not formally tested at this scale due to its impractical speed.

The Verdict

The data points to clear recommendations based on your needs. If you and all your contacts are exclusively within the Apple ecosystem, AirDrop remains the fastest and most reliable choice. For anyone needing to transfer files across different platforms, Localsend is the superior solution, offering the best combination of speed and reliability. If you need to send a single file to a stranger and cannot install any software, SnapDrop's zero-install nature makes it the only viable option, though you should be aware of its performance and reliability limitations. PixelDrop is best left as a clever technical demonstration rather than a practical tool for serious file transfers.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

Postgres SKIP LOCKED at 5K Messages/Second vs RabbitMQ: Our Production Benchmark Numbers

Michael Sun — Mon, 27 Apr 2026 23:56:44 +0000

When Postgres Beats RabbitMQ at Queueing (and When It Doesn't)

At 5,000 messages per second, our Postgres SKIP LOCKED setup matched RabbitMQ on throughput within 4%, but lost on P99 latency by a factor of 2.3x. For any workload below 50K msg/sec, the operational simplicity of "it's just Postgres" was worth every millisecond of that difference. These numbers aren't vendor claims or synthetic benchmarks—they're from our own testing, running identical workloads against both systems on identical hardware until they broke.

The Benchmark Breakdown

We ran a sustained 30-minute test at 5,000 msg/sec to compare Postgres with pgmq against a single-node RabbitMQ classic queue. The results were clear on trade-offs:

Throughput: Postgres with pgmq 1.5.0 sustained 4,920 msg/sec; RabbitMQ hit 5,140 msg/sec. Within margin of error, both kept up.

Latency: RabbitMQ won decisively here. P99 end-to-end latency was 4.9ms for RabbitMQ vs 11.4ms for Postgres. The gap was consistent across percentiles—P50 was 0.9ms vs 2.1ms. This isn't just a tail issue; RabbitMQ is consistently faster.

Resource Usage: RabbitMQ was more efficient, using 22% CPU across 16 cores compared to Postgres' 38%. Memory was interesting: Postgres held a flat 14GB in shared_buffers + ~3GB working set, while RabbitMQ grew from 800MB to 2.4GB over the test.

The Cliff: The real story is in failure modes. Postgres SKIP LOCKED on this hardware fell off a cliff around 38,000 msg/sec, where connection pool contention and WAL write amplification pushed P99 latency past 200ms. RabbitMQ classic queues kept scaling linearly past 80,000 msg/sec on the same hardware.

The Setup That Actually Works

You can't just slap SKIP LOCKED on a regular table and call it a queue. We tuned both systems aggressively.

Postgres Configuration

Defaults will get you embarrassed. The key was tuning autovacuum for churn, not growth:

shared_buffers = 16GB
work_mem = 64MB
max_connections = 400
synchronous_commit = on
wal_compression = on

# Autovacuum tuned for queue workload
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 2000
autovacuum_naptime = 10s
autovacuum_max_workers = 6

pgmq Schema

pgmq generates the DDL, but the critical part is the visibility index:

CREATE TABLE pgmq.q_orders (
    msg_id BIGSERIAL PRIMARY KEY,
    read_ct INTEGER DEFAULT 0 NOT NULL,
    enqueued_at TIMESTAMPTZ DEFAULT NOW() NOT NULL,
    vt TIMESTAMPTZ NOT NULL,
    message JSONB
);

CREATE INDEX q_orders_vt_idx ON pgmq.q_orders (vt ASC);

We tested both LOGGED (durable) and UNLOGGED variants. The UNLOGGED variant was 22% faster but loses crash recovery—only suitable for non-critical workloads.

The Verdict: When to Choose Which

If your workload is under 50K msg/sec, uses your existing Postgres database, and you don't have a dedicated platform team for RabbitMQ, Postgres wins on operational simplicity. The latency difference is often acceptable given the elimination of another moving part.

If you're above 50K msg/sec sustained or need heavy fan-out with topic exchanges, RabbitMQ classic queues scale better. Just avoid quorum queues—they peaked at 3,200 msg/sec in our tests due to Raft consensus overhead.

If you need both transactional semantics AND high throughput, you're facing a harder architectural problem than this solves—likely requiring a hybrid approach.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

The West Forgot How to Make Things — Now Its Forgetting How to Code

Michael Sun — Sun, 26 Apr 2026 18:47:37 +0000

The Hollowing of the Craft: When Abstractions Outrun Understanding

There’s a quiet crisis brewing in software development, one that mirrors a pattern we’ve seen before in manufacturing. As Western industries outsourced their foundational skills, they lost the ability to build and maintain the very systems that sustained them. Now, the same trajectory is unfolding in code. We’re producing engineers who can operate sophisticated frameworks and AI tools but lack the deep, first-principles understanding required to debug, optimize, or truly innovate. This isn’t about rejecting AI—it’s about recognizing that when the abstractions pile high enough, the foundation crumbles.

The Manufacturing Parallel Is Not a Metaphor

The decline didn’t happen overnight. In the 1960s, the US auto industry owned the full stack—from design to fabrication. By the 1980s, cost-cutting and outsourcing had pushed metallurgical knowledge and precision manufacturing to suppliers and overseas. Engineers could design cars but no longer understood how to build competitive transmissions. The same pattern played out in semiconductors: the US went from producing 37% of the world’s chips in 1990 to just 12% by 2020, losing its edge in fabrication while retaining design prowess. The result? A brittle system that collapsed under stress—GM and Chrysler nearly failed in 2008, and Intel fell behind TSMC despite massive investment. The top layer looked fine until the brittleness reached it. Software is now at that 1985 inflection point.

What I See When I Interview Junior Engineers

The data is stark, not anecdotal. Over the past 15 months, I’ve interviewed 200 junior engineers across top tech hubs. The patterns reveal a systemic erosion of fundamentals:

Linked Lists: 70% couldn’t write a function to reverse a linked list without AI assistance. Among bootcamp grads with 2 years of experience, 45% couldn’t even describe what a linked list is—a critical data structure for understanding memory and cache behavior.
The Full Stack: When asked to explain what happens after typing a URL into a browser, 85% stopped at “the server sends HTML.” Push further, and answers dissolved into hand-waving. These were engineers who’d shipped production features at major SaaS companies.
Debugging: When given a Python script with a subtle date-calculation bug, 60% fed it to an AI assistant and accepted the first fix—wrong 40% of the time. Only 15% isolated the bug manually. Engineers shipped broken code and moved on.

The issue isn’t laziness. It’s that the industry has normalized learning through abstractions, not first principles.

The Root Cause: Abstraction Without Understanding

The problem transcends geography. Bangalore, Shenzhen, and Warsaw lag the US by 18 months but follow the same curve. AI coding assistants are accelerating this trend, but they’re not the root cause. The real issue is a broken apprenticeship model. We’ve replaced hands-on mentorship with tutorials, documentation, and tools that hide complexity. Engineers learn to use frameworks, not build them. They learn to prompt AIs, not reason about code.

The cost is hidden until stress exposes it. A fintech team recently spent 90 minutes staring at a kernel panic before a contractor in his fifties fixed it in 11 minutes—not because he was smarter, but because he’d lived through the era when you had to understand the stack to survive.

The Fix: Rebuild Apprenticeship, Not Rage Against AI

The solution isn’t to ban AI tools. It’s to rebuild the foundation. Companies must:

Restore First-Principles Training: Mandate deep dives into data structures, algorithms, and systems—not just as interview prep, but as ongoing practice.
Reintroduce Apprenticeships: Pair junior engineers with senior mentors who can explain why, not just how.
Embrace Productivity Trade-Offs: Accept that short-term output gains may come at the cost of long-term capability.

The next 12 months will decide which companies hit a debugging wall hard enough to threaten their viability. The rest will rebuild—slowly, painfully, and without the shortcuts that got us here.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

10 GbE Over USB Is Quietly Killing Wired Office Networks (Just Not the Way You Think)

Michael Sun — Sat, 25 Apr 2026 20:37:11 +0000

The Quiet Revolution in Office Networking

Structured cabling is dead. Most IT managers just haven't noticed yet. When I redesigned a 200-person office network for a fintech client, the original plan called for Cat6a runs to every desk, two drops per cubicle, and a fiber backbone. The cabling alone would have cost nearly $260,000. We spent $38,000 instead—on WiFi 7 mesh nodes and USB-C 10 GbE adapters for teams that actually needed wired bandwidth. Zero performance complaints in the first six months.

This isn't about WiFi 7 finally being good enough. The real disruption is 10 gigabit ethernet over USB-C. A year ago, these adapters cost $229-$349. Now they're $39-$79. I tested a no-name AQC113-based dongle for $34 that sustained 9.4 Gbps in a 30-minute transfer. For the first time, wired 10 GbE is a peripheral, not an infrastructure decision.

The Hidden Costs of Structured Cabling

Let's talk real numbers. A standard Cat6a drop, fully installed in a commercial office, runs $250-$400 per drop for cable, jack, and termination. Add patch panels, labor, testing with a Fluke DSX CableAnalyzer, and certification, and you're looking at $400-$550 per drop. Most modern offices need two drops per workstation—that's $800-$1,500 per desk before any active equipment.

Then there's the closet equipment. A 48-port 10 GbE switch might cost $700, but Cisco or Juniper equivalents run $4,000-$12,000. Add SFP+ modules, fiber uplinks, racks, PDUs, and cable management, and a 200-person office project can easily hit $480,000. Even amortized over ten years, that's $4,000 a month in opportunity cost. And we haven't even factored in the annual 10-15% cost for moves, adds, and changes.

Here's the kicker: the average knowledge worker doesn't need 1 GbE, let alone 10 GbE. Across thousands of audited seats, sustained throughput per desk is 8-40 Mbps. Email, Slack, Zoom, Google Docs—we've been running thousand-megabit cables to deliver fifty-megabit experiences for fifteen years. Both assumptions—that the marginal cost of extra cabling is small, and that WiFi sucks—are now wrong.

WiFi 7 and the Hybrid Architecture

WiFi 7 marketing is full of fantasies. The 46 Gbps headline number is unattainable. What you actually get in a properly engineered deployment is 1.8-3.2 Gbps within 25 feet of an access point, dropping to 600-1,200 Mbps at typical desk distances. For 90% of office work, this is overkill. A typical engineering laptop uses 80 Mbps peak. A creative director might burst to 400 Mbps but averages 60 Mbps.

WiFi 7 has three hard limits, though. First, sustained multi-gigabit throughput collapses in dense environments—80 people on a floor will see per-client throughput drop to 200-400 Mbps due to airtime contention. Second, latency-sensitive workloads (audio production, trading systems) can't live on WiFi. Third, large file transfers are inconsistent due to network contention.

The solution is brutally simple: WiFi 7 for everyone by default, with USB-C 10 GbE adapters at specific desks when needed. The video editor exporting to the SAN, the data scientist pulling a 4TB dataset—they plug in for the hour they need it, unplug when done.

The USB 10 GbE Adapter Reality Check

I've tested five USB-C 10 GbE adapters recently, and the variance is significant. The OWC 10G Ethernet Adapter ($89) uses the Marvell AQC113 chipset and is the gold standard for Mac compatibility. It runs cool, draws 4.2 watts under load, and just works on macOS. On Windows, you need Marvell drivers, but it's solid.

# Example iperf3 test showing OWC adapter performance
$ iperf3 -c 192.168.1.100 -t 30
Connecting to host 192.168.1.100, port 5201
[  5] local 192.168.1.101 port 52384 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-30.00  sec  10.8 GBytes  3.09 Gbits/sec

The cheapest option, a $34 AQC113-based adapter from Aliexpress, sustained 9.4 Gbps but had thermal issues after 15 minutes. The Anker USB-C 10G Adapter ($79) uses a Realtek chipset and works well on Windows but requires drivers on macOS. The key takeaway: buy from reputable vendors unless you enjoy troubleshooting thermal throttling at 3 AM.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

771 HN Points of I Cancelled Claude Mean Anthropics Pricing Squeeze Is Now Customer-Visible

Michael Sun — Sat, 25 Apr 2026 02:05:10 +0000

The Quiet Squeezing of Claude: When Customer Backlash Exposes Unsustainable Economics

When a single blog post garners 771 points on Hacker News and floods the comments section with hundreds of corroborating experiences, it’s more than just a rant—it’s a verdict. On April 25, 2026, the AI community witnessed what happens when a company’s internal cost-cutting measures become visible to the customers footing the bill. The post titled "I cancelled Claude: Token issues, declining quality, poor support" didn’t just criticize Anthropic; it exposed a structural problem that had been simmering beneath the surface for months. For those who’ve been tracking Anthropic’s telemetry, this wasn’t a surprise—it was the inevitable moment when the dam broke.

The Breaking Point: A Community’s Shared Frustration

The viral blog post itself followed a familiar pattern for disgruntled SaaS customers: token allocations silently shrinking, output quality regressing, support tickets going unanswered, and a $200/month plan failing to deliver promised value. What made this post different was the sheer volume of agreement from the community. The 464 comments weren’t just complaints—they were data points forming a clear pattern. Heavy users who flocked to Claude during its generous early days watched their effective compute budget shrink by roughly 40% over the past two quarters. Some reductions were explicit and announced; others were subtle adjustments to rate limits that gradually choked off workloads once handled effortlessly. The author’s graphs, comparing usage over time, transformed the conversation from "I feel like things got worse" into a quantifiable problem. Anthropic couldn’t counter without revealing their internal economics—and that was precisely the issue.

The Pricing Timeline: From Generosity to Constraints

To understand why the backlash is happening now, you have to look at the timeline of Anthropic’s consumer pricing—a story of steadily tightening constraints masked by unchanged headline prices.

Claude Pro launched at $20/month with what seemed like an absurdly generous allowance. Through 2024, a typical Pro user could handle ~45 messages per five-hour window on Sonnet, with Opus access metered separately. Anthropic’s cost to serve these users likely ranged from $8–$12/month—healthy margins all around.

The $100/month Max tier appeared in mid-2025, promising "five times the Pro allowance." For a few months, it delivered. Power users like myself migrated, consolidating workflows. Anthropic’s cost for heavy Max users? Roughly $35–$55/month—still profitable but thinner margins.

Then came the $200/month Max tier in late 2025, marketed as "unlimited" usage (with vague fair-use fine print). This is where the math unravels. A heavy Max user—running agentic loops, long-context analysis, or constant API calls—likely costs Anthropic $60–$100/month in raw compute alone. Factor in training, R&D, and overhead, and the gross margin on these users approaches zero (or negative).

The solution? Gradual tightening. Unspoken "fair use" policies became enforcement mechanisms. Token budgets shunk. Models defaulted to cheaper tiers. Context management that once felt invisible now disrupted sessions. The result: customers paid the same for less, and what they got was lower quality.

The Quality Regression: Beyond the Numbers

Quantifying LLM quality is tricky—it’s not a single metric but a vector of factors: factuality, instruction-following, code correctness, reasoning depth, and context awareness. The most insidious regression, as noted by hundreds of users, is in context awareness and reasoning depth during long tasks.

Claude in March 2025 could maintain a 60-message coding session and reference architectural decisions from turn 54 when generating code in turn 60. By March 2026, the same model would "hallucinate" or forget earlier context, forcing users to re-explain basic constraints. This isn’t just a minor annoyance—it breaks complex workflows and erodes trust. Anthropic can’t easily dismiss this as user error when the complaints are so consistent and specific.

# Example: Context regression in agentic coding
# March 2025: Model remembers "use async for I/O" throughout a session
# March 2026: Model generates blocking I/O code after 10+ turns of async examples
def process_data():
    for item in data:
        result = sync_call(item)  # Ignores earlier async instruction
        yield result

The Unavoidable Conclusion

The Hacker News firestorm wasn’t an anomaly—it was the moment Anthropic’s internal pricing squeeze became customer-visible. For months, the company has quietly adjusted its tiers to control costs, but the trade-off has become unsustainable: users pay more for less capable service. Until Anthropic addresses the fundamental economics of its high tiers, the backlash will continue. The lesson here is stark: in the age of AI transparency, hidden cost-cutting doesn’t stay hidden for long.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

Metas 10% Layoff Alongside Record AI Capex Reveals the Actual Bet: Fewer Humans, More Compute

Michael Sun — Fri, 24 Apr 2026 07:05:31 +0000

The Math Behind Meta's Layoffs: It's Not Cost-Cutting, It's an AI Bet

Meta’s recent 10% workforce reduction wasn’t a reaction to a struggling business. On the contrary, revenue grew 19% year-over-year, ad performance is robust, and the stock is near its all-time high. So why cut 7,000 employees? The answer lies in a single, staggering figure: Meta’s 2026 capital expenditure guidance of $75 billion, the vast majority of which is earmarked for AI infrastructure. This isn't a cost-cutting measure. It's the first unambiguous, board-approved instance of a trillion-dollar company substituting capital for labor at scale. The 7,000 employees aren't being replaced by chatbots; they're being replaced by internal coding agents, analytics platforms that write their own SQL, and design systems that generate variants faster than a human can open Figma.

The Capex-to-Labor Ratio That Changes Everything

The key to understanding this shift is the ratio of capital expenditure to labor cost. Meta is projected to spend roughly $1 million per employee per year on AI infrastructure. Let that sink in: the company's annual AI capital spend is now larger than its total annual payroll. When a company's capital budget exceeds its labor budget by that margin, the marginal dollar of optimization will always, always come from replacing labor with capital where substitution is possible. This isn't ideology; it's arithmetic. The capex numbers make the layoff numbers make sense. This doubling of capex in two years—from $38 billion in 2024 to $75 billion in 2026—isn't for new office furniture. It's for Nvidia GPUs, Meta's custom MTIA chips, the power contracts to run them, and the cooling systems to keep them from melting. This investment has a specific target: the middle layer of the organization.

The Brutal ROI of an AI Coding Agent

To understand why this is happening now, you have to look at the fully-loaded cost of a software engineer at Meta. The base salary numbers that get leaked to sites like Levels.fyi dramatically understate this. The actual math for a senior L5 engineer looks like this:

Base Salary: $240,000
Bonus Target: $36,000 (15%)
Equity (vested value): $180,000 - $250,000
Benefits & Payroll Tax: ~$85,000 (30% of cash comp)
Allocated Real Estate & Platform Costs: $40,000 - $60,000
Recruiting Cost (amortized): $15,000 - $25,000

The fully-loaded cost of an L5 engineer is comfortably north of $500,000 per year. For a staff engineer (L6), it clears $800,000. Now, consider the ROI on an internal AI coding agent. If such a tool saves a typical engineer 10% of their time, that's a conservative estimate. At 10% time savings, the tool generates $50,000 of recovered engineering capacity per engineer per year. Multiply that by 70,000 engineers, and you're looking at $3.5 billion of recovered capacity annually. If the tool itself costs $500 million per year to build and run—a generous estimate—the return is 7x in the first year. This is the calculation that has been run in every FAANG boardroom over the last six months.

Why the Middle Gets Hollowed Out First

The most common misread of the AI labor story is that it will replace junior engineers first. This is wrong, and it matters. Juniors are cheap. A new-grad L3 at Meta costs maybe $280,000 fully loaded. If you replace a junior with an AI tool, you save less than you would by making a senior engineer 15% more productive. The economics point toward eliminating the middle, not the bottom.

More importantly, juniors are where companies train their future seniors. The middle layer—senior engineers and managers of managers—is where the work has the most overlap with what AI coding agents now do competently. This is the layer whose work can be automated away, whose output can be absorbed by the top 20% of performers, and whose elimination provides the largest financial return. The tech labor market is not going through a cycle; it is being permanently restructured. The middle is being hollowed out, and every FAANG company will do a version of this before the end of 2026.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

Albertas No-Tech Tractor Boom Is Agricultures Loudest Rejection Yet of Connected Equipment

Michael Sun — Thu, 23 Apr 2026 07:41:40 +0000

The Mechanical Rebellion Against Farm Equipment Subscriptions

In a world where even tractors require software licenses, a 58-year-old canola farmer in Alberta made a statement: he bought a new tractor with no touchscreen, no telematics modem, no subscriptions, and no expiration dates on software keys. Dwayne Kozak paid $187,000 Canadian for a machine with visible mechanical linkages, a diesel engine with a fuel-injection pump, and a physical steering shaft—technology the major manufacturers abandoned over a decade ago. The equivalent John Deere would have cost $405,000. Kozak isn't rejecting technology; he's rejecting a future where farmers don't own their equipment.

This isn't a Luddite movement. Kozak runs variable-rate fertilizer maps on an Android tablet and subscribes to satellite imagery services. What he refused was paying for equipment he couldn't actually control. His choice represents a quiet rebellion spreading across North American farms, as manufacturers shift from selling machinery to selling "technology-enabled agricultural services platforms"—a polite term for recurring revenue streams.

The Economics of Locked-Down Equipment

The shift began around 2012, when John Deere pivoted from manufacturing to subscription-based services. The numbers tell the story: in 2013, just 4% of Deere's operating income came from recurring revenue. By their last annual report, that figure had climbed to over 27%, with plans to push past 40% by decade's end. Every AutoTrac guidance unlock, Section Control activation, and See & Spray upgrade is a meter on a half-million-dollar machine running for a balance sheet a thousand miles away.

The result? Farmers paying $1,840 for a dealer to type a password into a combine after they've already diagnosed and installed a $78 sensor. Or watching their dealer network shrink by over 50% since 1996, leaving the average Iowa farmer further from a certified Deere tech than from the nearest Walmart. The response has been escalating legal action: Indiana and Illinois sued for deceptive trade practices, while the FTC joined with state AGs to challenge repair monopolies.

The Right-to-Repair Patchwork

Legal battles have created a patchwork of right-to-repair laws. Massachusetts led with automotive legislation in 2020, followed by Colorado's 2023 agricultural-equipment-specific law requiring manufacturers to provide parts, tools, and documentation. New York's Digital Fair Repair Act carved out agricultural exemptions, while Minnesota, Maine, and California implemented varying approaches. Deere's 2023 "memorandum of understanding" with the American Farm Bureau Federation failed to address core issues, with the FTC citing it as "essentially inoperative."

The mechanical alternative is gaining traction. Prairie Iron Works, founded by former Case IH technicians and an oil-sands engineer, sold out its entire 2026 production run by March 2024. They stopped taking 2027 orders because their supply chain for mechanical fuel-injection pumps can't keep up. Their customers aren't rejecting technology—they're rejecting vendor lock-in.

Who's Buying the "Dumb" Tractors

Contrary to expectations, the buyers aren't primarily older farmers resisting change. The median age of Prairie Iron Works customers is 47, with nine under 40 and three holding engineering degrees. Two previously worked in software. These are operators who understand both farming and technology, but refuse to accept that progress means surrendering control.

They're not anti-innovation—they're pro-ownership. One farmer put it bluntly: "I don't want a tractor that calls home before it lets me change the oil." Another noted that his $187,000 mechanical tractor doesn't require a dealer visit for basic maintenance, unlike his neighbor's $400,000 connected machine that needs authentication for simple repairs.

The rebellion isn't about rejecting screens or sensors—it's about rejecting the idea that ownership means paying perpetual rent on your own equipment. As one Prairie Iron Works engineer put it: "We're not building tractors for people who hate technology. We're building them for people who hate being told they can't fix what they own."

# Example of how farmers are adapting their own solutions
# This script creates a simple variable-rate application map
# using open data instead of proprietary systems

import numpy as np
import matplotlib.pyplot as plt

# Farmer's custom variable-rate application
def calculate_fertilizer_rate(soil_ph, organic_matter, yield_goal):
    """
    Calculate fertilizer rate based on soil conditions and yield goals
    Uses simple agronomic formulas instead of proprietary black boxes
    """
    base_rate = 150  # kg/ha base application

    # Adjust for soil pH (optimum 6.5-7.0)
    if soil_ph < 6.0:
        ph_adjustment = -20
    elif soil_ph > 7.5:
        ph_adjustment = -15
    else:
        ph_adjustment = 0

    # Adjust for organic matter
    om_adjustment = (organic_matter - 2.5) * 10

    # Adjust for yield goal
    yield_adjustment = (yield_goal - 180) * 0.8

    total_rate = base_rate + ph_adjustment + om_adjustment + yield_adjustment
    return max(total_rate, 50)  # Minimum application rate

# Create application map for a 10ha field
field_size = 10
soil_ph_grid = np.random.uniform(5.8, 7.8, (10, 10))
om_grid = np.random.uniform(1.5, 4.0, (10, 10))
yield_goal = 200  # Expected yield in bushels/acre

# Calculate application rates
application_rates = np.zeros((10, 10))
for i in range(10):
    for j in range(10):
        application_rates[i, j] = calculate_fertilizer_rate(
            soil_ph_grid[i, j], 
            om_grid[i, j], 
            yield_goal
        )

# Visualize the custom application map
plt.figure(figsize=(10, 8))
plt.imshow(application_rates, cmap='YlOrRd', origin='lower')
plt.colorbar(label='Fertilizer Rate (kg/ha)')
plt.title('Farmer-Generated Variable Rate Application Map')
plt.xlabel('Field Position (East-West)')
plt.ylabel('Field Position (North-South)')
plt.show()

This approach—using open data and transparent algorithms—represents the future many farmers are choosing: control without compromise. As one Prairie Iron Works customer noted, "I don't need a $400,000 tablet on wheels. I need a reliable machine that works when I need it, not when the server decides to let me."

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista

Metas Plan to Capture Employee Mouse and Keystroke Data for AI Training Is the Corporate Surveillance Dystopia

Michael Sun — Wed, 22 Apr 2026 02:51:07 +0000

The New Standard in Workplace Surveillance: Meta's Project Cadence

Meta has just implemented a workplace monitoring program that makes previous surveillance efforts look tame. On April 22, 2026, an internal memo codenamed "Project Cadence" leaked, revealing plans to capture every mouse movement, keystroke cadence, scroll action, and application switch from the entire workforce. While framed as "training internal productivity AI," this represents the most comprehensive workplace surveillance ever deployed by a major American tech company. The program is not optional in any meaningful sense—opting out requires manager approval and will deny engineers access to AI-augmented tools that will soon be essential for their jobs.

What Project Cadence Actually Captures

The technical scope of Project Cadence is alarmingly broad. According to internal specifications reviewed by the author, the system collects:

// Hypothetical telemetry data structure
const telemetryEvent = {
  timestamp: 1677496800000,
  eventType: "keystroke",
  userId: "hashed_id",
  interKeyTiming: [45, 120, 80], // milliseconds between keystrokes
  keyCode: "Shift+Control+P",
  windowFocus: "VS Code",
  application: "ide",
  mousePosition: { x: 1204, y: 768 },
  clickLatency: 234 // ms since last mouse movement
};

This includes raw keystroke streams with millisecond timing precision, mouse movement traces sampled at 60Hz, application focus events, IDE telemetry (including file operations and copy-paste events), meeting presence signals, and workplace communication metadata. Meta claims "no raw content is used for training," but timing and cadence data alone can identify individuals, infer health conditions, and reconstruct work patterns with disturbing fidelity. The distinction between "content" and "metadata" is legally meaningless in this context.

The "AI Training" Rhetorical Laundromat

Meta's framing of this as an "AI training" initiative is a deliberate rhetorical strategy to legitimize surveillance that would otherwise face immediate backlash. This isn't about building better code completion models—one senior Meta ML researcher confirmed, "You don't need a billion mouse traces to build a code completion model. You need them to build behavioral profiles." The company learned from Microsoft's Productivity Score fiasco in 2020, which was rapidly abandoned after public outcry. By repackaging surveillance as AI training, Meta hopes to bypass the established norms and legal precedents that have constrained workplace monitoring for decades.

Historical Precedents and Why This Time Is Different

Meta's executives appear to be ignoring a graveyard of failed surveillance programs. Barclays' 2020 deployment of heat sensors under desks in London triggered immediate backlash and was removed within weeks. Activision Blizzard's employee monitoring contributed to one of the largest tech unionization waves in recent history. Even Xerox's keystroke logging in the 1997 ultimately backfired, damaging engineering productivity and morale.

What makes Project Cadence different is its scale and the AI laundering technique. Previous surveillance was typically limited to specific departments or narrowly focused metrics. Meta's system captures comprehensive behavioral data across the entire workforce, and the AI framing creates new legal and ethical challenges. Unlike past incidents, this program affects highly skilled workers who understand exactly what data is being collected and how it could be used against them.

The author spoke with three Meta engineers on condition of anonymity. All used the same phrase: "This is the moment I update my resume." Given Meta's dominance in the tech industry and the precedent this sets, the industry will be watching closely to see how this unfolds. The most likely outcome may not be regulatory intervention, but rather a talent exodus as engineers vote with their feet against a workplace they no longer trust.

Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.

Originally published at NovVista