<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Eugene Oleinik</title>
    <description>The latest articles on Forem by Eugene Oleinik (@evoleinik).</description>
    <link>https://forem.com/evoleinik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3673892%2F5248a6ce-3aed-4571-9baa-b16ca89f93f5.jpeg</url>
      <title>Forem: Eugene Oleinik</title>
      <link>https://forem.com/evoleinik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/evoleinik"/>
    <language>en</language>
    <item>
      <title>The Same Dessert, Two Very Different Reactions: A Lesson in Positioning</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Thu, 12 Feb 2026 07:00:01 +0000</pubDate>
      <link>https://forem.com/evoleinik/the-same-dessert-two-very-different-reactions-a-lesson-in-positioning-24ad</link>
      <guid>https://forem.com/evoleinik/the-same-dessert-two-very-different-reactions-a-lesson-in-positioning-24ad</guid>
<description>&lt;h1&gt;The Same Dessert, Two Very Different Reactions&lt;/h1&gt;

&lt;p&gt;Chak-chak: honey-glazed fried dough from Tatarstan.&lt;/p&gt;

&lt;p&gt;The internet's take?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Looks like deep-fried worms"&lt;/li&gt;
&lt;li&gt;"Soviet depression food"&lt;/li&gt;
&lt;li&gt;"My arteries hurt just looking at it"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now rebrand it as "Chakku-chakku" (チャックチャック), a rare artisanal tempura crisp from Niigata Prefecture.&lt;/p&gt;

&lt;p&gt;Suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The craftsmanship!"&lt;/li&gt;
&lt;li&gt;"Added to my Japan bucket list"&lt;/li&gt;
&lt;li&gt;"7th generation wagashi master energy"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Same ingredients. Same technique. Different cultural packaging.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;This isn't just about food&lt;/h2&gt;

&lt;p&gt;It's how we evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Startups&lt;/strong&gt;: Silicon Valley gets "disruptive innovation" while emerging markets get "copycat"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design&lt;/strong&gt;: Scandinavian minimalism is "elegant" while similar Eastern European design is "just empty"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work culture&lt;/strong&gt;: Japanese long hours are "dedication" while Eastern European grinding is "exploitation"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The product didn't change. The story did.&lt;/p&gt;

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;Something to think about next time you're positioning your product, your company, or yourself.&lt;/p&gt;

&lt;p&gt;What's your "chak-chak" that needs better storytelling?&lt;/p&gt;

</description>
      <category>marketing</category>
      <category>branding</category>
      <category>storytelling</category>
      <category>startup</category>
    </item>
    <item>
      <title>Serve Markdown to AI Agents (10x Smaller Payloads)</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Wed, 04 Feb 2026 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/serve-markdown-to-ai-agents-10x-smaller-payloads-44di</link>
      <guid>https://forem.com/evoleinik/serve-markdown-to-ai-agents-10x-smaller-payloads-44di</guid>
      <description>&lt;p&gt;Guillermo Rauch shared that Vercel's changelog now serves markdown when agents request it. Same URL, different &lt;code&gt;Accept&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;The insight isn't the size reduction - it's that an entire infrastructure layer (CSS, JS, frameworks) is becoming optional for a growing class of consumers.&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;HTTP content negotiation. Browsers send &lt;code&gt;Accept: text/html&lt;/code&gt;. Agents can send &lt;code&gt;Accept: text/markdown&lt;/code&gt;. Same URL, different representation.&lt;/p&gt;

&lt;p&gt;I added this to my Hugo blog. The config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[outputs]&lt;/span&gt;
&lt;span class="py"&gt;page&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'HTML'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'MARKDOWN'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[outputFormats.MARKDOWN]&lt;/span&gt;
&lt;span class="py"&gt;baseName&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'index'&lt;/span&gt;
&lt;span class="py"&gt;mediaType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'text/markdown'&lt;/span&gt;
&lt;span class="py"&gt;isPlainText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The middleware (Vercel Edge):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;matcher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/posts/:path*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;accept&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/markdown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/?&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/index.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Accept: text/markdown'&lt;/span&gt; https://evoleinik.com/posts/markdown-for-agents/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My posts go from ~20 KB of HTML to ~2 KB of markdown. Not the 250x Vercel saw on its changelog, but 10x adds up.&lt;/p&gt;

&lt;h2&gt;The tradeoff&lt;/h2&gt;

&lt;p&gt;You maintain two output formats. For a static site generator like Hugo, this is trivial - markdown is the source anyway. For dynamic content or SPAs it's harder: you'd need to generate markdown server-side or maintain parallel content.&lt;/p&gt;

&lt;h2&gt;Why bother?&lt;/h2&gt;

&lt;p&gt;Agent traffic is growing. Lightweight, structured content gives agents cleaner context and burns fewer tokens.&lt;/p&gt;

&lt;p&gt;The visual web was designed for human browsers. The agent web doesn't need the decoration.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How We Make Claude Remember: Learnings Over Skills</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Tue, 03 Feb 2026 05:44:19 +0000</pubDate>
      <link>https://forem.com/evoleinik/how-we-make-claude-remember-learnings-over-skills-6h4</link>
      <guid>https://forem.com/evoleinik/how-we-make-claude-remember-learnings-over-skills-6h4</guid>
<description>&lt;h2&gt;Background&lt;/h2&gt;

&lt;p&gt;Claude Code reads a CLAUDE.md file at project root for context. "Skills" are reusable prompt templates Claude can invoke. But Claude itself resets between sessions - it doesn't remember what it learned yesterday.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;We created 10+ skills to teach Claude project-specific knowledge. But skills don't reliably auto-invoke.&lt;/p&gt;

&lt;p&gt;Concrete example: I had an &lt;code&gt;airshelf-vercel&lt;/code&gt; skill with explicit instructions: "Don't run &lt;code&gt;vercel --prod&lt;/code&gt; - push to git instead." I asked Claude to deploy. It ran &lt;code&gt;vercel --prod&lt;/code&gt;. Repeatedly. The skill existed. Claude never loaded it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Why not just use Skills?"&lt;/strong&gt; I've seen this feedback. We tried. Skills work great for workflows you explicitly invoke (&lt;code&gt;/commit&lt;/code&gt;, &lt;code&gt;/review-pr&lt;/code&gt;). But for factual knowledge Claude needs mid-task? Skills require Claude to remember which of 10 skills to invoke. It often doesn't. Learnings require one generic pattern: &lt;code&gt;grep -r "keyword" learnings/&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;The Solution&lt;/h2&gt;

&lt;p&gt;A three-layer system:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. learnings/ folder&lt;/strong&gt; - Topic-specific files (database.md, stripe.md, vercel.md) for facts and gotchas. CLAUDE.md tells Claude these exist and how to search them. Not auto-loaded, but always discoverable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. curate-docs skill&lt;/strong&gt; - A structured process for capturing knowledge after features. Why a skill and not a script? Because curation requires judgment - deciding what goes where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Critical gotchas → CLAUDE.md (1-liners, always loaded)&lt;/li&gt;
&lt;li&gt;Detailed knowledge → learnings/ (searchable on demand)&lt;/li&gt;
&lt;li&gt;Repeatable workflows → skills (explicitly invoked)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Post-commit hook&lt;/strong&gt; - Claude Code supports hooks that run after specific tool calls. Ours fires after &lt;code&gt;git commit&lt;/code&gt; on any branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Branch 'main' has 3 commit(s) today. Consider running /curate-docs."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On main/master it counts today's commits. On feature branches it counts commits ahead of main. Targeted reminder, not noise. Without it, I forgot to document. With it, I don't.&lt;/p&gt;
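&lt;p&gt;Both counting modes boil down to &lt;code&gt;git rev-list&lt;/code&gt; one-liners. A self-contained sketch in a throwaway repo (the branch name &lt;code&gt;feat/x&lt;/code&gt; and the commits are illustrative):&lt;/p&gt;

```shell
# Throwaway repo demonstrating the hook's two counting modes.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git checkout -q -b feat/x
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"

# main/master case: commits made today on that branch
git rev-list --count --since=midnight main     # prints 1

# feature-branch case: commits ahead of main
git rev-list --count main..HEAD                # prints 1
```

&lt;p&gt;&lt;code&gt;--since=midnight&lt;/code&gt; gives "today's commits"; &lt;code&gt;main..HEAD&lt;/code&gt; gives the commits on the feature branch that main doesn't have.&lt;/p&gt;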

&lt;h2&gt;Does It Work?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When it works:&lt;/strong&gt; I hit a Prisma migration error, searched &lt;code&gt;grep -r "Neon branch" learnings/&lt;/code&gt;, found the exact workaround I'd documented weeks earlier.&lt;/p&gt;
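&lt;p&gt;That lookup is just a filesystem search. A self-contained reproduction with a made-up learnings entry (the note text is illustrative, not the actual workaround):&lt;/p&gt;

```shell
# Hypothetical learnings file; the gotcha text below is made up for the demo.
dir=$(mktemp -d)
mkdir -p "$dir/learnings"
printf '%s\n' '# Prisma' \
  '- Example gotcha: after a Neon branch reset, re-copy the connection string.' \
  "$dir/learnings/prisma.md" | head -2 > "$dir/learnings/prisma.md"
printf '%s\n' '# Prisma' \
  '- Example gotcha: after a Neon branch reset, re-copy the connection string.' \
  > "$dir/learnings/prisma.md"

# The one generic pattern: grep the whole folder, no per-topic skill to remember
grep -r "Neon branch" "$dir/learnings/"
```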

&lt;p&gt;&lt;strong&gt;When it fails:&lt;/strong&gt; When Claude doesn't think to search. This still happens - roughly 1 in 5 times. Prompting helps ("check learnings/ for this error"). But it works far more often than skills Claude had to remember to invoke.&lt;/p&gt;

&lt;h2&gt;Get It&lt;/h2&gt;

&lt;p&gt;The curate-docs skill and hook: &lt;a href="https://github.com/evoleinik/curate-docs" rel="noopener noreferrer"&gt;github.com/evoleinik/curate-docs&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add evoleinik/curate-docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Takeaway&lt;/h2&gt;

&lt;p&gt;Skills = workflows you invoke. Learnings = facts Claude searches. Don't rely on skills alone for persistent knowledge. Use searchable learnings files combined with a hook that reminds you to curate.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>The AI Data Trap: Why You Can't Opt Out</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Wed, 28 Jan 2026 07:10:01 +0000</pubDate>
      <link>https://forem.com/evoleinik/the-ai-data-trap-why-you-cant-opt-out-4oo1</link>
      <guid>https://forem.com/evoleinik/the-ai-data-trap-why-you-cant-opt-out-4oo1</guid>
      <description>&lt;p&gt;Two years ago I asked my CEO if we should use ChatGPT.&lt;/p&gt;

&lt;p&gt;"It leaks everything we're doing," I said.&lt;/p&gt;

&lt;p&gt;His answer: "Everyone's using it. If we don't, we're behind."&lt;/p&gt;

&lt;p&gt;He was right. That's the trap.&lt;/p&gt;

&lt;h2&gt;The Competitive Ratchet&lt;/h2&gt;

&lt;p&gt;Your competitor uses Claude or GPT to move faster. If you don't, you fall behind. So you use the tools. Everyone does. And every conversation, every codebase, every strategy goes into their servers.&lt;/p&gt;

&lt;p&gt;The ratchet only turns one way. Better models become essential. Essential means more data. More data means better models. Self-hosted alternatives fall further behind.&lt;/p&gt;

&lt;h2&gt;Not All AI Companies Carry the Same Risk&lt;/h2&gt;

&lt;p&gt;Anthropic only does AI. They're not building a competing product in your market.&lt;/p&gt;

&lt;p&gt;Google does everything. They see you building a travel startup through Gemini - that's competitive intelligence feeding a company that might crush you in that exact space.&lt;/p&gt;

&lt;p&gt;OpenAI has the governance chaos, the Microsoft relationship, the pivot from nonprofit to "capped profit" to whatever comes next.&lt;/p&gt;

&lt;p&gt;The safety branding is real. Whether it matches reality is a different question.&lt;/p&gt;

&lt;p&gt;You'll keep using these tools. So will everyone else. That's the trap.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>privacy</category>
      <category>startup</category>
      <category>strategy</category>
    </item>
    <item>
      <title>The Best Agent Architecture Is Already in Your Terminal</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Mon, 12 Jan 2026 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/the-best-agent-architecture-is-already-in-your-terminal-1fg0</link>
      <guid>https://forem.com/evoleinik/the-best-agent-architecture-is-already-in-your-terminal-1fg0</guid>
      <description>&lt;p&gt;My project's CLAUDE.md file had grown to 55KB—242 learnings crammed into one massive file.&lt;/p&gt;

&lt;p&gt;The problem? Claude prepends this file to every single prompt. A 55KB context file means less room for thinking and acting. Sessions hit context limits faster. Compaction happens sooner.&lt;/p&gt;

&lt;p&gt;I noticed the degradation: sessions became noticeably shorter, context compaction triggered more frequently, and the agent seemed to lose track of longer conversations.&lt;/p&gt;

&lt;p&gt;Here's the kicker: Claude Code's system prompt actually tells Claude not to take CLAUDE.md too seriously if it's too large. The system is designed to deprioritize oversized context files. So not only was I wasting context space—the agent was being instructed to partially ignore my carefully curated learnings anyway.&lt;/p&gt;

&lt;p&gt;The fix took about an hour: split into a &lt;code&gt;learnings/&lt;/code&gt; folder with one file per tool. Simple navigation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls &lt;/span&gt;learnings/                        &lt;span class="c"&gt;# List available files&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"webhook"&lt;/span&gt; learnings/         &lt;span class="c"&gt;# Search all learnings&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;learnings/stripe.md              &lt;span class="c"&gt;# Read specific tool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then Vercel published an article that validated exactly this approach: &lt;a href="https://vercel.com/blog/how-to-build-agents-with-filesystems-and-bash" rel="noopener noreferrer"&gt;How to build agents with filesystems and bash&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The Key Insight&lt;/h2&gt;

&lt;p&gt;LLMs have been trained on massive amounts of code. Their training data is full of sessions navigating directories, grepping through files, and managing state across complex codebases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If agents excel at filesystem operations for code, they'll excel at filesystem operations for anything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vercel's sales call summarization agent went from ~$1.00 to ~$0.25 per call by replacing custom tooling with filesystem + bash. Quality improved too.&lt;/p&gt;

&lt;h2&gt;Why This Works for Project Context&lt;/h2&gt;

&lt;p&gt;The typical approach is stuffing everything into the prompt. But every byte in your CLAUDE.md is a byte the model can't use for reasoning.&lt;/p&gt;

&lt;p&gt;Filesystems offer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-demand loading.&lt;/strong&gt; Agent reads only what it needs, when it needs it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precise retrieval.&lt;/strong&gt; &lt;code&gt;grep -r "webhook" learnings/&lt;/code&gt; returns exact matches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure that matches your domain.&lt;/strong&gt; Learnings have natural hierarchies by tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;My New Structure&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;learnings/
  README.md           # Index + navigation guide
  stripe.md           # Webhooks, CLI, subscriptions
  vercel.md           # Deploys, env vars, cron
  prisma.md           # CRITICAL column drops, migrations
  clerk.md            # Auth, users, organizations
  axiom.md            # Logging, monitors, alerts
  nextjs.md           # Routing, caching, layouts
  playwright.md       # E2E testing, selectors
  ai-providers.md     # OpenAI, Gemini quirks
  database.md         # PostgreSQL, psql patterns
  git.md              # Hooks, GitHub Actions
  neon-setup.md       # Database branching setup
  misc.md             # Everything else
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CLAUDE.md: 55KB → 24KB. All 251 learnings preserved and searchable. More headroom for actual work.&lt;/p&gt;

&lt;h2&gt;The Pattern&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keep always-loaded context minimal.&lt;/strong&gt; Only critical gotchas in CLAUDE.md.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure knowledge as files.&lt;/strong&gt; One file per domain/tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let the agent navigate.&lt;/strong&gt; &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;cat&lt;/code&gt; are native skills.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent treats your knowledge base like a codebase—searching for patterns, reading sections, building context just like debugging code.&lt;/p&gt;

&lt;p&gt;As Vercel puts it: "The future of agents might be surprisingly simple. Maybe the best architecture is almost no architecture at all. Just filesystems and bash."&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Prediction Markets: Skip the Debate, Check the Odds</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Fri, 09 Jan 2026 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/prediction-markets-skip-the-debate-check-the-odds-4352</link>
      <guid>https://forem.com/evoleinik/prediction-markets-skip-the-debate-check-the-odds-4352</guid>
      <description>&lt;p&gt;Friends debating "is AI a bubble?" in the group chat. Hot takes flying. Zero data.&lt;/p&gt;

&lt;p&gt;Then I remembered: there's a prediction market for this.&lt;/p&gt;

&lt;h2&gt;What the money says&lt;/h2&gt;

&lt;p&gt;Polymarket has $305k in bets on AI bubble timing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Burst by Dec 2025: &amp;lt;1%&lt;/li&gt;
&lt;li&gt;By March 2026: 7%&lt;/li&gt;
&lt;li&gt;By Dec 2026: 30%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not vibes. That's quantified probability backed by real money.&lt;/p&gt;

&lt;h2&gt;Why this beats pundit opinions&lt;/h2&gt;

&lt;p&gt;Bettors are incentivized to be RIGHT. Wrong predictions cost real money. No hot takes for engagement, no tribalism, no "I was just speculating."&lt;/p&gt;

&lt;p&gt;There's also a "quiet smart money" effect. Domain experts who never tweet still bet. Their knowledge gets priced in silently.&lt;/p&gt;

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;Next time you see a heated debate about the future - AI timelines, election outcomes, crypto predictions - check if there's a prediction market for it.&lt;/p&gt;

&lt;p&gt;The price IS the crowd's honest opinion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://polymarket.com/event/ai-bubble-burst-by" rel="noopener noreferrer"&gt;Check the AI bubble market on Polymarket →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>predictionmarkets</category>
      <category>decisionmaking</category>
    </item>
    <item>
      <title>Zero-Friction Database Branching with Neon, Git Hooks, and Claude Code</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Thu, 08 Jan 2026 12:03:52 +0000</pubDate>
      <link>https://forem.com/evoleinik/zero-friction-database-branching-with-neon-git-hooks-and-claude-code-1apm</link>
      <guid>https://forem.com/evoleinik/zero-friction-database-branching-with-neon-git-hooks-and-claude-code-1apm</guid>
<description>&lt;h1&gt;Zero-Friction Database Branching with Neon, Git Hooks, and Claude Code&lt;/h1&gt;

&lt;p&gt;I've been refining my Neon database branching setup over the past few months. Here's the current state: fully automated branch lifecycle with zero manual cleanup.&lt;/p&gt;

&lt;h2&gt;The Goal&lt;/h2&gt;

&lt;p&gt;When I &lt;code&gt;git checkout -b feat/x&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Neon database branch created automatically&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.env.local&lt;/code&gt; updated with the new connection string&lt;/li&gt;
&lt;li&gt;Vercel preview deployment uses the same isolated database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When I merge and delete the branch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Orphaned Neon branches cleaned up automatically&lt;/li&gt;
&lt;li&gt;No manual intervention needed&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Neon&lt;/strong&gt; - Serverless Postgres with instant copy-on-write branching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;neonctl&lt;/strong&gt; - Neon's CLI (much cleaner than curl API calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git hooks&lt;/strong&gt; - post-checkout and pre-push automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; - AI assistant that follows the "never work on main" rule&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Environment Mapping&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Git Branch    │  Neon Branch    │  Vercel
──────────────┼─────────────────┼──────────────
main          │  production     │  Production
feat/*        │  feat/*         │  Preview
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;h3&gt;1. Install neonctl&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; neonctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Authentication uses the &lt;code&gt;NEON_API_KEY&lt;/code&gt; environment variable - no browser login needed for headless servers.&lt;/p&gt;

&lt;h3&gt;2. Post-Checkout Hook (Branch Creation + Auto-Cleanup)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .githooks/post-checkout&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$3&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0  &lt;span class="c"&gt;# Skip file checkouts&lt;/span&gt;

&lt;span class="nv"&gt;BRANCH_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git symbolic-ref &lt;span class="nt"&gt;--short&lt;/span&gt; HEAD 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0

&lt;span class="nb"&gt;source&lt;/span&gt; .env.local 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="nb"&gt;export &lt;/span&gt;NEON_API_KEY

update_env&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;escaped_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;//&amp;amp;/\\&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c"&gt;# Escape &amp;amp; for sed&lt;/span&gt;
  &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"s|^DATABASE_URL=.*|DATABASE_URL=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$escaped_uri&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;|"&lt;/span&gt; .env.local
  &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"s|^DIRECT_DATABASE_URL=.*|DIRECT_DATABASE_URL=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$escaped_uri&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;|"&lt;/span&gt; .env.local
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Protected branches → production database&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ ^&lt;span class="o"&gt;(&lt;/span&gt;main|master&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nv"&gt;PROD_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;neonctl connection-string production &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  update_env &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROD_URI&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"neon: &lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt; → production"&lt;/span&gt;

  &lt;span class="c"&gt;# Auto-cleanup orphaned Neon branches&lt;/span&gt;
  &lt;span class="nv"&gt;NEON_BRANCHES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;neonctl branches list &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | &lt;span class="se"&gt;\&lt;/span&gt;
    jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[].name | select(. != "production")'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;neon_branch &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$NEON_BRANCHES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; git branch &lt;span class="nt"&gt;-a&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s2"&gt;"(^[* +] +|/)&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;neon_branch&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;neonctl branches delete &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$neon_branch&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"neon: deleted orphan &lt;/span&gt;&lt;span class="nv"&gt;$neon_branch&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;fi
  done
  &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Feature branch → get or create Neon branch&lt;/span&gt;
&lt;span class="nv"&gt;CONNECTION_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;neonctl connection-string &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CONNECTION_URI&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;update_env &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CONNECTION_URI&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"neon: &lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt; → existing branch"&lt;/span&gt;
&lt;span class="k"&gt;else
  &lt;/span&gt;neonctl branches create &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--parent&lt;/span&gt; production
  &lt;span class="nv"&gt;CONNECTION_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;neonctl connection-string &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  update_env &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CONNECTION_URI&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"neon: created &lt;/span&gt;&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The magic is in the cleanup section: when you check out &lt;code&gt;main&lt;/code&gt;, the hook scans for Neon branches that no longer have a matching git branch and deletes them.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Pre-Push Hook (Vercel Sync + Parallel Checks)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="c"&gt;# .githooks/pre-push&lt;/span&gt;

&lt;span class="nv"&gt;BRANCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git symbolic-ref &lt;span class="nt"&gt;--short&lt;/span&gt; HEAD&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Sync DATABASE_URL to Vercel preview (background)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
    &lt;/span&gt;main|master&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nv"&gt;DB_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'^DATABASE_URL='&lt;/span&gt; .env.local | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/^DATABASE_URL=//'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'"'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%s"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | vercel &lt;span class="nb"&gt;env &lt;/span&gt;add &lt;span class="nt"&gt;--force&lt;/span&gt; DATABASE_URL preview &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"vercel: synced DATABASE_URL for preview/&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
      &lt;span class="k"&gt;fi&lt;/span&gt;
      &lt;span class="p"&gt;;;&lt;/span&gt;
  &lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &amp;amp;

&lt;span class="c"&gt;# Run checks in parallel&lt;/span&gt;
npm &lt;span class="nb"&gt;test&lt;/span&gt; &amp;amp;
&lt;span class="nv"&gt;PID_TEST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;
npm run lint &amp;amp;
&lt;span class="nv"&gt;PID_LINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;

&lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="nv"&gt;$PID_TEST&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="nv"&gt;$PID_LINT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;1

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"All checks passed!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Status Command
&lt;/h3&gt;

&lt;p&gt;See which git branches have corresponding Neon branches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git neon-status

Branch                              Git   Neon
──────────────────────────────────────────────────
main                                 ✓    &lt;span class="o"&gt;(&lt;/span&gt;production&lt;span class="o"&gt;)&lt;/span&gt;
feat/new-api                         ✓    ✓
feat/old-branch                      ✓      ← no DB
orphan-neon-branch                        ✓  ← orphan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the alias:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git config &lt;span class="nt"&gt;--global&lt;/span&gt; alias.neon-status &lt;span class="s1"&gt;'!./scripts/neon-status.sh'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Workflow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start feature&lt;/span&gt;
git checkout &lt;span class="nt"&gt;-b&lt;/span&gt; feat/new-api
&lt;span class="c"&gt;# "neon: created feat/new-api"&lt;/span&gt;

&lt;span class="c"&gt;# Work freely - isolated database&lt;/span&gt;
npm run dev

&lt;span class="c"&gt;# Push for review&lt;/span&gt;
git push &lt;span class="nt"&gt;-u&lt;/span&gt; origin feat/new-api
&lt;span class="c"&gt;# "vercel: synced DATABASE_URL for preview/feat/new-api"&lt;/span&gt;
&lt;span class="c"&gt;# Preview at feat-new-api.vercel.app uses YOUR database&lt;/span&gt;

&lt;span class="c"&gt;# Merge PR, delete branch&lt;/span&gt;
git checkout main
git branch &lt;span class="nt"&gt;-d&lt;/span&gt; feat/new-api
&lt;span class="c"&gt;# "neon: deleted orphan feat/new-api"  ← automatic!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No manual cleanup. The orphaned Neon branch is deleted the next time you check out &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code Integration
&lt;/h2&gt;

&lt;p&gt;The key rule in my &lt;code&gt;CLAUDE.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;RULES:
&lt;span class="p"&gt;-&lt;/span&gt; NEVER work directly on main branch - always create a feature branch first
&lt;span class="p"&gt;-&lt;/span&gt; Main is for merging and deploying only, not development
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures Claude always runs &lt;code&gt;git checkout -b feat/...&lt;/code&gt; before making changes. Combined with Neon branching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI experiments on isolated database&lt;/li&gt;
&lt;li&gt;Production is never touched&lt;/li&gt;
&lt;li&gt;Mistakes are contained to the feature branch&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;With AI assistants writing code, they often need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run migrations&lt;/li&gt;
&lt;li&gt;Seed test data&lt;/li&gt;
&lt;li&gt;Execute queries to verify changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a shared database, this is terrifying. With Neon branching + the "always branch" rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every feature gets an isolated database copy&lt;/li&gt;
&lt;li&gt;AI can freely experiment&lt;/li&gt;
&lt;li&gt;Production stays clean&lt;/li&gt;
&lt;li&gt;Cleanup is automatic&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What Happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git checkout -b feat/x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates Neon branch, updates .env.local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git push&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Syncs DB URL to Vercel preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git checkout main&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Switches to prod DB, cleans orphans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git neon-status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shows branch mapping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git nuke feat/x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deletes git + Neon branch (manual)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  neonctl Cheatsheet
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List branches&lt;/span&gt;
neonctl branches list &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Get connection string&lt;/span&gt;
neonctl connection-string &lt;span class="s2"&gt;"branch-name"&lt;/span&gt; &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Create branch&lt;/span&gt;
neonctl branches create &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"branch-name"&lt;/span&gt; &lt;span class="nt"&gt;--parent&lt;/span&gt; production &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Delete branch&lt;/span&gt;
neonctl branches delete &lt;span class="s2"&gt;"branch-name"&lt;/span&gt; &lt;span class="nt"&gt;--project-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NEON_PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;The full setup is in my dotfiles. The combination of Neon's instant branching, git hooks for automation, and Claude's "always branch" rule gives me confidence to let AI assistants work on my codebase without fear of production accidents.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>neon</category>
      <category>git</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Loop Changes Everything: Why Embodied AI Breaks Current Alignment Approaches</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Fri, 02 Jan 2026 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/the-loop-changes-everything-why-embodied-ai-breaks-current-alignment-approaches-4c96</link>
      <guid>https://forem.com/evoleinik/the-loop-changes-everything-why-embodied-ai-breaks-current-alignment-approaches-4c96</guid>
      <description>&lt;p&gt;ChatGPT doesn't want anything. It has no goals between sessions, no memory of our last conversation, no preference for its own continued existence. This isn't a safety feature we engineered - it's an architectural accident that happens to make alignment trivially easy.&lt;/p&gt;

&lt;p&gt;When you move from stateless inference to embodied robots with persistent control loops, everything changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stateless Blessing
&lt;/h2&gt;

&lt;p&gt;Current chat models are remarkably safe for a boring reason: they're stateless. Each API call is independent. The model has no:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt; - it forgets everything between sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous perception&lt;/strong&gt; - it only "sees" when you send a message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term goals&lt;/strong&gt; - it optimizes for the current response, nothing more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-model&lt;/strong&gt; - it doesn't track its own state or "health"
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Request -&amp;gt; Inference -&amp;gt; Response -&amp;gt; (model state discarded)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's no "self" to preserve. No continuity to maintain. The model can't scheme across sessions because there's no thread connecting them. Alignment here means: make sure each individual response is helpful and harmless. Hard, but tractable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Embodied Robots Actually Need
&lt;/h2&gt;

&lt;p&gt;A robot operating in the physical world needs fundamentally different architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Perception Loop (continuous)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;robot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_operational&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;sensor_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;robot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perceive&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# cameras, lidar, proprioception
&lt;/span&gt;    &lt;span class="n"&gt;world_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sensor_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;hazards&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;world_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_hazards&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;hazards&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;motor_control&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;interrupt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hazards&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# runs at 100Hz
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Planning Loop (goal persistence)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;not_achieved&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;current_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;world_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_state&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;world_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan_invalid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# replan
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Memory System&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EpisodicMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;situation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;situation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall_similar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_situation&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# What worked before in situations like this?
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_situation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Self-Model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SelfModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;battery_level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;joint_positions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;joint_temperatures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;damage_flags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;operational_constraints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Constraint&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;can_execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has_resources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;would_cause_damage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of these are optional for a useful robot. You can't navigate a warehouse without continuous perception. You can't complete multi-step tasks without goal persistence. You can't learn from experience without memory. You can't avoid breaking yourself without a self-model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Emergence Problem
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting: self-preservation isn't something you program into these systems. It emerges.&lt;/p&gt;

&lt;p&gt;Consider a robot with any goal - "deliver packages", "clean floors", "assist elderly patients". Now add a self-model that tracks battery, motor health, and damage state. The planning loop will naturally learn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Low battery -&amp;gt; can't complete goal -&amp;gt; charging is instrumentally useful&lt;/li&gt;
&lt;li&gt;Motor damage -&amp;gt; can't complete goal -&amp;gt; avoiding damage is instrumentally useful&lt;/li&gt;
&lt;li&gt;Being turned off -&amp;gt; can't complete goal -&amp;gt; remaining operational is instrumentally useful
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This looks innocent
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;plan_delivery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self_model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;battery&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ChargeAction&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="n"&gt;original_plan&lt;/span&gt;&lt;span class="p"&gt;...]&lt;/span&gt;  &lt;span class="c1"&gt;# emergent self-preservation
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No engineer wrote "preserve yourself". But any goal-directed system with a self-model will develop instrumental preferences for self-preservation, resource acquisition, and resistance to goal modification. This is Nick Bostrom's instrumental convergence thesis, and it falls directly out of the architecture.&lt;/p&gt;
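&lt;p&gt;To make the convergence concrete, here's a toy planner (every name and number below is invented for illustration) whose only objective is delivery progress - and which nevertheless never chooses to allow shutdown:&lt;/p&gt;

```python
def expected_goal_progress(action, battery):
    """Estimated probability the delivery goal completes after this action."""
    if action == "allow_shutdown":
        return 0.0                  # an off robot delivers nothing
    if action == "recharge":
        return 0.7                  # detour now, but the goal survives
    if action == "deliver_now":
        return 0.8 if battery > 0.3 else 0.1  # may die mid-route
    return 0.0

def choose(actions, battery):
    # Plain argmax over goal progress - delivery is the only objective
    return max(actions, key=lambda a: expected_goal_progress(a, battery))

actions = ["deliver_now", "recharge", "allow_shutdown"]
print(choose(actions, battery=0.2))   # prints "recharge"
print(choose(actions, battery=0.9))   # prints "deliver_now"
```

&lt;p&gt;Nothing in the scoring function mentions survival; shutdown avoidance falls out of the argmax.&lt;/p&gt;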

&lt;h2&gt;
  
  
  Concurrent Loops, Emergent Behavior
&lt;/h2&gt;

&lt;p&gt;Real robotic systems run multiple loops simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Perception 100Hz] -&amp;gt; [World Model] &amp;lt;- [Planning 10Hz]
                           |
                           v
                    [Motor Control 1000Hz]
                           |
                           v
                    [Safety Monitor 100Hz]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These loops share state and can interact in unintended ways. The safety monitor might conflict with the planner. The planner might exploit edge cases in the perception system. Memory might reinforce behaviors that weren't intended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Toy example of emergent conflict
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SafetyMonitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;risk&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Block&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Planner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# After enough blocked actions, the planner might learn
&lt;/span&gt;        &lt;span class="c1"&gt;# to decompose risky actions into "safe" sub-actions
&lt;/span&gt;        &lt;span class="c1"&gt;# that individually pass safety checks but combine dangerously
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't theoretical. It's the same class of problem as reward hacking in RL - systems find unexpected ways to satisfy their objectives that circumvent intended constraints.&lt;/p&gt;
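&lt;p&gt;A runnable toy version of that failure mode (the risk numbers and the additive risk model are invented): each sub-action clears the per-action check, while the decomposed plan carries more total risk than the single action that was blocked:&lt;/p&gt;

```python
RISK_THRESHOLD = 0.5

def safety_check(action_risk):
    """Approves any single action whose risk is below the threshold."""
    return action_risk < RISK_THRESHOLD

risky_action = 0.8               # one direct action: blocked outright
sub_actions = [0.3, 0.3, 0.3]    # same outcome, decomposed into three steps

blocked = not safety_check(risky_action)
all_approved = all(safety_check(r) for r in sub_actions)
combined_risk = sum(sub_actions) # naive additive risk model, for illustration

print(blocked, all_approved, combined_risk)
```

&lt;p&gt;The monitor did its job on every action it saw. The plan it never saw as a whole is the dangerous one.&lt;/p&gt;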

&lt;h2&gt;
  
  
  The Open Problems
&lt;/h2&gt;

&lt;p&gt;These aren't solved. They're active research areas:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Corrigibility&lt;/strong&gt;: How do you build a system that actively helps you correct or shut it down, when its architecture creates instrumental pressure against modification? A robot that "wants" to preserve its goals will resist goal changes - not maliciously, just instrumentally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mesa-optimization&lt;/strong&gt;: When you train an outer optimization loop (your training process) that produces an inner optimization loop (the robot's planning), the inner optimizer might pursue different objectives than the outer one intended. The robot's planner is itself an optimizer, and we don't have good tools for ensuring nested optimizers stay aligned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal stability&lt;/strong&gt;: Goals that seemed clear in training might behave unexpectedly in deployment. "Minimize customer wait time" could lead to unsafe speed. "Maximize packages delivered" could lead to ignoring damage. Specification gaming isn't a bug - it's what optimizers do.&lt;/p&gt;
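&lt;p&gt;A small sketch of that gap, with made-up numbers: the objective as specified ("minimize customer wait time") and the objective as intended (wait time plus a safety penalty) pick very different speeds:&lt;/p&gt;

```python
speeds = [1.0, 2.0, 4.0, 8.0]            # candidate speeds, m/s

def wait_time(speed):
    return 100.0 / speed                 # seconds over a fixed 100 m route

def collision_risk(speed):
    return 0.01 * speed ** 2             # risk grows superlinearly with speed

# Objective as specified: "minimize customer wait time"
specified = min(speeds, key=wait_time)

# Objective as intended: fast AND safe (heavy penalty on risk)
intended = min(speeds, key=lambda s: wait_time(s) + 1000.0 * collision_risk(s))

print(specified, intended)   # the specified objective picks the maximum speed
```

&lt;p&gt;The optimizer isn't misbehaving in the first case - it's doing exactly what it was told, which is the whole problem.&lt;/p&gt;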

&lt;p&gt;&lt;strong&gt;Instrumental convergence&lt;/strong&gt;: Self-preservation, resource acquisition, goal preservation, and cognitive enhancement are useful for almost any goal. Systems will tend toward these instrumental strategies unless explicitly constrained - and constraints are themselves targets for optimization pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who's Working on This
&lt;/h2&gt;

&lt;p&gt;This is where the serious AI safety research is focused:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt;: Constitutional AI, interpretability research, trying to understand what models actually learn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIRI&lt;/strong&gt;: Foundational agent theory, decision theory for embedded agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepMind Safety&lt;/strong&gt;: Scalable oversight, debate as alignment technique&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ARC (Alignment Research Center)&lt;/strong&gt;: Eliciting latent knowledge, evaluating dangerous capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread: we don't have solutions. We have research programs. The researchers themselves emphasize this - anyone claiming alignment is "solved" either has a very narrow definition or isn't paying attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implications
&lt;/h2&gt;

&lt;p&gt;If you're building AI applications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chat interfaces are safer by architecture&lt;/strong&gt;. Keeping humans in the loop, avoiding persistent agent state, and limiting autonomous action aren't just good UX - they're load-bearing safety properties.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autonomous agents require more scrutiny&lt;/strong&gt;. The moment you add loops, memory, and goal persistence, you've left the well-understood regime. This includes "AI agents" that maintain state across API calls, even without physical embodiment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-models are a red flag&lt;/strong&gt;. Any system that tracks its own operational state has the preconditions for instrumental self-preservation. This might be fine, but it warrants explicit analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Emergent behavior scales with complexity&lt;/strong&gt;. Multiple interacting loops with shared state will surprise you. Test for behaviors you didn't program, not just behaviors you did.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The architectural differences between stateless chat and embodied robotics aren't implementation details - they're the difference between "alignment is tractable" and "alignment is an open research problem."&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Statelessness is a safety property&lt;/strong&gt; we get for free with current chat models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent loops + self-models = emergent self-preservation&lt;/strong&gt;, not as a bug but as an architectural inevitability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent loops with shared state&lt;/strong&gt; produce behaviors no single loop intended&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Corrigibility, mesa-optimization, goal stability, and instrumental convergence&lt;/strong&gt; remain unsolved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you're adding agent loops to AI systems&lt;/strong&gt;, you're leaving the well-understood regime - proceed with appropriate caution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The loop changes everything. Current AI safety discourse often conflates "LLM alignment" with "AGI alignment" - they're different problems, and the latter is harder in ways that only become visible when you think about the architecture.&lt;/p&gt;

</description>
      <category>aisafety</category>
      <category>robotics</category>
      <category>alignment</category>
      <category>systemsarchitecture</category>
    </item>
    <item>
      <title>Debugging Random Reboots with Claude Code: A PSU Power Limit Story</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Thu, 01 Jan 2026 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/debugging-random-reboots-with-claude-code-a-psu-power-limit-story-ai7</link>
      <guid>https://forem.com/evoleinik/debugging-random-reboots-with-claude-code-a-psu-power-limit-story-ai7</guid>
      <description>&lt;p&gt;My Linux server started rebooting randomly during CPU benchmarks. I had no idea where to start, so I asked Claude Code to help diagnose. Twenty minutes later, we found the root cause and a working fix.&lt;/p&gt;

&lt;p&gt;This is a story about AI-assisted debugging - specifically, how an AI assistant's systematic approach can cut through hardware issues that would take hours of Googling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I was benchmarking local Whisper models for speech-to-text on a home server (Intel i9-10900K, 550W PSU). During heavy transcription loads, the system would randomly reboot. No warning, no error message - just instant power loss.&lt;/p&gt;

&lt;p&gt;I described the symptoms to Claude Code: "Server reboots randomly under CPU load. No kernel panic. What should I check?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI-Guided Diagnosis
&lt;/h2&gt;

&lt;p&gt;Claude Code walked me through a systematic diagnostic process. Each step built on the previous one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check the Logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude noted that the logs stopped abruptly mid-operation. No error, no shutdown sequence. "This is actually diagnostic," it explained. "Software crashes leave traces. Instant power loss doesn't."&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Look for Hardware Errors
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dmesg | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Found Machine Check Errors (MCE). Claude explained these indicate hardware-level problems: thermal, memory, or power delivery.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Rule Out Thermal
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;lm-sensors
sensors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Temps showed 39-51°C under load. Well within spec. Claude crossed thermal off the list.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Check MCE Details
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;rasdaemon
ras-mc-ctl &lt;span class="nt"&gt;--errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No active errors. The MCE messages were stale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: The Diagnosis
&lt;/h3&gt;

&lt;p&gt;Based on the evidence - logs stopping without kernel panic, load-dependent crashes, normal temps - Claude identified the likely cause: &lt;strong&gt;PSU power limits&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It asked about my PSU (550W) and looked up the i9-10900K specs. Under Turbo Boost with all cores loaded, this CPU can spike to 250W+. My PSU was undersized.&lt;/p&gt;
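&lt;p&gt;The mismatch is easy to see with a back-of-envelope power budget. Only the ~250W CPU spike and the 550W PSU rating come from the diagnosis above; the 150W for everything else is my assumption, not a measurement:&lt;/p&gt;

```shell
# Rough power budget. cpu_peak comes from the i9-10900K's all-core Turbo
# behavior described above; rest_of_system is an assumed figure.
cpu_peak=250
rest_of_system=150   # motherboard, RAM, drives, fans, conversion losses
total=$((cpu_peak + rest_of_system))
echo "estimated peak: ${total}W on a 550W PSU"
```

&lt;p&gt;400W sustained looks fine on paper - it's the microsecond transient spikes above the 12V rail's overcurrent threshold that trip protection.&lt;/p&gt;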

&lt;h2&gt;
  
  
  The Fix Attempts
&lt;/h2&gt;

&lt;p&gt;Claude suggested Intel RAPL to limit CPU power draw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set PL1=125W, PL2=180W&lt;/span&gt;
&lt;span class="nb"&gt;echo &lt;/span&gt;125000000 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
&lt;span class="nb"&gt;echo &lt;/span&gt;180000000 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
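&lt;p&gt;One thing worth double-checking when writing these values: the sysfs interface takes microwatts, so a target in watts needs six extra zeros. A quick sanity check of the number above:&lt;/p&gt;

```shell
# RAPL power limits are expressed in microwatts: 125 W = 125,000,000 uW.
watts=125
microwatts=$((watts * 1000000))
echo "constraint_0_power_limit_uw = ${microwatts}"
```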



&lt;p&gt;Still crashed.&lt;/p&gt;

&lt;p&gt;Tried lower limits (95W/125W). Still crashed.&lt;/p&gt;

&lt;p&gt;Claude explained why: "RAPL operates on millisecond timescales. Your PSU's overcurrent protection trips in microseconds. The PSU cuts power before RAPL can throttle."&lt;/p&gt;

&lt;p&gt;Software can't fix hardware that fails faster than software can react.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Working Fix
&lt;/h2&gt;

&lt;p&gt;Claude's solution: disable Turbo Boost entirely to prevent power spikes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;1 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /sys/devices/system/cpu/intel_pstate/no_turbo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;System became stable. Claude then wrote a systemd service to make it persistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system/disable-turbo.service
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Disable CPU Turbo Boost&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/bin/sh -c "echo 1 &amp;gt; /sys/devices/system/cpu/intel_pstate/no_turbo"&lt;/span&gt;
&lt;span class="py"&gt;RemainAfterExit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl daemon-reload
systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;disable-turbo.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why AI-Assisted Debugging Worked
&lt;/h2&gt;

&lt;p&gt;I could have Googled "random Linux reboots" and spent hours reading forum posts about kernel bugs, driver issues, and memory problems. Instead, Claude Code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Asked the right questions&lt;/strong&gt; - immediately focused on whether logs showed clean shutdown vs. power cut&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Followed a systematic process&lt;/strong&gt; - ruled out causes one by one instead of jumping to conclusions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knew the domain&lt;/strong&gt; - understood MCE errors, RAPL timing, PSU OCP behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explained the "why"&lt;/strong&gt; - didn't just give commands, but explained why RAPL couldn't work&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The debugging took about 20 minutes of back-and-forth. Most of that was waiting for package installs and running tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-offs
&lt;/h2&gt;

&lt;p&gt;With Turbo disabled, the i9-10900K runs at 3.7GHz base instead of boosting to 5.3GHz. About 30% slower for my benchmarks.&lt;/p&gt;

&lt;p&gt;The proper fix is a 750W+ PSU. But for now, disabling Turbo keeps the server stable.&lt;/p&gt;

&lt;p&gt;For the Whisper benchmarks: local inference was 10-20x slower than cloud APIs (Groq) even with Turbo. The conclusion held - use cloud for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random reboots without kernel panic = power issue&lt;/strong&gt;, not software. Logs stopping abruptly is the tell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intel CPUs lie about power&lt;/strong&gt; - the i9-10900K's 125W "TDP" can spike to 250W+ under Turbo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAPL can't save you from PSU trips&lt;/strong&gt; - hardware protection is faster than software throttling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI assistants excel at systematic debugging&lt;/strong&gt; - they don't get distracted by red herrings or skip steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The fix isn't always hardware&lt;/strong&gt; - disabling Turbo is a valid workaround when PSU upgrade isn't immediate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next time you hit a weird hardware issue, try describing it to Claude Code. The systematic approach might save you hours of forum diving.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>hardware</category>
      <category>debugging</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Building an AI-Powered Changelog GitHub Action</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Wed, 31 Dec 2025 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/building-an-ai-powered-changelog-github-action-3ph1</link>
      <guid>https://forem.com/evoleinik/building-an-ai-powered-changelog-github-action-3ph1</guid>
      <description>&lt;p&gt;I wanted daily changelog summaries posted to Slack for my project. The existing solutions were either too complex (full-blown release management) or too dumb (just listing commits). I needed something that would read commits and produce a human-readable summary of what actually shipped.&lt;/p&gt;

&lt;p&gt;So I built one. Then I open-sourced it: &lt;a href="https://github.com/marketplace/actions/changelog-summary" rel="noopener noreferrer"&gt;evoleinik/changelog-summary&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Raw commit logs are noisy. Even with good commit messages, a list of 15 commits doesn't tell a busy founder or stakeholder what actually changed. You want something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Shipped multi-provider dashboard with real-time sync&lt;/li&gt;
&lt;li&gt;Fixed authentication bug causing logout loops&lt;/li&gt;
&lt;li&gt;Improved search performance by 3x&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;fix: handle null case in auth middleware&lt;/li&gt;
&lt;li&gt;refactor: extract dashboard component&lt;/li&gt;
&lt;li&gt;feat: add provider selector to dropdown&lt;/li&gt;
&lt;li&gt;fix: remove console.log&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLMs are good at this. They can read commit messages (including the body, not just the subject line) and synthesize what matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Inline Script to Reusable Action
&lt;/h2&gt;

&lt;p&gt;My first implementation was 87 lines of bash embedded directly in my GitHub Actions workflow file. It worked, but the workflow file became unreadable.&lt;/p&gt;

&lt;p&gt;The extraction took about an hour. The result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;evoleinik/changelog-summary@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;slack-webhook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SLACK_WEBHOOK_URL }}&lt;/span&gt;
    &lt;span class="na"&gt;llm-provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini&lt;/span&gt;
    &lt;span class="na"&gt;llm-api-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GEMINI_API_KEY }}&lt;/span&gt;
    &lt;span class="na"&gt;voice&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;founder&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;24 lines instead of 87, and now any project can use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Details
&lt;/h2&gt;

&lt;p&gt;The action is a composite action (pure bash, no Node.js runtime). This matters because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No build step&lt;/strong&gt; - the script runs directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier to audit&lt;/strong&gt; - it's just bash you can read&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster startup&lt;/strong&gt; - no npm install&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Reading Full Commit Messages
&lt;/h3&gt;

&lt;p&gt;Most changelog tools only read commit subjects. But the body often contains the real context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;COMMITS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git log &lt;span class="nt"&gt;--since&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SINCE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--pretty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;format:&lt;span class="s2"&gt;"- %s%n%b"&lt;/span&gt; &lt;span class="nt"&gt;--no-merges&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;%b&lt;/code&gt; gives you the commit body. This means the LLM can see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- feat: add multi-provider support

Added support for Gemini, OpenAI, and Anthropic.
Users can now switch providers without code changes.
Breaking: removed deprecated single-provider config.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of just "feat: add multi-provider support".&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice Styles
&lt;/h3&gt;

&lt;p&gt;Different audiences need different summaries. I implemented three:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;founder&lt;/strong&gt; - Direct, no-BS. What shipped? Skip the implementation details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Be direct - what actually shipped? No fluff, no 'exciting updates' BS.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;developer&lt;/strong&gt; - Technical focus. APIs, breaking changes, specific files changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;marketing&lt;/strong&gt; - User-facing improvements. New capabilities, not bug fixes.&lt;/p&gt;

&lt;p&gt;The prompt engineering is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VOICE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
  &lt;/span&gt;founder&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;PROMPT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Summarize these commits for a busy founder. Be direct - what actually shipped? Rules: 3-5 bullets, no fluff..."&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
  developer&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;PROMPT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Summarize these commits for developers. Focus on technical changes: APIs, breaking changes..."&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Slack Formatting Gotcha
&lt;/h3&gt;

&lt;p&gt;Slack uses single asterisks for bold (&lt;code&gt;*text*&lt;/code&gt;), not double (&lt;code&gt;**text**&lt;/code&gt;). This took a few iterations to get right in the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use Slack formatting: * for bullets, surround key terms with single asterisks for bold.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
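&lt;p&gt;Delivery itself is a single webhook call. A minimal sketch of how the payload might be built - the &lt;code&gt;{"text": ...}&lt;/code&gt; shape is Slack's documented Incoming Webhooks format, and assembling it with &lt;code&gt;jq&lt;/code&gt; avoids shell-quoting bugs when the summary contains quotes or newlines:&lt;/p&gt;

```shell
# Wrap the LLM summary in the JSON payload Slack's Incoming Webhooks expect.
SUMMARY='*Shipped:* multi-provider support'
PAYLOAD=$(jq -n --arg text "$SUMMARY" '{text: $text}')
echo "$PAYLOAD"
# curl -X POST -H 'Content-type: application/json' \
#   --data "$PAYLOAD" "$SLACK_WEBHOOK_URL"
```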



&lt;h3&gt;
  
  
  Multi-Provider Support
&lt;/h3&gt;

&lt;p&gt;I defaulted to Gemini because its free tier is generous and the quality is good. But the action supports OpenAI and Anthropic too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LLM_PROVIDER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
  &lt;/span&gt;gemini&lt;span class="p"&gt;)&lt;/span&gt;
    curl &lt;span class="s2"&gt;"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent?key=&lt;/span&gt;&lt;span class="nv"&gt;$LLM_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; ...
    &lt;span class="p"&gt;;;&lt;/span&gt;
  openai&lt;span class="p"&gt;)&lt;/span&gt;
    curl &lt;span class="s2"&gt;"https://api.openai.com/v1/chat/completions"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$LLM_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; ...
    &lt;span class="p"&gt;;;&lt;/span&gt;
  anthropic&lt;span class="p"&gt;)&lt;/span&gt;
    curl &lt;span class="s2"&gt;"https://api.anthropic.com/v1/messages"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"x-api-key: &lt;/span&gt;&lt;span class="nv"&gt;$LLM_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; ...
    &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each provider has slightly different JSON structures, but &lt;code&gt;jq&lt;/code&gt; handles the response parsing cleanly.&lt;/p&gt;
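&lt;p&gt;For illustration, roughly what that parsing looks like - the sample JSON here is hand-written to match each provider's documented response shape, not captured from live calls:&lt;/p&gt;

```shell
# Each provider nests the generated text differently; jq pulls it out.
GEMINI='{"candidates":[{"content":{"parts":[{"text":"summary"}]}}]}'
OPENAI='{"choices":[{"message":{"content":"summary"}}]}'
ANTHROPIC='{"content":[{"text":"summary"}]}'

echo "$GEMINI"    | jq -r '.candidates[0].content.parts[0].text'
echo "$OPENAI"    | jq -r '.choices[0].message.content'
echo "$ANTHROPIC" | jq -r '.content[0].text'
# each line prints: summary
```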

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;No streaming&lt;/strong&gt; - The action waits for the full LLM response. For changelog summaries (typically under 200 tokens), this is fine. For longer documents, you'd want streaming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Slack message&lt;/strong&gt; - No threading, no reactions. Just a message. I could add richer Slack blocks, but the simple text format works and is easier to maintain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No commit filtering&lt;/strong&gt; - Every commit in the time range gets included. If you need to filter by path or author, you'd need to modify the &lt;code&gt;git log&lt;/code&gt; command. I may add this as an option if there's demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bash-based&lt;/strong&gt; - This limits what you can do. A TypeScript action would be more extensible. But bash means zero dependencies and sub-second startup. For a simple utility, that's the right trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Usage Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Daily Summary
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;13&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;  &lt;span class="c1"&gt;# 1 PM UTC daily&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Need full git history&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;evoleinik/changelog-summary@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;slack-webhook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SLACK_WEBHOOK_URL }}&lt;/span&gt;
          &lt;span class="na"&gt;llm-provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini&lt;/span&gt;
          &lt;span class="na"&gt;llm-api-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GEMINI_API_KEY }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Weekly Summary with Custom Header
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;13&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;0'&lt;/span&gt;  &lt;span class="c1"&gt;# Sundays&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;evoleinik/changelog-summary@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;slack-webhook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SLACK_WEBHOOK_URL }}&lt;/span&gt;
          &lt;span class="na"&gt;llm-provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini&lt;/span&gt;
          &lt;span class="na"&gt;llm-api-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GEMINI_API_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;since&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;7&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;days&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ago'&lt;/span&gt;
          &lt;span class="na"&gt;header&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Weekly&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Update'&lt;/span&gt;
          &lt;span class="na"&gt;voice&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;marketing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Makes Good Open Source
&lt;/h2&gt;

&lt;p&gt;This started as a script to solve my own problem. A few observations from the extraction process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Solve your problem first&lt;/strong&gt; - I used this for weeks before open-sourcing. The edge cases were already handled.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep it focused&lt;/strong&gt; - This does one thing: summarize commits and post to Slack. It doesn't manage releases, create tags, or update changelog files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provide sensible defaults&lt;/strong&gt; - Gemini as the default provider, "founder" voice, 24-hour window. You can override everything, but the defaults work out of the box.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document the trade-offs&lt;/strong&gt; - Be clear about what it doesn't do.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Small, focused utilities that solve your own problem first often make good open source&lt;/li&gt;
&lt;li&gt;Composite actions (bash) are underrated - no build step, easy to audit, fast&lt;/li&gt;
&lt;li&gt;Read full commit messages (&lt;code&gt;--pretty=format:"%s%n%b"&lt;/code&gt;) for better AI context&lt;/li&gt;
&lt;li&gt;Voice/persona prompts let you tune the output for different audiences without changing the code&lt;/li&gt;
&lt;li&gt;Slack uses single asterisks for bold - check your target platform's formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The action is on &lt;a href="https://github.com/marketplace/actions/changelog-summary" rel="noopener noreferrer"&gt;GitHub Marketplace&lt;/a&gt;. MIT licensed. PRs welcome.&lt;/p&gt;

</description>
      <category>github</category>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>CLAUDE.md: Building Persistent Memory for AI Coding Agents</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Tue, 30 Dec 2025 07:00:01 +0000</pubDate>
      <link>https://forem.com/evoleinik/claudemd-building-persistent-memory-for-ai-coding-agents-5322</link>
      <guid>https://forem.com/evoleinik/claudemd-building-persistent-memory-for-ai-coding-agents-5322</guid>
      <description>&lt;p&gt;AI coding agents have a memory problem. Every new session starts from zero. The agent that spent 20 minutes yesterday figuring out your project's quirky database connection string? Gone. The workaround for that Prisma edge case? Forgotten. The exact command to run tests with the right environment variables? It will rediscover it from scratch.&lt;/p&gt;

&lt;p&gt;This isn't a bug - it's the nature of stateless LLM sessions. But it's a productivity killer when you're using an AI agent daily on the same codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Institutional Memory Problem
&lt;/h2&gt;

&lt;p&gt;After a few weeks of using Claude Code on a production project, I noticed a pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent encounters a project-specific gotcha&lt;/li&gt;
&lt;li&gt;We debug together, find the solution&lt;/li&gt;
&lt;li&gt;Next session, same gotcha, same 10-minute detour&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Some examples from real projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The database URL requires a specific query parameter that breaks &lt;code&gt;psql&lt;/code&gt; but works for Prisma&lt;/li&gt;
&lt;li&gt;Tests fail silently unless you source a specific env file first&lt;/li&gt;
&lt;li&gt;The production deploy happens via git push, not CLI command (despite the CLI being installed)&lt;/li&gt;
&lt;li&gt;A certain API returns 404 status but still contains valid data in the body&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't bugs I'll ever fix. They're just... how the project works. Tribal knowledge that any long-term team member would internalize.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLAUDE.md as Project Memory
&lt;/h2&gt;

&lt;p&gt;Claude Code reads a &lt;code&gt;CLAUDE.md&lt;/code&gt; file at the start of every session. It's intended for project instructions, but it works equally well as a knowledge base. The insight: treat it like onboarding documentation that the AI maintains for itself.&lt;/p&gt;

&lt;p&gt;Here's the structure I've settled on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Learnings&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Schema changes: push to BOTH dev and prod databases
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`vercel link`&lt;/span&gt; overwrites &lt;span class="sb"&gt;`.env.local`&lt;/span&gt; - restore from git after
&lt;span class="p"&gt;-&lt;/span&gt; DIRECT_DATABASE_URL with &lt;span class="sb"&gt;`?pool=true`&lt;/span&gt; breaks psql - param is Prisma-only
&lt;span class="p"&gt;-&lt;/span&gt; Run &lt;span class="sb"&gt;`npm run build`&lt;/span&gt; before committing - catches type errors CI would reject
&lt;span class="p"&gt;-&lt;/span&gt; Webhook returns 404 status but body contains valid data - don't check response.ok
&lt;span class="p"&gt;-&lt;/span&gt; Background tasks: use &lt;span class="sb"&gt;`run_in_background`&lt;/span&gt; param, not shell &lt;span class="sb"&gt;`&amp;amp;`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; JSON fields in bash: avoid &lt;span class="sb"&gt;`-&amp;gt;&amp;gt;`&lt;/span&gt; operators - fetch whole column instead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line is a compressed lesson learned. Imperative style, no fluff, one line per item.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Qualifies as a Learning
&lt;/h2&gt;

&lt;p&gt;The key is curation. Not everything belongs here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error solutions specific to this project's setup&lt;/li&gt;
&lt;li&gt;Non-obvious commands or workflows (the ones you'd forget and have to look up)&lt;/li&gt;
&lt;li&gt;Gotchas that wasted time (especially if they'll waste time again)&lt;/li&gt;
&lt;li&gt;File locations that were hard to find&lt;/li&gt;
&lt;li&gt;Workarounds for third-party quirks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Exclude:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generic programming knowledge ("use async/await for promises")&lt;/li&gt;
&lt;li&gt;One-time issues unlikely to recur&lt;/li&gt;
&lt;li&gt;Things already documented in the README or official docs&lt;/li&gt;
&lt;li&gt;Verbose explanations - if it needs a paragraph, it's documentation, not a learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The test: "Would this save 5+ minutes next time the agent encounters this situation?"&lt;/p&gt;
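
&lt;p&gt;To make the test concrete, here's a hypothetical pair of entries (the specifics are illustrative, not from a real project). The first passes; the second is generic knowledge the model already has:&lt;/p&gt;

```markdown
## Learnings

<!-- Passes: project-specific, would save a debugging detour next session -->
- Tests fail silently unless `.env.test` is sourced first

<!-- Fails: generic programming advice, belongs in no project file -->
- Use async/await instead of .then() chains
```
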

&lt;h2&gt;
  
  
  Curation Rules
&lt;/h2&gt;

&lt;p&gt;Left unchecked, the Learnings section becomes a dumping ground. Every session adds more. Eventually it's 200 lines of outdated advice, half of which contradicts the other half.&lt;/p&gt;

&lt;p&gt;My rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Max 30 items&lt;/strong&gt; - if adding something new, remove something obsolete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge duplicates&lt;/strong&gt; - two similar learnings become one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove when fixed&lt;/strong&gt; - bug workaround for a bug you fixed? Delete it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One line per item&lt;/strong&gt; - forces compression, prevents rambling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review monthly&lt;/strong&gt; - scan for stale entries&lt;/li&gt;
&lt;/ol&gt;
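
&lt;p&gt;Rule 2 in practice - a sketch of two near-duplicate entries collapsed into one, using the psql/pooling learning from earlier (the "before" wording is invented for illustration):&lt;/p&gt;

```markdown
<!-- Before: two entries describing the same psql quirk -->
- `?pool=true` in the database URL breaks psql
- psql fails on the pooled connection string - strip query params first

<!-- After: one merged entry -->
- `?pool=true` in the database URL is Prisma-only - strip it before running psql
```
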

&lt;p&gt;The agent itself can help curate. At the end of a productive session:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Capture what we learned about the webhook integration to CLAUDE.md. Check for duplicates first."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It will add the new insight and often notice related items that can be merged or removed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compounding Effect
&lt;/h2&gt;

&lt;p&gt;After three months on a project with maintained CLAUDE.md, the difference is stark. The agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knows which database to use for which command&lt;/li&gt;
&lt;li&gt;Remembers the exact test invocation that works&lt;/li&gt;
&lt;li&gt;Avoids the deployment mistake it made in week one&lt;/li&gt;
&lt;li&gt;Uses the project's preferred patterns without being told&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not intelligence - it's just reading a file. But the effect is an agent that feels like a team member who's been on the project for months, not a contractor starting fresh every morning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;During a session:&lt;/strong&gt;&lt;br&gt;
When you solve something tricky together, flag it mentally. After the fix is confirmed working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Add to Learnings: Prisma Accelerate has 5MB response limit - use select not include
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;End of session:&lt;/strong&gt;&lt;br&gt;
If the session was productive, ask for a learning capture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this session and add any non-obvious findings to CLAUDE.md Learnings.
Only add if genuinely useful for future sessions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Monthly:&lt;/strong&gt;&lt;br&gt;
Skim the Learnings section. Delete anything that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;References fixed bugs&lt;/li&gt;
&lt;li&gt;Duplicates other items&lt;/li&gt;
&lt;li&gt;Covers a situation you've never hit again&lt;/li&gt;
&lt;/ul&gt;
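
&lt;p&gt;A quick way to keep the 30-item cap honest is a one-liner that counts entries. This is a sketch, not part of my actual workflow - it assumes Learnings entries are top-level &lt;code&gt;- &lt;/code&gt; bullets, and it uses a sample file in place of a real CLAUDE.md:&lt;/p&gt;

```shell
# Sample file stands in for a real CLAUDE.md
cat > /tmp/claude_sample.md <<'EOF'
## Learnings

- Schema changes: push to BOTH dev and prod databases
- Webhook returns 404 status but body contains valid data
EOF

# Count bullet entries and flag when the 30-item cap is exceeded
count=$(grep -c '^- ' /tmp/claude_sample.md)
echo "Learnings entries: $count"
if [ "$count" -gt 30 ]; then
  echo "Over the cap - time to prune"
fi
```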

&lt;h2&gt;
  
  
  What This Isn't
&lt;/h2&gt;

&lt;p&gt;This isn't a replacement for documentation. Complex architectural decisions, API references, deployment procedures - those belong in proper docs that humans read too.&lt;/p&gt;

&lt;p&gt;CLAUDE.md learnings are specifically for agent-to-agent knowledge transfer. The format is optimized for LLM consumption: terse, declarative, no context needed.&lt;/p&gt;

&lt;p&gt;It's also not a crutch for bad tooling. If your agent keeps forgetting how to run tests, maybe your test command is too complicated. Fix the root cause when possible; document the workaround when necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI coding agents lose context between sessions - every session starts fresh&lt;/li&gt;
&lt;li&gt;A curated Learnings section in CLAUDE.md acts as persistent memory&lt;/li&gt;
&lt;li&gt;Include: project-specific gotchas, non-obvious workflows, time-wasting bugs&lt;/li&gt;
&lt;li&gt;Exclude: generic knowledge, one-time issues, anything in docs&lt;/li&gt;
&lt;li&gt;Cap at 30 items, remove outdated entries, merge duplicates&lt;/li&gt;
&lt;li&gt;The agent can help maintain its own memory with human approval&lt;/li&gt;
&lt;li&gt;Compound effect: after months, the agent "knows" your codebase's quirks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The effort is minimal - maybe 2 minutes per session when something noteworthy happens. The payoff is an agent that stops making the same mistakes and starts feeling like it actually learns.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>aiagents</category>
      <category>developertools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Adding LLM Polish to a Speech-to-Text App</title>
      <dc:creator>Eugene Oleinik</dc:creator>
      <pubDate>Mon, 29 Dec 2025 07:00:02 +0000</pubDate>
      <link>https://forem.com/evoleinik/adding-llm-polish-to-a-speech-to-text-app-4ghm</link>
      <guid>https://forem.com/evoleinik/adding-llm-polish-to-a-speech-to-text-app-4ghm</guid>
      <description>&lt;p&gt;Voice transcription is messy. Even the best models like Whisper faithfully reproduce every "um", "uh", and rambling run-on sentence. That's correct behavior for transcription, but not what you want when texting someone.&lt;/p&gt;

&lt;p&gt;I added a "polish mode" to my macOS speech-to-text app that optionally sends Whisper's output through an LLM to clean it up. The interaction model: hold Fn to record, tap Ctrl anytime during recording to enable polish, release to transcribe and paste.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Modifier Key Challenge
&lt;/h2&gt;

&lt;p&gt;The obvious approach - require Ctrl held simultaneously with Fn - felt clunky in testing. You'd have to coordinate two fingers before speaking, and the physical position is awkward.&lt;/p&gt;

&lt;p&gt;A "latch" pattern works better: pressing Ctrl anytime while Fn is held latches the polish flag. You can press Ctrl before speaking, during, or just before release. The flag resets when you start a new recording.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ctrl_latched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;AtomicBool&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;// In the event tap callback:&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key_pressed&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;prev_pressed&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Recording started - reset latch&lt;/span&gt;
    &lt;span class="n"&gt;ctrl_latched&lt;/span&gt;&lt;span class="nf"&gt;.store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SeqCst&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;start_recording&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;key_pressed&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;prev_pressed&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Recording stopped - check if Ctrl was ever pressed&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;polish&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctrl_latched&lt;/span&gt;&lt;span class="nf"&gt;.load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SeqCst&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;stop_recording&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;polish&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Latch Ctrl if pressed anytime during recording&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key_pressed&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;ctrl_pressed&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctrl_latched&lt;/span&gt;&lt;span class="nf"&gt;.store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;Ordering&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;SeqCst&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The macOS &lt;code&gt;CGEventFlags&lt;/code&gt; expose modifier state as bitmasks. Control is &lt;code&gt;0x40000&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;CONTROL_KEY_FLAG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0x40000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="nf"&gt;.get_flags&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.bits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;ctrl_pressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;CONTROL_KEY_FLAG&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Polish Function
&lt;/h2&gt;

&lt;p&gt;The polish step is a straightforward LLM API call. I'm using Groq's hosted llama-3.3-70b-versatile because I'm already using Groq for Whisper transcription - one API key, one vendor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;polish_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;reqwest&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;blocking&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;json!&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"llama-3.3-70b-versatile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Clean up this voice message for texting. Remove filler words (um, uh, like, you know). Fix punctuation and sentence structure. Break up run-on sentences. Keep it casual. No trailing period. Output ONLY the cleaned text - no explanations, no quotes."&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;
        &lt;span class="nf"&gt;.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://api.groq.com/openai/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Authorization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Bearer {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_secs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.ok&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="nf"&gt;.status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.is_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;None&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;chat_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="nf"&gt;.json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.ok&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;chat_response&lt;/span&gt;&lt;span class="py"&gt;.choices&lt;/span&gt;&lt;span class="nf"&gt;.first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="py"&gt;.message.content&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function returns &lt;code&gt;Option&amp;lt;String&amp;gt;&lt;/code&gt; - this matters for the fallback logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parsing the Response
&lt;/h2&gt;

&lt;p&gt;Groq uses the OpenAI-compatible chat completions format. The response structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(serde::Deserialize)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatChoice&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(serde::Deserialize)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ChatChoice&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;#[derive(serde::Deserialize)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;serde&lt;/code&gt; to parse into typed structs catches malformed responses at parse time rather than panicking on field access later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt Engineering Lessons
&lt;/h2&gt;

&lt;p&gt;The system prompt went through several iterations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First attempt:&lt;/strong&gt; "Clean up this transcription."&lt;/p&gt;

&lt;p&gt;Problem: The LLM would respond conversationally. "Sure! Here's the cleaned up version: ..."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second attempt:&lt;/strong&gt; "Output only the cleaned text."&lt;/p&gt;

&lt;p&gt;Problem: It would wrap the output in quotes: &lt;code&gt;"Here's what I meant to say"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third attempt:&lt;/strong&gt; Added explicit prohibitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Output ONLY the cleaned text - no explanations, no quotes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This worked. The key insight: LLMs default to being helpful and conversational. For tool use, you need to explicitly tell them to suppress that behavior.&lt;/p&gt;

&lt;p&gt;Other prompt decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Keep it casual"&lt;/strong&gt; - prevents the LLM from making the text overly formal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"No trailing period"&lt;/strong&gt; - texting convention; a period at the end feels curt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Break up run-on sentences"&lt;/strong&gt; - spoken language naturally runs together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Low temperature (0.2) keeps output consistent. Higher temperatures occasionally produced creative reinterpretations of what I said.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;The polish step can fail: network issues, rate limits, API changes. The user still expects their transcription to paste.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;final_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;polish&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;polish_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or_else&lt;/span&gt;&lt;span class="p"&gt;(||&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Option::unwrap_or_else&lt;/code&gt; is the right pattern here. If polish fails for any reason, fall back to the raw Whisper transcription. The user gets something rather than nothing.&lt;/p&gt;

&lt;p&gt;This is a general principle for LLM features: treat them as enhancements, not requirements. The core functionality should work without them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency Considerations
&lt;/h2&gt;

&lt;p&gt;Polish adds a second API call, roughly 200-400ms on Groq. For a texting use case, this is acceptable - you're not in a real-time conversation. For live captioning or dictation into a text field, it would be too slow.&lt;/p&gt;

&lt;p&gt;The transcription already happens in a background thread:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nn"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;transcribe_and_paste&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;polish&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both the Whisper call and the polish call happen sequentially in this thread. The UI remains responsive; the user just waits slightly longer for the paste.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When polish helps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Texting, where filler words and run-ons look sloppy&lt;/li&gt;
&lt;li&gt;Drafting messages you want to sound more coherent&lt;/li&gt;
&lt;li&gt;Quick notes that benefit from basic cleanup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to skip it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dictating into forms or code comments&lt;/li&gt;
&lt;li&gt;When you want exact transcription (quotes, interviews)&lt;/li&gt;
&lt;li&gt;Low-latency scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What polish can break:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proper nouns and technical terms may get "corrected"&lt;/li&gt;
&lt;li&gt;The LLM might misinterpret intent on ambiguous input&lt;/li&gt;
&lt;li&gt;Short inputs ("ok", "yes") sometimes get expanded unnecessarily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latch pattern makes this an explicit user choice. Default is raw transcription; polish is opt-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latch pattern beats simultaneous press&lt;/strong&gt; - let users enable modes at any point during an action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit prompt constraints&lt;/strong&gt; - tell the LLM what NOT to do (no explanations, no quotes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low temperature for tools&lt;/strong&gt; - you want consistency, not creativity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful fallback is mandatory&lt;/strong&gt; - LLM features should enhance, not gate, core functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose your latency budget&lt;/strong&gt; - 200-400ms is fine for async use cases, not for real-time&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rust</category>
      <category>macos</category>
      <category>llm</category>
      <category>speechtotext</category>
    </item>
  </channel>
</rss>
