Forem: Ranjith Kumar

From Routines to a Crew: Building a System That Plans Its Own Work and Executes It

Ranjith Kumar — Fri, 22 May 2026 14:00:00 +0000

The Gap

In Part 1, I built a routine board — a system that runs Claude on a schedule, defined as Markdown files, backed by a Rust engine with cron scheduling, crash recovery, and a self-contained dashboard. It works well for what it does.

But real work isn't a cron job.

Consider this: you need to audit all the places a deprecated API is referenced across a large codebase. That means searching multiple code areas, cross-referencing findings, identifying which references are active vs. dead code, and producing a prioritized cleanup plan. No single Claude session handles this well. The context is too broad, the work needs decomposition, and some pieces depend on others.

That’s the difference between “task execution” and “work management.” Execution is running a prompt. Work management is deciding what to run, in what order, with what context, and what to do when something fails.

The Build Sprint

I built the entire system — from nothing to a multi-persona task engine with a dashboard — in 3 days of evenings.

Day 1 was intense: core orchestrator (Phase 0.5), a full Rust dashboard with CRUD operations (Phase 0.75), and a round of silent-failure bug fixes (Phase 0.9) — all in one session. Day 3: hardening with task creation/editing/deletion from the dashboard, POSIX file locking for concurrency safety, and launchd scheduling (Phase 1), then the big architectural addition — planner/worker decomposition with a Worker Hive visualization (Phase 2).

The stack: Python for the orchestrator (subprocess management, YAML parsing, straightforward scripting), Rust for the dashboard (HTTP server, real-time worker status, the same "single binary" philosophy from the routine engine).

The recursive part: Claude helped build the system that orchestrates Claude. The design document, the orchestrator code, the Rust dashboard — all built with Claude as a pair programmer. I was designing a system for autonomous AI work while doing autonomous AI work. It's turtles all the way down.

The Task Schema and Activity Log

Every task lives in a YAML file with a rich schema:

id: "TASK-001"
name: "Audit all deprecated API references"
description: |
  Search codebase for deprecated API references.
  Check related tasks for migration status.
priority: P0
size: L
status: open
type: investigation
requires_human: review
human_loop_mode: blocking
dependencies: []
sub_tasks: []
activity_log: []

Priority (P0–P3), size (XS–XL), status, dependencies, human intervention config — the usual project management primitives. But the real innovation is the activity log.

Every action on a task gets timestamped with who did it (which persona), what they did, and what they found:

activity_log:
  - ts: "2026-03-02T22:37:29"
    persona: worker
    action: picked_up
    detail: "Selected as highest priority unblocked task"
  - ts: "2026-03-02T22:42:10"
    persona: worker
    action: completed
    detail: "Research complete. Found 27 references across 6 categories."

This is the system's memory — and it's surprisingly powerful for how simple it is. When a task fails and retries, the retrying worker sees what was already attempted and tries a different approach. When a planner decomposes a task, it reads the log to understand context. When I look at a task at the end of the day, I can trace exactly what happened — which tools were used, what was found, what failed — without reading pages of raw Claude output.

The action types tell the story: picked_up, progress, planned, failed, retry, blocked, human_requested, human_responded, completed. You can scan a task's log and understand its entire lifecycle in seconds. It's the simplest possible implementation of agent memory, and it carries surprisingly far.
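A minimal sketch of how such a log could be appended to and replayed into a retry prompt. The helper names (log_activity, summarize_attempts) are hypothetical, the task is assumed to be the plain dict loaded from the YAML file, and the "failed" entry is made up for illustration. Only the field names come from the schema above.

```python
from datetime import datetime


def log_activity(task, persona, action, detail, now=None):
    """Append one timestamped entry to the task's activity_log."""
    ts = (now or datetime.now()).isoformat(timespec="seconds")
    entry = {"ts": ts, "persona": persona, "action": action, "detail": detail}
    task.setdefault("activity_log", []).append(entry)
    return entry


def summarize_attempts(task):
    """Render prior log entries as plain text for a retry prompt."""
    lines = [
        f"- [{e['ts']}] {e['persona']} {e['action']}: {e['detail']}"
        for e in task.get("activity_log", [])
    ]
    return "\n".join(lines) if lines else "(no prior activity)"


task = {"id": "TASK-001", "activity_log": []}
log_activity(task, "worker", "picked_up",
             "Selected as highest priority unblocked task",
             now=datetime(2026, 3, 2, 22, 37, 29))
# Illustrative failure entry, not taken from a real run.
log_activity(task, "worker", "failed", "Search timed out",
             now=datetime(2026, 3, 2, 22, 40, 0))
print(summarize_attempts(task))
```

A retrying worker would get the output of summarize_attempts pasted into its prompt, which is all "memory" means here.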

┌──────────┐
│   Open   │
└────┬─────┘
     │
┌────▼─────┐
│ Planning  │◄──────────────────────┐
└────┬─────┘                        │
     │                              │
┌────▼──────┐                       │
│ In Progress├──────────────────────┘
└────┬──────┘   (worker rejects plan → re-plan)
     │
┌────▼─────┐
│ Blocked   │  (needs human input or dependency)
└────┬─────┘
     │
┌────▼─────┐     ┌──────────────┐
│   Done   ├────►│ Spawn Follow-│
└──────────┘     │ up Tasks     │
                 └──────────────┘

Human-in-the-Loop as a Dial

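One of the most useful design decisions was making human involvement a dial, not a switch. Two fields set it: requires_human has four levels and human_loop_mode is either blocking or non_blocking, which gives a 4×2 matrix: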

requires_human    human_loop_mode: blocking    human_loop_mode: non_blocking
none              fully autonomous             fully autonomous
review            pauses before close          closes, sends summary
intervention      pauses at checkpoints        continues with best guess
approval          waits for plan sign-off      n/a

A P0 investigation should pause for human review — the stakes are too high for full autonomy. A P3 documentation task can run end-to-end without anyone looking at it. A task that needs plan approval waits after the planner proposes sub-tasks, showing them in the dashboard with Approve/Reject buttons.

Different tasks need different autonomy levels, and the system supports that as a per-task configuration rather than a global setting. In practice, I found that most tasks start as requires_human: none (fully autonomous) and I only add friction for high-stakes work. The default is trust, with guardrails where they matter.
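One way to encode the dial is a lookup table keyed by the two fields. This is a sketch, not the orchestrator's actual code: the function name is hypothetical, and defaulting human_loop_mode to blocking is an assumption (the default of requires_human: none follows the paragraph above).

```python
# (requires_human, human_loop_mode) -> behaviour, transcribed from the matrix above.
HUMAN_LOOP_BEHAVIOUR = {
    ("none", "blocking"): "fully autonomous",
    ("none", "non_blocking"): "fully autonomous",
    ("review", "blocking"): "pauses before close",
    ("review", "non_blocking"): "closes, sends summary",
    ("intervention", "blocking"): "pauses at checkpoints",
    ("intervention", "non_blocking"): "continues with best guess",
    ("approval", "blocking"): "waits for plan sign-off",
    # ("approval", "non_blocking") is deliberately absent: the matrix has no entry.
}


def human_loop_behaviour(task):
    """Look up what the orchestrator should do for this task's settings."""
    key = (
        task.get("requires_human", "none"),
        task.get("human_loop_mode", "blocking"),  # assumed default
    )
    if key not in HUMAN_LOOP_BEHAVIOUR:
        raise ValueError(f"unsupported combination: {key}")
    return HUMAN_LOOP_BEHAVIOUR[key]
```

Rejecting the missing combination loudly, rather than falling back to autonomy, fits the article's lesson about silent failures.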

The Bugs That Taught Me

The most instructive bugs were all variations on the same theme: silent failure.

Empty output as success. Workers were returning exit code 0 with empty stdout — they'd gotten stuck on a permission prompt and hung until timeout. The orchestrator saw exit code 0 and marked the task as done. Fix: treat empty output as failure. A single if not output: check that routes through the failure handler:

if not output:
    msg = "Worker returned exit code 0 but produced no output"
    return False, msg

Timeout gap in loop mode. The continuous mode spawned workers as background processes and polled for completion, but it wasn't tracking when each worker started. Workers could run forever, accumulating memory and burning API credits. Fix: track spawn_time per worker in the PID file (later enriched to full JSON metadata with persona and start time), check elapsed time each poll cycle, proc.kill() overdue workers.
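A rough sketch of that poll-cycle check. It keeps worker metadata in an in-memory dict instead of the PID file the article describes, and the names overdue and reap_overdue are hypothetical. The child process is just a stand-in for a stuck worker.

```python
import subprocess
import sys
import time


def overdue(workers, now, timeout_s):
    """Return the PIDs whose elapsed run time exceeds timeout_s."""
    return [pid for pid, w in workers.items() if now - w["spawn_time"] > timeout_s]


def reap_overdue(workers, timeout_s):
    """Kill and forget every worker that has outlived the timeout."""
    killed = overdue(workers, time.monotonic(), timeout_s)
    for pid in killed:
        proc = workers.pop(pid)["proc"]
        proc.kill()
        proc.wait()  # collect the exit status so no zombie is left behind
    return killed


# Stand-in for a stuck worker: a child process that just sleeps.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
workers = {proc.pid: {"proc": proc, "spawn_time": time.monotonic()}}

early = reap_overdue(workers, timeout_s=30)   # nothing is overdue yet
time.sleep(0.2)
late = reap_overdue(workers, timeout_s=0.1)   # the sleeper has now exceeded 0.1 s
```

With a PID file, spawn_time would need to be wall-clock time rather than time.monotonic(), since monotonic readings are not comparable across processes.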

Lock contention silence. When the orchestrator tried to run but another instance already held the POSIX flock, it would silently exit. No log entry, no notification, nothing. From the outside, it looked like the system stopped working — you'd check the schedule and see it should have run, but there's no evidence it even tried. Fix: write a "Skipped — lock held" entry to the run log before exiting.
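A sketch of the fix on a Unix system, using fcntl.flock in non-blocking mode. The function name, file names, and exact wording of the log line are assumptions for illustration.

```python
import fcntl
import os
import tempfile
from datetime import datetime


def try_lock(lock_path, run_log_path):
    """Take an exclusive flock without blocking; log a skip if it is held."""
    fh = open(lock_path, "a")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(run_log_path, "a") as log:
            log.write(f"{stamp} Skipped: lock held\n")
        return None
    return fh  # keep this handle open for as long as the run lasts


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "orchestrator.lock")
    log_path = os.path.join(tmp, "run.log")
    first = try_lock(lock_path, log_path)    # acquires the lock
    second = try_lock(lock_path, log_path)   # contends, logs the skip, returns None
    first.close()                            # closing the handle releases the flock
    with open(log_path) as log:
        log_text = log.read()
```

The important part is that the losing instance leaves a trace before exiting, so a missed run shows up in the run log instead of looking like the scheduler never fired.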

The meta-lesson: autonomous systems fail silently by default. You have to instrument every exit path, every edge case, every "this shouldn't happen" branch. If a human isn't watching, nobody is — unless you build the observability in.

Phase 2: Planners and Workers

The architecture shift in Phase 2 was routing tasks through different personas based on their size. Tasks sized M, L, or XL go through a "planner" that decomposes them into smaller sub-tasks. XS and S tasks go directly to workers — backward compatible with Phase 1 behavior.

The routing logic is remarkably compact — three checks that determine the entire system's behavior:

def needs_planning(task):
    if task.get("size") not in {"M", "L", "XL"}: return False
    if task.get("parent_task"): return False  # sub-tasks skip planning
    if task.get("sub_tasks"): return False     # already planned
    return True

Big enough? Not a parent's child? Not already planned? Send it to the planner. Everything else goes to a worker. That's the entire routing layer.

The planner gets a specialized prompt asking it to output structured JSON with $N dependency references. The orchestrator resolves $1, $2 etc. to actual TASK-NNN IDs when materializing sub-tasks:

id_map = {}  # $N -> actual task ID (1-indexed)
for i, spec in enumerate(sub_tasks_spec):
    new_id = f"TASK-{next_num:03d}"
    id_map[f"${i + 1}"] = new_id
    # Resolve $N dependencies to real IDs
    resolved_deps = [id_map.get(dep, dep) for dep in spec.get("dependencies", [])]
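The excerpt above leaves out how IDs advance and what happens to the resolved list. Here is a self-contained sketch of the same idea, done in two passes so a sub-task can also reference one that appears later in the plan. The function name materialize, the "name" field on each spec, and the error on unknown references are assumptions, not the author's code.

```python
def materialize(sub_tasks_spec, next_num, parent_id):
    """Turn a planner's sub-task specs into task dicts with real IDs."""
    # Pass 1: give every spec an ID up front so any "$N" can be resolved.
    id_map = {
        f"${i + 1}": f"TASK-{next_num + i:03d}"
        for i in range(len(sub_tasks_spec))
    }

    # Pass 2: build the tasks, swapping "$N" references for real IDs.
    tasks = []
    for i, spec in enumerate(sub_tasks_spec):
        deps = spec.get("dependencies", [])
        unknown = [d for d in deps if d.startswith("$") and d not in id_map]
        if unknown:
            raise ValueError(f"plan references missing sub-tasks: {unknown}")
        tasks.append({
            "id": id_map[f"${i + 1}"],
            "name": spec["name"],
            "parent_task": parent_id,
            "dependencies": [id_map.get(d, d) for d in deps],
            "status": "open",
        })
    return tasks


plan = [
    {"name": "Search area A"},
    {"name": "Search area B"},
    {"name": "Merge findings", "dependencies": ["$1", "$2"]},
]
sub_tasks = materialize(plan, next_num=2, parent_id="TASK-001")
print([(t["id"], t["dependencies"]) for t in sub_tasks])
```

Dependencies that are already real IDs pass through id_map.get(d, d) unchanged, which matches the behaviour of the original one-liner.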

One subtle but important design choice: planners run before workers in the task queue. When the orchestrator selects the next task, it prioritizes planner work over worker work. This unblocks sub-tasks sooner — you don't want a planner waiting behind three workers when its output would spawn three more parallelizable tasks.
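A sketch of what that selection could look like. needs_planning is repeated from the routing code above. select_next, the lowercase status values, and the tie-break (planner work first, then P0 to P3) are assumptions about details the article does not spell out.

```python
PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def needs_planning(task):
    if task.get("size") not in {"M", "L", "XL"}:
        return False
    if task.get("parent_task"):
        return False  # sub-tasks skip planning
    if task.get("sub_tasks"):
        return False  # already planned
    return True


def select_next(tasks):
    """Pick the next open, unblocked task, preferring planner work."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    ready = [
        t for t in tasks
        if t["status"] == "open"
        and all(dep in done for dep in t.get("dependencies", []))
    ]
    if not ready:
        return None
    # False sorts before True, so tasks that need planning come first.
    return min(
        ready,
        key=lambda t: (not needs_planning(t), PRIORITY_ORDER.get(t.get("priority"), 99)),
    )


tasks = [
    {"id": "TASK-001", "size": "S", "priority": "P0", "status": "open"},
    {"id": "TASK-002", "size": "L", "priority": "P2", "status": "open"},
    {"id": "TASK-003", "size": "XS", "priority": "P1", "status": "open",
     "dependencies": ["TASK-001"]},
]
first_pick = select_next(tasks)["id"]
```

In this sketch the L-sized task wins even against a P0 worker task. Whether planner work should outrank a higher-priority worker task is a policy choice.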

When all sub-tasks complete, the parent auto-closes.
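The auto-close rule fits in a few lines. This sketch assumes sub_tasks holds the child task IDs and that a finished task has status "done". The function name is hypothetical.

```python
def close_finished_parents(tasks):
    """Mark a parent done once every one of its sub-tasks is done."""
    by_id = {t["id"]: t for t in tasks}
    closed = []
    for parent in tasks:
        children = [by_id[c] for c in parent.get("sub_tasks", []) if c in by_id]
        if (
            parent["status"] != "done"
            and children
            and all(c["status"] == "done" for c in children)
        ):
            parent["status"] = "done"
            closed.append(parent["id"])
    return closed


tasks = [
    {"id": "TASK-001", "status": "in_progress", "sub_tasks": ["TASK-002", "TASK-003"]},
    {"id": "TASK-002", "status": "done", "parent_task": "TASK-001"},
    {"id": "TASK-003", "status": "open", "parent_task": "TASK-001"},
]
before = close_finished_parents(tasks)   # one child is still open
tasks[2]["status"] = "done"
after = close_finished_parents(tasks)    # now the parent closes
```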

                    ┌──────────────────┐
                    │  Parent Task     │
                    │  (Size: L)       │
                    │  "Audit all API  │
                    │   endpoints"     │
                    └────────┬─────────┘
                             │
                       ┌─────▼─────┐
                       │  Planner   │
                       │  (Claude)  │
                       └─────┬─────┘
                             │ JSON plan with $N deps
               ┌─────────────┼─────────────┐
               │             │             │
        ┌──────▼──────┐ ┌───▼────────┐ ┌──▼───────────┐
        │ Sub-task 1  │ │ Sub-task 2 │ │ Sub-task 3   │
        │ (Size: XS)  │ │ (Size: S)  │ │ (Size: S)    │
        │ No deps     │ │ No deps    │ │ Depends: 1,2 │
        └──────┬──────┘ └─────┬──────┘ └──────┬───────┘
               │              │               │
          ┌────▼────┐   ┌─────▼────┐    ┌─────▼────┐
          │ Worker  │   │ Worker   │    │ Worker   │
          │ (Claude)│   │ (Claude) │    │ (Claude) │
          └─────────┘   └──────────┘    └──────────┘
               │              │               │
               └──────────────┼───────────────┘
                              │
                    All done → Parent auto-closes

The Plan Rejection Protocol

Here's where it gets interesting. Workers can say "this plan is bad."

If a sub-task's plan is fundamentally unworkable — missing prerequisites, contradictory requirements, impossible constraints — the worker outputs PLAN_REJECTED: <reason> instead of completing the task. The orchestrator detects this marker, removes all the old sub-tasks, resets the parent task for re-planning, and includes the rejection reason in the next planner prompt.

┌──────────┐     plan      ┌──────────┐    sub-tasks    ┌──────────┐
│  Parent   │─────────────►│ Planner  │────────────────►│ Workers  │
│  Task     │              │          │                 │          │
└──────────┘              └──────────┘                 └────┬─────┘
     ▲                         ▲                            │
     │                         │     PLAN_REJECTED:         │
     │    reset parent,        │     "missing prereq X"     │
     │    include rejection    └────────────────────────────┘
     │    context
     │
     │   After 3 iterations:
     └── escalate to human intervention

max_plan_iterations defaults to 3. After three failed plan-reject cycles, the system escalates to human intervention — it sets requires_human: intervention and writes a notification explaining what happened.
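The protocol can be sketched as a marker check plus a reset-or-escalate step. The PLAN_REJECTED: marker, the limit of 3, and requires_human: intervention come from the text. The plan_iterations and rejections fields, the status values, and both function names are assumptions.

```python
REJECT_MARKER = "PLAN_REJECTED:"
MAX_PLAN_ITERATIONS = 3


def rejection_reason(output):
    """Return the reason if the worker output rejects the plan, else None."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(REJECT_MARKER):
            return line[len(REJECT_MARKER):].strip()
    return None


def handle_rejection(parent, tasks, reason):
    """Drop the rejected plan, then re-queue the parent or escalate."""
    children = set(parent.get("sub_tasks", []))
    tasks[:] = [t for t in tasks if t["id"] not in children]
    parent["sub_tasks"] = []
    parent["plan_iterations"] = parent.get("plan_iterations", 0) + 1
    # Kept on the parent so the next planner prompt can include it.
    parent.setdefault("rejections", []).append(reason)
    if parent["plan_iterations"] >= MAX_PLAN_ITERATIONS:
        parent["requires_human"] = "intervention"
        parent["status"] = "blocked"
        return "escalated"
    parent["status"] = "open"
    return "replan"


parent = {"id": "TASK-001", "status": "in_progress", "sub_tasks": ["TASK-002"]}
tasks = [parent, {"id": "TASK-002", "status": "open", "parent_task": "TASK-001"}]
reason = rejection_reason("Checked the repo.\nPLAN_REJECTED: missing prereq X\n")
outcome = handle_rejection(parent, tasks, reason)
```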

This isn't error handling. It's a feedback loop between two AI personas. The planner proposes, the worker evaluates, and if the proposal doesn't survive contact with reality, the system iterates. With context. The rejection reason is fed back to the planner, so each iteration is informed by what went wrong before.

Real Results

The system's first real test was a codebase audit — finding all references to a deprecated API across a large repository, checking related tasks for migration status, and producing a prioritized cleanup plan. This is exactly the kind of task that's painful to do manually: boring, sprawling, requires checking dozens of files and cross-referencing with issue trackers.

The planner decomposed it into 8 sub-tasks — each focused on a different code area or a different type of investigation (search this directory, check that task tracker, analyze this migration path). Workers ran independently, some completing in minutes (quick grep-style searches), others taking longer for deeper analysis.

Here's what the workers found, collectively:

27 distinct references across 6 categories, ranked into 3 priority tiers
3 out-of-scope items correctly identified as false positives — things that looked like matches but were actually unrelated (different product's API, different naming convention, already fully migrated). That's human time saved: instead of chasing false positives, I got a pre-filtered list
3 targets confirmed already removed — one sub-task discovered its target had been cleaned up in a previous effort over a year ago. The worker correctly reported "no changes needed" and moved on
One sub-task found that a supposedly abandoned migration task had actually been stalled since 2022 — useful context for prioritization

Final output: a structured action plan with 5 independent code changes, prioritized by risk and effort, with pre-checks and test plans for each. All the changes were small (XS or S sized) and could be submitted in parallel.

What would have taken a full day of manual investigation — opening files, cross-referencing tasks, checking git history, reading old code reviews — was done by 8 coordinated AI workers. And because each sub-task produced a standalone output file, I could review them individually, at my own pace, in whatever order made sense.

What I Learned

The evolution tells a story: cron job → scheduler → orchestrator → planner/worker system. Each step was driven by a real limitation of the previous one, not by architectural ambition.

V0 — Serial                 One worker, one task at a time
╔═══╗     ╔══════╗     ╔══════╗
║ W ║────►║ Task ║────►║ Task ║────► ...
╚═══╝     ╚══════╝     ╚══════╝


V1 — Worker Pool             Independent tasks, concurrent workers
╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗
║ W ║ ║ W ║ ║ W ║ ║ W ║      worker queue
╚═╤═╝ ╚═╤═╝ ╚═╤═╝ ╚═╤═╝
  │     │     │     │
  ▼     ▼     ▼     ▼
┌───┐ ┌───┐ ┌───┐ ┌───┐
│ T │ │ T │ │ T │ │ T │      task pool
└───┘ └───┘ └───┘ └───┘


V2 — Planners + Workers      Decomposition before execution
╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗
║ W ║ ║ W ║ ║ W ║ ║ W ║ ║ W ║ ║ W ║  workers
╚═══╝ ╚═══╝ ╚═══╝ ╚═══╝ ╚═══╝ ╚═══╝
╔═══╗ ╔═══╗ ╔═══╗
║ P ║ ║ P ║ ║ P ║                     planners
╚═══╝ ╚═══╝ ╚═══╝


V3 — Team (future)           Specialized personas, handoff protocol
╔═══╗ ╔═══╗
║ P ║ ║ P ║                           planners
╚═══╝ ╚═══╝
╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗ ╔═══╗
║ W ║ ║ W ║ ║ W ║ ║ W ║ ║ W ║ ║ W ║  workers
╚═══╝ ╚═══╝ ╚═══╝ ╚═══╝ ╚═══╝ ╚═══╝
╔════╗ ╔═══╗
║ PM ║ ║ TL║                          creators
╚════╝ ╚═══╝
       ▲ handoff protocol ▲

Four patterns emerged that I think are generalizable:

Activity log as memory. Context survives across retries and sessions because every action is recorded. A retry doesn't start from zero — it starts from "here's what was tried and why it failed." This is the simplest possible implementation of agent memory, and it's surprisingly effective.

Personas as routing logic. "Planner" and "worker" aren't separate systems or separate models. They're the same Claude CLI called with different prompts. The persona distinction is a function call — needs_planning() returns True, you use the planner prompt template. Returns False, you use the worker template. That's it. No framework, no agent registry, no complex orchestration layer.

Human-in-the-loop as a dial. The matrix of requires_human levels × blocking/non_blocking lets each task declare its own autonomy level. This is more useful in practice than a global "autonomous mode" toggle.

Plan rejection as protocol. Not error handling — a first-class feedback mechanism. PLAN_REJECTED: is part of the prompt contract. Both sides know the rules. The system iterates with context rather than retrying blindly.

The broader ecosystem is exploring similar ideas. Tools and frameworks for autonomous AI workflows are emerging rapidly. There's no canonical architecture for this yet — and that's what makes it an exciting space to build in. We're all figuring this out in real time.

What's Next

Phase 3 is where it gets ambitious. I considered two paths. One was to go deep on defining the personas precisely, for example TL and PM personas that create new tasks. The other was to focus on the shared ecosystem rather than the individual personas.
More creative agents with different capabilities will keep showing up, so rather than focusing on a single agent in different flavors, I found it both more interesting and more challenging to focus on the shared ecosystem they’d operate under: a space where devs and their agents can co-work productively.

It stops being a tool you use and becomes a team you manage.

Closing

From a bash cron job to a multi-persona planning system in about 3 days of evenings. That's not a testament to my engineering speed — it's a testament to what becomes possible when you have an AI pair programmer that can help build the infrastructure for its own autonomy.

Honest assessment: it's still experimental. The failure rate is real. Silent failures lurk in every corner, and you have to instrument your way to reliability. But the ceiling is visible.

The pattern of "structured task → AI decomposition → parallel execution → human review" works, and it works better than doing everything interactively.

If you have Claude Code or any AI with a CLI, try building something autonomous. Start with a single routine — a cron job that generates your standup. See where it takes you. You might end up with a planning system that argues with itself about how to approach your work.

And that's a surprisingly useful thing to have.

This is Part 2 of a multi-part series. Part 1: "The Routine Board" covers the routine engine that started it all.

Making Claude do your routines while you sleep!

Ranjith Kumar — Thu, 21 May 2026 14:00:00 +0000

The Itch

Every morning, same ritual. Open Claude, ask it to summarize my recent pull requests, check for blockers, prep a standup. Three minutes, every single day.

It’s not a lot of time. But it’s the kind of time that bothers a developer — repetitive, predictable, mechanical. Claude has a CLI. The CLI can run unattended. What if cron just… did this for me?

The Learning Loop

Over these past few months of learning new tools, new patterns, and new ways of working with AI, the thing that keeps surprising me is the speed. Not the speed of the AI itself, but the speed of the development loop. An idea I’d have over morning coffee could be a working prototype by the time I close my laptop that evening. Things that would have been a week-long side project, spread thin across stolen hours, now materialize in a single focused session.

This isn’t groundbreaking. Plenty of people have built their own versions of “autonomous Claude”: cron wrappers, custom schedulers, Claude Code extensions, full-blown agent frameworks. Some use existing tools like OpenClaw, others write bash scripts, others build elaborate multi-agent systems. The space is full of experimentation, and there’s no canonical answer yet.

What I’m sharing here is my version — a “routine desk” that lets me define, schedule, and monitor AI tasks through a simple dashboard. It’s the story of building it, what I learned, and the specific design choices that made it useful. Your version would look different, and that’s the point. The interesting part isn’t the artifact — it’s what you discover along the way.

The Simplest Version

The first attempt was exactly what you'd expect:

0 9 * * 1-5 claude -p "Summarize my recent code changes" > /tmp/standup.txt

It worked! Sort of. For about two days.

Then the problems compounded. Monday morning: the output file was empty because the CLI had hit an authentication issue overnight — cron swallowed the error silently. Tuesday: two runs overlapped because the first one took 8 minutes instead of the usual 3, and the second kicked off on schedule before the first finished. By Wednesday I had six temp files named standup.txt, standup2.txt, standup-final.txt... you know how that goes.

No visibility into whether runs succeeded or failed. No error handling. No history. No way to tell at a glance whether the system was healthy or broken. Cron is a fantastic tool for running deterministic commands. An AI CLI call is not deterministic: it can hang, time out, produce unexpected output, or fail silently. I needed something that understood that.

I needed something small but proper.

Routines as Markdown

Here's the core insight that shaped everything: a routine is just a prompt plus scheduling metadata. And there's already a perfect format for "structured metadata + freeform text" — Markdown with YAML frontmatter.

Here's what my morning standup routine looks like:

---
title: "Morning Standup Summary"
schedule: "0 0 9 * * 1-5 *"
model: sonnet
timeout: 300
max_turns: 50
---

Generate a morning standup summary. Do the following:

1. **Recent PRs**: Search for my recent pull requests from the past 24 hours.
   List each with its title, status, and a one-line summary.

2. **Active Tasks**: Search for my active tasks. List each with its title,
   priority, and current status.

3. **Blockers**: Identify any code reviews awaiting approval or tasks that are blocked.

4. **Today's Focus**: Based on the above, suggest 2-3 priorities for today.

Format the output as a clean markdown summary suitable for posting
in a team chat.

The filename is the routine name. morning-standup.md creates a routine called morning-standup. Drop a file in the directory, it gets scheduled. Delete it, it stops. Set enabled: false in the frontmatter to pause it without removing it.
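To make the format concrete, here is a small sketch of reading such a file. It is written in Python for brevity even though the engine is Rust, and it only handles flat key: value frontmatter. A real implementation would presumably use a proper YAML parser. parse_scalar and parse_routine are hypothetical names, and treating a missing enabled key as true is an assumption.

```python
from pathlib import Path


def parse_scalar(raw):
    """Interpret a frontmatter value: quoted string, bool, int, or bare string."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    if value in ("true", "false"):
        return value == "true"
    return int(value) if value.isdigit() else value


def parse_routine(text):
    """Split a routine file into (metadata dict, prompt string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter")
    meta = {"enabled": True}  # assumed default when the key is absent
    for line in lines[1:end]:
        if line.strip():
            key, _, raw = line.partition(":")
            meta[key.strip()] = parse_scalar(raw)
    return meta, "\n".join(lines[end + 1:]).strip()


ROUTINE = '''---
title: "Morning Standup Summary"
schedule: "0 0 9 * * 1-5 *"
model: sonnet
timeout: 300
max_turns: 50
---

Generate a morning standup summary.
'''

name = Path("routines/morning-standup.md").stem  # the filename is the routine name
meta, prompt = parse_routine(ROUTINE)
```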

Each routine picks its own model. Anthropic's Claude models range from fast and cheap (Haiku) to powerful and expensive (Sonnet, Opus) — and different tasks need different trade-offs. My standup uses Sonnet — it needs to search across code changes and synthesize a report, so reasoning quality matters. My health-check routine uses Haiku — it's a quick status ping, optimized for speed and cost. A Haiku call costs a fraction of Sonnet and returns in seconds. When all you need is "are things on fire? yes/no," you don't need the most powerful model:

---
title: Metrics Health Check
schedule: "0 0 8 * * * *"
model: haiku
timeout: 120
max_turns: 30
---

Perform a quick health check on key operational metrics...

The Pivot and The Engine

I originally planned to build this in TypeScript. The Claude Code SDK has a clean query() API, node-cron handles scheduling, node:sqlite handles persistence. I had a reference implementation to model after — a review agent built on that exact stack. I started building.

Then I hit a wall: the package registry wasn't accessible in my locked-down dev environment — no outbound network access to install packages. No npm install, no dependencies, dead end.

So I pivoted. Rewrote the whole thing in Rust.

What started as a constraint became the best architectural decision of the project. The final artifact is a single binary — async with Tokio, embedded SQLite via rusqlite, zero runtime dependencies. You can copy it to any machine and run it. No node_modules, no package manager, no runtime version to match. Just one file that contains the scheduler, the executor, the database, and the dashboard. In hindsight, this is exactly what you want for infrastructure that's supposed to run unattended.

The Executor is the core loop. It spawns the claude CLI as a subprocess, streams stdout and stderr concurrently via tokio::select!, counts messages and tool uses as they stream by, and enforces a hard timeout:

let result = timeout(timeout_duration, async {
    let mut child = cmd.spawn()?;
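    // stdout/stderr are Options (Some only when spawned with Stdio::piped()),
    // so `?` cannot be mixed with the io::Result from spawn(); unwrap explicitly.
    let mut stdout_reader = BufReader::new(child.stdout.take().expect("stdout not piped")).lines();
    let mut stderr_reader = BufReader::new(child.stderr.take().expect("stderr not piped")).lines();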

    loop {
        tokio::select! {
            line = stdout_reader.next_line() => {
                match line {
                    Ok(Some(line)) => {
                        message_count += 1;
                        if line.contains("tool_use") { tool_use_count += 1; }
                        output_lines.push(line);
                    }
                    Ok(None) => break,
                    Err(_) => break,
                }
            }
            line = stderr_reader.next_line() => { /* log and continue */ }
        }
    }
    // ...
}).await;

The Scheduler creates one async task per routine. Each task parses its cron expression, calculates the duration until the next trigger using chrono::Utc::now(), sleeps for exactly that duration, and fires the executor. This means each routine runs in its own Tokio task — no central polling loop, no priority queue, no timer wheel. Each routine is independently responsible for waking itself up.
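The same shape can be sketched with asyncio, in Python rather than the engine's Rust. Cron parsing is left out on purpose: next_fire is a hypothetical callback that returns the next trigger time, and max_runs exists only so the demo terminates.

```python
import asyncio
from datetime import datetime, timedelta, timezone


async def run_routine(name, next_fire, execute, max_runs=None):
    """Sleep until the routine's next trigger, run it, repeat."""
    runs = 0
    while max_runs is None or runs < max_runs:
        now = datetime.now(timezone.utc)
        delay = (next_fire(now) - now).total_seconds()
        await asyncio.sleep(max(delay, 0.0))
        await execute(name)
        runs += 1


async def main():
    fired = []

    async def execute(name):
        fired.append(name)  # stand-in for spawning the claude CLI

    def every_10ms(now):
        return now + timedelta(milliseconds=10)  # stand-in for a cron schedule

    # One independent task per routine; nothing polls on their behalf.
    await asyncio.gather(
        run_routine("morning-standup", every_10ms, execute, max_runs=2),
        run_routine("health-check", every_10ms, execute, max_runs=2),
    )
    return fired


fired = asyncio.run(main())
```

Each coroutine owns its own sleep, so adding a routine means starting one more task rather than touching a shared loop.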

Overlap protection is a HashSet behind a mutex — before running, check if the routine name is in the set. If it is, skip this tick. Five lines that prevent the most common cron footgun:

{
    let mut locks = locks.lock().unwrap();
    if locks.contains(&name) {
        tracing::warn!(routine = %name, "Already running, skipping tick");
        continue;
    }
    locks.insert(name.clone());
}

Crash recovery is equally minimal. On startup, a cleanup_stale_runs() function queries SQLite for any runs with status running or pending and marks them as failed. If the process crashed mid-run, those records would be stuck forever without this. Five lines that handle unclean shutdowns.
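In Python with the standard sqlite3 module, the idea looks roughly like this. The runs table and its columns are a guess at the schema, and the "success" status is an assumption. Only the running/pending to failed transition comes from the description above.

```python
import sqlite3


def cleanup_stale_runs(conn):
    """Mark runs left in flight by a crash as failed; return how many."""
    cur = conn.execute(
        "UPDATE runs SET status = 'failed' WHERE status IN ('running', 'pending')"
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, routine TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO runs (routine, status) VALUES (?, ?)",
    [
        ("morning-standup", "success"),
        ("morning-standup", "running"),  # interrupted mid-run
        ("health-check", "pending"),     # queued but never started
    ],
)
recovered = cleanup_stale_runs(conn)
statuses = [row[0] for row in conn.execute("SELECT status FROM runs ORDER BY id")]
```

Running this once at startup is safe because nothing can legitimately be running before the scheduler has started.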

Hot-reload uses the notify crate to watch the routines/ directory. When a .md file changes, the watcher reloads all routines and updates the scheduler. There's one pragmatic hack worth noting: std::mem::forget(watcher) — leaking the watcher so it lives for the program's lifetime rather than getting dropped at the end of the setup function. In a "proper" codebase you'd store the watcher handle somewhere and drop it on shutdown. Here, the program runs until you kill it, so leaking is functionally correct and saves a bunch of lifetime gymnastics. Pragmatism over purity.

The full engine — foundation, scheduling, output handling, dashboard, polish — was built and working in a single sitting. That's the compressed development loop I mentioned at the start: what would have been a multi-week side project materialized in one focused evening with Claude as a pair programmer. Rust's compiler caught entire categories of bugs at compile time that would have been runtime surprises in TypeScript. The borrow checker is annoying until it's saving you from a data race in your async scheduler — then it's your best friend.

┌──────────────────────────────────────────────────────────────┐
│                     Claude Routines Engine                    │
│                                                              │
│  ┌──────────┐    ┌────────────┐    ┌──────────────────────┐  │
│  │  File     │    │            │    │     Executor         │  │
│  │  Watcher  │───►│ Scheduler  │───►│  ┌────────────────┐ │  │
│  │ (notify)  │    │  (cron)    │    │  │ claude -p "..." │ │  │
│  └──────────┘    │            │    │  │ --model sonnet  │ │  │
│       ▲          │ ┌────────┐ │    │  │ --max-turns 50  │ │  │
│       │          │ │Overlap │ │    │  └────────┬───────┘ │  │
│  routines/*.md   │ │ Lock   │ │    │           │         │  │
│                  │ └────────┘ │    │     stdout/stderr   │  │
│                  └────────────┘    └──────────┬─────────┘  │
│                                               │             │
│                  ┌────────────┐    ┌──────────▼─────────┐  │
│                  │  Dashboard  │    │     SQLite Store    │  │
│                  │ :3456       │◄──►│  (WAL mode)        │  │
│                  │             │    │  runs, status,      │  │
│                  └────────────┘    │  crash recovery     │  │
│                                    └────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

The Dashboard

The dashboard is built on raw TCP — no web framework, no Axum, no Actix. One file that parses HTTP requests by hand, routes them, and generates the entire HTML as a Rust string. TcpListener::bind, tokio::spawn per connection, pattern-match on (method, path). That's the entire web server.
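A sketch of the "parse by hand, match on (method, path)" idea, in Python rather than Rust and without the socket loop. Only POST /api/routines is a route named in the article. GET / for the page, the status codes, and all function names are assumptions.

```python
def respond(status, body, content_type="text/plain; charset=utf-8"):
    """Build a complete HTTP/1.1 response as bytes."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n\r\n"
    )
    return head.encode("ascii") + payload


def dashboard_page(_body):
    return respond("200 OK", "<h1>Routines</h1>", "text/html; charset=utf-8")


def create_routine(_body):
    # A real handler would write a .md file into routines/ here.
    return respond("201 Created", "created")


ROUTES = {
    ("GET", "/"): dashboard_page,
    ("POST", "/api/routines"): create_routine,
}


def handle(raw):
    """Parse the request line by hand and dispatch on (method, path)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = request_line.split()
    if len(parts) != 3:
        return respond("400 Bad Request", "bad request")
    method, path, _version = parts
    handler = ROUTES.get((method, path))
    if handler is None:
        return respond("404 Not Found", "not found")
    return handler(body)


page = handle(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
```

A per-connection task would read bytes from the socket, pass them to handle, write the result back, and close.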

Why no framework? Because adding a dependency means adding complexity, and this dashboard needed to do exactly four things: show routine cards, show run history, show logs, and trigger manual runs. A framework would have been architecturally correct and practically overkill.

It serves a dark-themed single-page dashboard: routine cards with status dots (green for last-run-succeeded, red for failed, gray for never-run), a run history table with timing and status for each execution, and a log viewer that shows the raw Claude output for any run. Everything renders server-side — the HTML is a giant Rust string interpolation. No client-side framework, no hydration, no build step.

The killer feature is the "New Routine" button. It opens a modal where you define a routine — title, schedule, model, prompt — and hitting save sends a POST to /api/routines. The server writes a .md file to the routines directory. The file watcher detects the new file. The scheduler picks it up and starts running it. The system grows itself from the browser. You don't need SSH access or a text editor to add a new routine — just a browser and an idea for what Claude should do next.

Browser ──POST /api/routines──► Dashboard Server
                                     │
                              writes metrics-check.md
                                     │
                                     ▼
                              routines/ directory
                                     │
                              File Watcher detects
                                     │
                                     ▼
                              Scheduler reloads
                              & schedules new routine

No CDN, no build step, no external CSS. Auto-refresh with setInterval every 30 seconds, paused when modals are open so your form doesn't disappear mid-edit. The entire UI — CSS, JavaScript, HTML template — is self-contained in the binary. cargo build --release and you have everything.

The "Aha" Moment

The first morning I woke up to a standup summary already sitting there — generated at 9 AM while I was making coffee — something shifted. I didn't open Claude and ask it to do something. It had already done it.

My metrics check ran at 8 AM and flagged an issue before I'd opened my laptop. The notification was waiting. The context was there. I just needed to act on it.

It's a small thing, objectively. A cron job that calls Claude. But subjectively, it feels different. The mental shift is from "I use Claude" to "Claude works for me." The tool has agency — bounded, scheduled, observable agency, but agency nonetheless.

And it changes how you think about your own time. Those three minutes I spent every morning preparing a standup? Gone. But it's not just three minutes saved — it's the cognitive overhead of remembering to do it, the context-switching cost of opening a new session, the friction of formulating the same prompt for the hundredth time. All of that evaporates. You just... have the summary. You open your laptop and the work is already done.

There's a compounding effect too. Once the system exists, the marginal cost of adding a new routine is basically zero — write a markdown file, drop it in the directory. So you start thinking about what else could run unattended. Weekly report summaries. PR review reminders. Dependency update checks. Each one is a few paragraphs of markdown. Each one saves a few minutes a day. The minutes add up. Before long you have a small fleet of routines quietly doing work in the background, and you're spending your own time on the things that actually require you.

Limitations and What Comes Next

For all that they do, routines have real constraints. Every run is stateless: each execution starts fresh with no memory of previous runs. Routines can't coordinate with each other. They can't decompose complex work into smaller pieces. They don't learn from failures.

If a routine fails, it just... fails. There's no retry logic, no fallback strategy, no way to say "try a different approach." The SQLite store records what happened, but nothing acts on that information automatically.

These are fine constraints for scheduled tasks. A daily standup doesn't need memory. A metrics check doesn't need to coordinate with anything. But real work — the kind that takes hours, requires research across multiple areas, produces structured deliverables, and has pieces that depend on other pieces — needs something more.

I started with a question: "What if cron ran Claude for me?" That question led to a routine engine. But the engine surfaced a bigger question: "What if Claude could manage its own work?"

I had a system that could run tasks. But I wanted a system that could manage work — decompose it, prioritize it, retry intelligently, and know when to ask for human help.

That's where Part 2 picks up: a task executor with planners, workers, dependency graphs, retries, and a feedback loop where AI personas negotiate with each other about how to approach a problem. The system that routines couldn't be.

The broader ecosystem is moving this direction too. Tools like Claude Code are enabling a wave of builders to experiment with autonomous AI workflows. There's no canonical way to do this yet — which is what makes it exciting to build.

Part 2: "From Routines to a Crew — Building an AI Task System That Plans Its Own Work" explores what happens when you give Claude the ability to decompose, plan, and coordinate its own work.

The Developer Owns the UX. The AI Owns the Code.

Ranjith Kumar — Mon, 11 May 2026 18:56:54 +0000

My mom does bead art. The kind where you sit with a tray of tiny plastic beads and, over hours — sometimes days — assemble them into an intricate portrait or devotional motif. It's meditative, precise, and deeply personal.

The bottleneck has always been the pattern. You can't look at a photograph and start placing beads. You need to know exactly which bead goes where, in what color, on a grid that maps to the physical constraints of the project: how wide it is, how many colors of beads you've bought, how coarse or fine the detail needs to be.

She was doing this by eye, or with rough printouts. I kept thinking: there has to be a better way. So I opened Gemini and started a conversation.

What came out of that is BeadGen — a fully local, zero-dependency browser tool that converts any photo into a ready-to-stitch bead pattern. No backend. No npm. No install. You open an HTML file and use it.

But this post isn't really about the tool. It's about something I learned while building it: when you let AI write the code, the developer's most important job becomes the experience.

What It Actually Does

Before the technical deep-dive, here's the simplest way to show it.

Take the Golden Gate Bridge at sunset — rich gradients, a complex rust-red structure, water, sky, fog, warm light hitting the cables at an angle. Thousands of colors.

Run it through BeadGen at 150 beads wide, with No Gradient Mode on and the full color palette.

Every circle is a bead. Every color in that output is a real, distinct bead color a crafter would need to source and place by hand. The structure of the bridge is preserved. The mood of the sunset is preserved. The complexity is managed — reduced to something a human can actually execute, one bead at a time.

That transformation — from photograph to stitchable grid — is the whole product.

What "AI Owns the Code" Actually Means

I want to be precise here, because "AI-assisted development" has become a meaningless phrase. Everyone says it now. Here's what it meant for me on this project:

I wrote very little code from scratch. I used Gemini as the primary implementer: describing what I needed, reviewing what came back, asking it to revise or explain. Most of the logic for color quantization, the Canvas API rendering pipeline, and the pixel buffer manipulation was AI-written code that I read, understood, and occasionally redirected, but didn't author line by line.

What I didn't delegate: every decision about what the tool should feel like.

How many controls is too many? Where does the slider sit? What does "No Gradient Mode" actually mean to someone who isn't a developer? What should the output look like when it downloads? Should error states be loud or quiet?

None of that came from Gemini. All of it required me to stay in the room, stay opinionated, and push back when the generated UI drifted toward "technically correct but confusing to use."

That division of labor — AI on implementation, human on experience — turned out to be the whole game.

The Technical Problem (Because It's Interesting)

BeadGen solves three sub-problems in sequence:

1. Resolution mapping. A bead project has a fixed width in bead count — say, 150 beads wide. The photo needs to be downsampled to that exact grid resolution, with aspect ratio preserved.

2. Color quantization. The Golden Gate photo has thousands of colors. My mom's bead collection has maybe 10. The image's palette has to collapse to that count without destroying the image.

3. Rendering. Each cell in the grid gets drawn as a filled circle (the bead shape), not a square pixel, on a canvas the user can download.
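For the first step, the grid height follows from the chosen bead width and the photo's aspect ratio. A small sketch, assuming a hypothetical helper named `beadGridSize` (not taken from the BeadGen source):

```javascript
// Compute bead-grid dimensions for a target width, preserving aspect ratio.
// beadGridSize is an illustrative name, not a function from BeadGen itself.
function beadGridSize(imageWidth, imageHeight, beadsWide) {
  if (imageWidth <= 0 || imageHeight <= 0 || beadsWide <= 0) {
    throw new RangeError("dimensions must be positive");
  }
  const cols = Math.round(beadsWide);
  // Scale the height by the same factor as the width; never go below one row.
  const rows = Math.max(1, Math.round((imageHeight / imageWidth) * cols));
  return { cols, rows };
}

// A 4000x2000 photo at 150 beads wide becomes a 150x75 grid.
const grid = beadGridSize(4000, 2000, 150);
console.log(grid.cols, grid.rows);

// In the browser, the photo can then be resampled straight to that size:
// ctx.drawImage(img, 0, 0, grid.cols, grid.rows);
```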

The interesting one is color quantization. The naive approach — round every color to the nearest bucket — looks terrible. You lose the soul of the image because you're treating all of color space uniformly, when an image's color distribution is wildly uneven.

The right approach is the Median Cut algorithm:

1. Put all pixels in one bucket.
2. Find which color channel (R, G, or B) has the widest range across all pixels in the bucket.
3. Sort by that channel and split at the median.
4. Recurse until you have N buckets.
5. Each bucket's representative color is the average of its pixels.
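The steps above can be sketched roughly as follows. This is an illustrative version, not BeadGen's actual code: it uses a loop instead of recursion, and it assumes that the bucket to split next is the one whose widest channel has the largest range, which is one common choice the steps leave open. Pixels are assumed to be plain `{ r, g, b }` objects.

```javascript
// Find the channel with the widest range in a bucket of {r, g, b} pixels.
function channelRange(bucket) {
  let best = { channel: "r", range: -1 };
  for (const channel of ["r", "g", "b"]) {
    let min = 255;
    let max = 0;
    for (const p of bucket) {
      if (p[channel] < min) min = p[channel];
      if (p[channel] > max) max = p[channel];
    }
    if (max - min > best.range) best = { channel, range: max - min };
  }
  return best;
}

// Representative color of a bucket: the per-channel average, rounded.
function averageColor(bucket) {
  let r = 0, g = 0, b = 0;
  for (const p of bucket) { r += p.r; g += p.g; b += p.b; }
  const n = bucket.length;
  return { r: Math.round(r / n), g: Math.round(g / n), b: Math.round(b / n) };
}

// Median Cut: returns up to n palette colors for the given pixels.
function medianCut(pixels, n) {
  if (pixels.length === 0) return [];
  const buckets = [pixels.slice()];
  while (buckets.length < n) {
    // Assumption: split the bucket whose widest channel spans the largest range.
    let target = -1;
    let targetInfo = null;
    buckets.forEach((bucket, i) => {
      if (bucket.length < 2) return;
      const info = channelRange(bucket);
      if (targetInfo === null || info.range > targetInfo.range) {
        target = i;
        targetInfo = info;
      }
    });
    // Stop early when every remaining bucket holds a single color.
    if (target === -1 || targetInfo.range === 0) break;
    const bucket = buckets[target];
    bucket.sort((a, b) => a[targetInfo.channel] - b[targetInfo.channel]);
    const mid = Math.floor(bucket.length / 2);
    buckets.splice(target, 1, bucket.slice(0, mid), bucket.slice(mid));
  }
  return buckets.map(averageColor);
}
```

Because the loop stops once no bucket can be split further, asking for more colors than the image contains simply returns fewer palette entries.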

The result is that color splits happen where the actual data is most varied, not uniformly across abstract color space. The Golden Gate image has a massive warm cluster (the bridge, sunset light) and a separate cool cluster (the bay, the fog, the sky). Median Cut finds that divide naturally and allocates palette slots accordingly. The rust-red cables stay rust-red. The blue-gray water stays blue-gray.

With the palette built, each pixel is then mapped to its closest palette entry:

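// Return the palette entry closest to the color (r, g, b).
// Distances are squared Euclidean distances in RGB; skipping the square
// root is safe because it does not change which entry is nearest.
function nearestColor(r, g, b, palette) {
  let minDist = Infinity;
  let closest = palette[0];
  for (const color of palette) {
    const dist =
      (r - color.r) ** 2 +
      (g - color.g) ** 2 +
      (b - color.b) ** 2;
    if (dist < minDist) {
      minDist = dist;
      closest = color;
    }
  }
  return closest;
}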

Euclidean distance in RGB space isn't perceptually perfect — LAB color space would be more accurate — but it's fast, dependency-free, and more than sufficient for bead-level fidelity. I knew about the tradeoff. I chose simplicity deliberately. That was a developer decision, not an AI one.

The Feature My Mom Asked For

There's one feature in BeadGen I'm particularly glad I added: No Gradient Mode.

Photos have gradients everywhere — smooth transitions between light and shadow, color bleeding, soft backgrounds. The Golden Gate sunset is practically nothing but gradient. In a photograph, that's beautiful. In a bead pattern, it's a nightmare. You'd need 80 colors to represent that sky faithfully, and no one has 80 colors of beads.

No Gradient Mode posterizes the output. After quantization, it snaps colors more aggressively to the palette and flattens subtle transitions into solid bands. The pattern looks more graphic, more like a flat illustration, and is actually stitchable by a human working from a printed sheet.
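How BeadGen implements this isn't shown here. As one plausible illustration of flattening transitions into solid bands, the sketch below applies a majority (mode) filter over the quantized grid: each cell takes the most common palette index in its 3x3 neighborhood. The technique, the function name `flattenTransitions`, and the grid-of-indices representation are all assumptions made for the example.

```javascript
// grid: array of rows, each row an array of palette indices (assumed layout).
// Returns a new grid where each cell takes the most common index among
// itself and its immediate neighbors. This is a swapped-in technique for
// illustration, not necessarily what No Gradient Mode does internally.
function flattenTransitions(grid) {
  const rows = grid.length;
  const cols = rows > 0 ? grid[0].length : 0;
  return grid.map((row, y) =>
    row.map((current, x) => {
      const counts = new Map();
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const ny = y + dy;
          const nx = x + dx;
          if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
          const idx = grid[ny][nx];
          counts.set(idx, (counts.get(idx) || 0) + 1);
        }
      }
      // Keep the current color on ties so already-solid regions are untouched.
      let best = current;
      let bestCount = counts.get(current);
      for (const [idx, count] of counts) {
        if (count > bestCount) {
          best = idx;
          bestCount = count;
        }
      }
      return best;
    })
  );
}
```

A single stray bead inside a solid region is absorbed into its surroundings, while a clean boundary between two solid regions is left alone.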

The AI didn't suggest this. My mom did. She looked at her first output and said the gradients were too much.

That's the feature you add when your user is in the room. And that's exactly the kind of thing that wouldn't exist if I'd treated the AI as the product owner instead of the implementer.

The Stack: Deliberately, Aggressively Simple

BeadGen is:

Vanilla JavaScript
HTML5 Canvas API
Zero dependencies
Zero build step
index.html + script.js + style.css, runs off your local file system — literally file:///your-path/index.html

This was a firm decision. My mom needed to use this tool, not install it. That means no localhost server, no terminal, no Python environment. She opens a file in Chrome. Done.

The Canvas API handles everything:

drawImage() to resample the photo to bead-grid resolution
getImageData() to read pixel values for quantization
arc() to draw each bead as a filled circle
toDataURL() to export the result as a downloadable PNG
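A rough sketch of how two of these calls fit around the quantization step. The function names `toPixels` and `drawBeads`, the `{ r, g, b }` pixel shape, and the 0.9 radius factor (a small gap between beads) are assumptions for illustration, not BeadGen's actual code.

```javascript
// Convert the flat RGBA buffer returned by getImageData() into {r, g, b} pixels.
function toPixels(imageData) {
  const pixels = [];
  const data = imageData.data; // RGBA bytes, 4 per pixel
  for (let i = 0; i < data.length; i += 4) {
    pixels.push({ r: data[i], g: data[i + 1], b: data[i + 2] });
  }
  return pixels;
}

// Draw a grid of colors as filled circles on a 2D canvas context.
// grid: array of rows, each row an array of {r, g, b} colors.
function drawBeads(ctx, grid, cellSize) {
  const radius = cellSize / 2;
  grid.forEach((row, y) => {
    row.forEach((color, x) => {
      ctx.beginPath();
      // Center each bead in its cell; 0.9 leaves a small gap (assumed styling).
      ctx.arc(x * cellSize + radius, y * cellSize + radius, radius * 0.9, 0, Math.PI * 2);
      ctx.fillStyle = `rgb(${color.r}, ${color.g}, ${color.b})`;
      ctx.fill();
    });
  });
}

// Browser usage (cols, rows, img, and grid are assumed to exist):
// const small = document.createElement("canvas");
// small.width = cols; small.height = rows;
// const sctx = small.getContext("2d");
// sctx.drawImage(img, 0, 0, cols, rows);
// const pixels = toPixels(sctx.getImageData(0, 0, cols, rows));
//
// const out = document.createElement("canvas");
// out.width = cols * 10; out.height = rows * 10;
// drawBeads(out.getContext("2d"), grid, 10);
// const pngUrl = out.toDataURL("image/png");
```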

Working with raw pixel buffers is verbose and unforgiving, like writing assembly. But it keeps the whole thing self-contained, fast, and completely offline. Those are UX decisions first, technical decisions second.

Where I Had to Stay Opinionated

Here's where I want to be honest about the limits of delegating to AI.

Gemini was excellent at implementing what I described. It was not good at knowing what to describe. When left to generate UI scaffolding on its own, it produced things that were functional but unintuitive — too many options exposed at once, labels that made sense to a developer but not to someone who just wants to make a bead pattern, layouts that were complete but crowded.

Every time I let a generated UI suggestion through without interrogating it, my mom got confused. Every time I pushed back — "this should be one toggle, not three settings," "this label needs to say what it does, not what it is" — the tool got better.

The AI wrote the code correctly. I had to tell it what "correct" meant for this user.

That gap — between code that works and an experience that works — is where the developer still has to show up. And I don't think that gap is going away anytime soon.

What's Next

A few things on the roadmap:

Perceptual color distance — Switching from Euclidean RGB to CIEDE2000 in LAB color space for more accurate palette matching, especially for skin tones and subtle warm-to-cool transitions like that bridge sunset.
Bead inventory input — Instead of "give me N colors," let the user input their actual bead colors and map to those exactly.
Print layout export — A PDF with row-by-row bead counts and a color legend, formatted for A4/Letter printing.
Row-by-row guided mode — Step through one row at a time with bead counts, similar to knitting pattern notation.
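The row-by-row counts in the last two items come down to run-length encoding each row of the pattern. A small sketch of that idea, assuming rows are arrays of palette indices and using a hypothetical function name `rowBeadCounts`; none of this exists in BeadGen yet:

```javascript
// Run-length encode one row of palette indices into [{ color, count }, ...].
// rowBeadCounts is an illustrative name for a roadmap feature.
function rowBeadCounts(row) {
  const runs = [];
  for (const color of row) {
    const last = runs[runs.length - 1];
    if (last && last.color === color) {
      last.count += 1; // same color as the previous bead: extend the run
    } else {
      runs.push({ color, count: 1 }); // color changed: start a new run
    }
  }
  return runs;
}

// Three beads of color 2, then two of color 0, then one of color 5.
console.log(rowBeadCounts([2, 2, 2, 0, 0, 5]));
```

Each run reads like a line of knitting notation: "3 of color 2, 2 of color 0, 1 of color 5".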

Try It

The source is on GitHub: ranji2612/beads_design

Clone it, open index.html in a browser, upload a photo. No setup. Runs entirely offline.

If you're a crafter and you make something with it, I'd genuinely love to see it. If you're a developer and you want to tackle LAB-space color distance or a print export feature, PRs are open.

AI wrote most of the code. I designed the experience. My mom makes the patterns. That's a pretty good division of labor.