<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alessio Battistutta</title>
    <description>The latest articles on Forem by Alessio Battistutta (@thatsme).</description>
    <link>https://forem.com/thatsme</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817925%2F59f63640-0131-4969-b0e3-832fb504aac5.jpg</url>
      <title>Forem: Alessio Battistutta</title>
      <link>https://forem.com/thatsme</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thatsme"/>
    <language>en</language>
    <item>
      <title>Tomato — Visual DAG editor for NixOS configurations</title>
      <dc:creator>Alessio Battistutta</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:52:12 +0000</pubDate>
      <link>https://forem.com/thatsme/tomato-visual-dag-editor-for-nixos-configurations-47j6</link>
      <guid>https://forem.com/thatsme/tomato-visual-dag-editor-for-nixos-configurations-47j6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmajwd1k0ygo7ft4k915.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmajwd1k0ygo7ft4k915.jpg" alt=" " width="800" height="398"&gt;&lt;/a&gt;&lt;br&gt;
Visual hierarchical graph editor that generates configuration.nix and deploys via nixos-rebuild switch.&lt;/p&gt;

&lt;p&gt;Nodes are Nix fragments; Gateways descend into subgraphs (floors).&lt;br&gt;
NixOS merges the composed fragments automatically.&lt;br&gt;
One-click deploy to a real NixOS machine, plus an OODN registry for ambient config (hostname, timezone, stateVersion...).&lt;/p&gt;

&lt;p&gt;Pre-built stacks (Grafana+Prometheus, Web Server, etc.)&lt;/p&gt;

&lt;p&gt;Built with Elixir/Phoenix. Early stage but working end-to-end.&lt;br&gt;
Comments, ideas, and improvement requests are welcome.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/thatsme/Tomato" rel="noopener noreferrer"&gt;Tomato&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nixos</category>
      <category>elixir</category>
      <category>infrastructure</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The Complexity Trap: What Tainter Teaches Us About Agentic Systems</title>
      <dc:creator>Alessio Battistutta</dc:creator>
      <pubDate>Wed, 08 Apr 2026 07:20:01 +0000</pubDate>
      <link>https://forem.com/thatsme/the-complexity-trap-what-tainter-teaches-us-about-agentic-systems-ahf</link>
      <guid>https://forem.com/thatsme/the-complexity-trap-what-tainter-teaches-us-about-agentic-systems-ahf</guid>
      <description>&lt;p&gt;You've felt it. The codebase that fights back. The abstraction layer nobody dares touch. The microservice split that made sense three years ago and now requires a dedicated team just to operate. Joseph Tainter had a name for this in 1988 — and it's darker than technical debt.&lt;/p&gt;

&lt;p&gt;Tainter's thesis in The Collapse of Complex Societies is deceptively simple: societies don't collapse because they fail — they collapse because complexity stops paying for itself. Every layer added to solve a problem yields diminishing returns, while the cost of maintaining that layer keeps rising. At some point, the math inverts. Complexity becomes the problem.&lt;/p&gt;

&lt;p&gt;Software engineers live this every day. The hotfix that births three workarounds. The codebase that becomes load-bearing scar tissue. Eventually, more engineering time is spent managing existing complexity than producing new value — the Tainter inflection point, in code form.&lt;/p&gt;

&lt;p&gt;But deterministic systems at least collapse predictably. The failure modes are traceable. Call graphs, dependency trees, config sprawl — you can reason about what broke and why. Such a system breaks the same way every time. Classic Tainter curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic systems break the model.
&lt;/h2&gt;

&lt;p&gt;When you chain LLM calls into autonomous workflows, the complexity isn't just structural — it's behavioral and non-reproducible. Every LLM call is a sample from a probability distribution. Chain enough of them and the system's emergent behavior is the product of those distributions. Variance doesn't cancel — it compounds. You haven't built a function; you've built a stochastic process dressed as one.&lt;/p&gt;

&lt;p&gt;This is where Tainter gets darker. The natural response to unpredictable LLM output is mitigation: guardrails, validators, retry logic, output sanitizers, confidence thresholds, fallback chains. Each layer adds complexity to manage the chaos of the layer below. But each mitigation layer is itself stochastic — it too samples, classifies, decides. You end up adding complexity that is also unpredictable. The complexity meant to tame variance introduces new variance. The guardrail needs a guardrail. Tainter would recognize this immediately: complexity generating the very problems it was meant to solve.&lt;/p&gt;

&lt;p&gt;The collapse vector in most agentic frameworks is that they don't respect the boundary between stochastic and deterministic. They trust LLM output structurally — parse it, route on it, act on it — and then patch the failures reactively with more stochastic layers. The epistemic problem is that you can't enumerate the failure modes of a compounded probability distribution. The system becomes too unpredictable to reason about, too entangled to refactor. Collapse — not with a bang, but as silent behavioral drift nobody can explain or reproduce.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural response is a clean membrane.
&lt;/h2&gt;

&lt;p&gt;You cannot fully determinize a stochastic system without destroying what makes it useful. The LLM's value is its probabilistic nature — generalization, inference under ambiguity, flexible intent parsing. The goal isn't to eliminate stochasticity; it's to bound it tightly and treat everything that crosses the boundary as untrusted input.&lt;/p&gt;

&lt;p&gt;This is the core design principle behind &lt;a href="https://github.com/thatsme/AlexClaw" rel="noopener noreferrer"&gt;AlexClaw&lt;/a&gt; — a BEAM-native AI agent framework built for regulated, air-gapped infrastructure where "&lt;em&gt;&lt;strong&gt;just call the cloud API&lt;/strong&gt;&lt;/em&gt;" isn't an option. The LLM touches only the intent parsing and skill selection layer. Every output crosses a sanitization choke point before it can influence system state. Downstream — OTP supervision trees, capability tokens, a PolicyEngine with explicit AuthContext — is pure deterministic BEAM. The stochastic surface is small, explicit, and bounded.&lt;/p&gt;
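
&lt;p&gt;A minimal sketch of that choke point, with illustrative names (not AlexClaw's actual API): LLM output crosses into the deterministic side only if it parses onto a closed set of known skills.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule Membrane do
  # Treat LLM output as untrusted input: it crosses the boundary only
  # if it maps onto a skill the deterministic side already knows.
  @allowed_skills %{
    "rss_collector" =&amp;gt; :rss_collector,
    "web_search" =&amp;gt; :web_search,
    "research" =&amp;gt; :research
  }

  def sanitize(llm_output) when is_binary(llm_output) do
    key = llm_output |&amp;gt; String.trim() |&amp;gt; String.downcase()

    case Map.fetch(@allowed_skills, key) do
      {:ok, skill} -&amp;gt; {:ok, skill}
      :error -&amp;gt; {:error, :unrecognized_skill}
    end
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Everything past &lt;code&gt;sanitize/1&lt;/code&gt; deals in atoms from a finite set, not free-form model text.&lt;/p&gt;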

&lt;p&gt;Everything that matters about system reliability lives outside the stochastic layer.&lt;/p&gt;

&lt;p&gt;Most agentic frameworks make the opposite choice, often implicitly. They're optimized for capability — what the agent can do — without an explicit model of where probabilistic reasoning should stop and deterministic execution should begin. That's a Tainter trap: complexity added for capability, with the collapse cost deferred and compounded.&lt;/p&gt;

&lt;p&gt;The question worth asking before adding the next agent layer isn't "&lt;em&gt;&lt;strong&gt;what can this enable?&lt;/strong&gt;&lt;/em&gt;" It's "&lt;em&gt;&lt;strong&gt;where does this sit on the stochastic/deterministic membrane, and what does it cost when it's wrong?&lt;/strong&gt;&lt;/em&gt;"&lt;br&gt;
Tainter's societies couldn't rewrite themselves. We can. But only if we draw the boundary before the complexity makes that choice for us.&lt;/p&gt;

</description>
      <category>complexity</category>
      <category>architecture</category>
      <category>ai</category>
      <category>elixir</category>
    </item>
    <item>
      <title>AlexClaw: A BEAM-Native Personal AI Agent Built on Elixir/OTP</title>
      <dc:creator>Alessio Battistutta</dc:creator>
      <pubDate>Tue, 17 Mar 2026 10:10:58 +0000</pubDate>
      <link>https://forem.com/thatsme/alexclaw-a-beam-native-personal-ai-agent-built-on-elixirotp-25ma</link>
      <guid>https://forem.com/thatsme/alexclaw-a-beam-native-personal-ai-agent-built-on-elixirotp-25ma</guid>
      <description>&lt;h1&gt;
  
  
  AlexClaw: A BEAM-Native Personal AI Agent Built on Elixir/OTP
&lt;/h1&gt;

&lt;p&gt;AlexClaw is a personal autonomous AI agent that monitors RSS feeds, web sources, GitHub repositories, and Google services — accumulates knowledge in PostgreSQL, executes multi-step workflows on schedule, and communicates via Telegram. It runs entirely on your infrastructure.&lt;/p&gt;

&lt;p&gt;The key architectural decision: the BEAM VM is the runtime, not a container for Python-style orchestration. Supervision trees, ETS caching, GenServers, and PubSub are the actual building blocks — not abstractions bolted on top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/thatsme/AlexClaw" rel="noopener noreferrer"&gt;github.com/thatsme/AlexClaw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b3dkgx00w5lf7eiox95.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9b3dkgx00w5lf7eiox95.jpg" alt="AlexClaw Dashboard" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;AlexClaw runs workflows — multi-step pipelines that combine skills (RSS collection, web search, LLM summarization, API calls, browser automation) and deliver results to Telegram. Workflows run on cron schedules or on demand.&lt;/p&gt;

&lt;p&gt;A typical workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Collect&lt;/strong&gt; — fetch 8 RSS feeds concurrently, deduplicate against memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt; — batch-score 20+ article titles in a single LLM call for relevance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarize&lt;/strong&gt; — pass the top items through an LLM transform with a prompt template&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deliver&lt;/strong&gt; — send the briefing to Telegram&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This runs every morning at 7:00 with zero interaction.&lt;/p&gt;
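
&lt;p&gt;As an illustration only (this is not AlexClaw's real step API), the shape of such a pipeline in Elixir:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule Briefing do
  # Each step is injected as a function, mirroring collect/score/summarize/deliver.
  def run(feeds, score_fn, summarize_fn, deliver_fn) do
    feeds
    |&amp;gt; Enum.flat_map(&amp;amp; &amp;amp;1.items)   # 1. collect from all feeds
    |&amp;gt; Enum.uniq_by(&amp;amp; &amp;amp;1.url)       #    deduplicate against memory
    |&amp;gt; score_fn.()                    # 2. one batched LLM scoring call
    |&amp;gt; Enum.take(5)                   # 3. keep the top items
    |&amp;gt; summarize_fn.()                #    LLM transform with a template
    |&amp;gt; deliver_fn.()                  # 4. send the briefing to Telegram
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;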




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Telegram &amp;lt;──&amp;gt; Gateway (GenServer) &amp;lt;──&amp;gt; Dispatcher (pattern matching)
                                            │
                                      SkillSupervisor
                                     (DynamicSupervisor)
                                            │
                              ┌─────────────┼─────────────┐
                           RSS            Research      GitHub
                          Skill            Skill     Security Review
                                            │
                                       LLM Router
                              (Gemini / Anthropic / Local)
                                            │
                                ┌───────────┴───────────┐
                             Memory                  Config
                          (pgvector)             (DB + ETS + PubSub)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Supervision Tree (13 children)
&lt;/h3&gt;

&lt;p&gt;The application starts 13 supervised processes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo&lt;/strong&gt; — PostgreSQL connection pool (Ecto)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PubSub&lt;/strong&gt; — config change broadcast to all processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TaskSupervisor&lt;/strong&gt; — supervised fire-and-forget tasks (workflow execution, background reviews)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UsageTracker&lt;/strong&gt; — ETS owner for LLM call counters, persisted to DB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config.Loader&lt;/strong&gt; — seeds environment variables into DB, loads into ETS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LogBuffer&lt;/strong&gt; — in-memory ring buffer (500 entries) attached to Erlang Logger&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google.TokenManager&lt;/strong&gt; — OAuth2 token lifecycle with auto-refresh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RateLimiter.Server&lt;/strong&gt; — ETS-based login rate limiting with periodic purge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SkillSupervisor&lt;/strong&gt; — DynamicSupervisor for isolated skill execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler&lt;/strong&gt; — Quantum cron scheduler&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SchedulerSync&lt;/strong&gt; — syncs DB workflow schedules into Quantum jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway&lt;/strong&gt; — Telegram long-polling bot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Endpoint&lt;/strong&gt; — Phoenix HTTP server (LiveView admin UI)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every async operation (workflow runs, GitHub reviews, background tasks) executes under &lt;code&gt;Task.Supervisor&lt;/code&gt; — crashes are reported, not silently lost.&lt;/p&gt;
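
&lt;p&gt;For example (&lt;code&gt;WorkflowRunner&lt;/code&gt; is illustrative; &lt;code&gt;Task.Supervisor.start_child/2&lt;/code&gt; is the real API), kicking off a workflow run looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# If this task crashes, the supervisor logs the exit instead of losing it.
Task.Supervisor.start_child(AlexClaw.TaskSupervisor, fn -&amp;gt;
  WorkflowRunner.execute(workflow_id)
end)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;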

&lt;h3&gt;
  
  
  Why the BEAM
&lt;/h3&gt;

&lt;p&gt;The BEAM gives you things for free that other runtimes require libraries or infrastructure for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process isolation&lt;/strong&gt; — a failed RSS fetch doesn't affect a concurrent research query. Each skill runs in its own process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supervision&lt;/strong&gt; — if a GenServer crashes, it restarts. The application recovers without external health checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ETS&lt;/strong&gt; — in-process shared memory tables for config cache, usage counters, rate limiting, and token caching. No Redis needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PubSub&lt;/strong&gt; — config changes broadcast to all processes immediately. No polling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight concurrency&lt;/strong&gt; — RSS feeds are fetched concurrently with &lt;code&gt;Task.async_stream&lt;/code&gt;. Workflow steps run sequentially but the workflow itself runs in a supervised task.&lt;/li&gt;
&lt;/ul&gt;
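
&lt;p&gt;Concurrent feed fetching with &lt;code&gt;Task.async_stream&lt;/code&gt; might look like this (&lt;code&gt;fetch_feed/1&lt;/code&gt; is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;results =
  feed_urls
  |&amp;gt; Task.async_stream(&amp;amp;fetch_feed/1,
    max_concurrency: 8,
    timeout: 10_000,
    on_timeout: :kill_task
  )
  |&amp;gt; Enum.flat_map(fn
    {:ok, items} -&amp;gt; items      # successful fetch
    {:exit, _reason} -&amp;gt; []     # one slow feed doesn't sink the rest
  end)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;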




&lt;h2&gt;
  
  
  The LLM Router
&lt;/h2&gt;

&lt;p&gt;Every LLM call in AlexClaw declares a tier: &lt;code&gt;:light&lt;/code&gt;, &lt;code&gt;:medium&lt;/code&gt;, &lt;code&gt;:heavy&lt;/code&gt;, or &lt;code&gt;:local&lt;/code&gt;. The router selects the cheapest available model for that tier and falls back automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;light&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;gemini_flash → haiku → lm_studio → ollama → custom providers&lt;/span&gt;
&lt;span class="na"&gt;medium&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini_pro → sonnet → lm_studio → ollama → custom providers&lt;/span&gt;
&lt;span class="na"&gt;heavy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;opus → lm_studio → ollama → custom providers&lt;/span&gt;
&lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;lm_studio → ollama → custom providers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Daily usage is tracked per provider in ETS and persisted to PostgreSQL. Each provider has a configurable daily limit. When a provider hits its limit, the router skips it and tries the next one.&lt;/p&gt;

&lt;p&gt;Custom providers (any OpenAI-compatible endpoint) can be added via the admin UI. This means you can run multiple local models on LM Studio — same host, different model names, each with its own tier and limit.&lt;/p&gt;

&lt;p&gt;A fully local deployment with zero API keys works — set &lt;code&gt;LMSTUDIO_ENABLED=true&lt;/code&gt; and all tiers route to your local model.&lt;/p&gt;
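
&lt;p&gt;The fallback logic can be sketched as a walk down the tier's chain, skipping anything unavailable or over its daily limit (a simplification of the real router):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule Router do
  @chains %{
    light:  [:gemini_flash, :haiku, :lm_studio, :ollama],
    medium: [:gemini_pro, :sonnet, :lm_studio, :ollama],
    heavy:  [:opus, :lm_studio, :ollama],
    local:  [:lm_studio, :ollama]
  }

  # `available` is the set of providers that are enabled and under limit.
  def pick(tier, available) do
    case Enum.find(@chains[tier], &amp;amp;(&amp;amp;1 in available)) do
      nil -&amp;gt; {:error, :no_provider_available}
      provider -&amp;gt; {:ok, provider}
    end
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;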

&lt;h3&gt;
  
  
  Cost Tracking
&lt;/h3&gt;

&lt;p&gt;The router doesn't just fall back — it actively minimizes cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RSS relevance scoring uses &lt;code&gt;:light&lt;/code&gt; tier (Gemini Flash free tier: 250 calls/day)&lt;/li&gt;
&lt;li&gt;Research synthesis uses &lt;code&gt;:medium&lt;/code&gt; tier&lt;/li&gt;
&lt;li&gt;Deep reasoning uses &lt;code&gt;:heavy&lt;/code&gt; tier (explicit only, never auto-selected)&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;:local&lt;/code&gt; tier bypasses all cloud providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usage counters reset at midnight UTC via &lt;code&gt;Process.send_after&lt;/code&gt; in the UsageTracker GenServer.&lt;/p&gt;
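
&lt;p&gt;The reset loop is the classic self-scheduling &lt;code&gt;Process.send_after&lt;/code&gt; pattern (sketched here; the table name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;def handle_info(:reset_counters, state) do
  :ets.delete_all_objects(:usage_counters)
  Process.send_after(self(), :reset_counters, ms_until_midnight_utc())
  {:noreply, state}
end

defp ms_until_midnight_utc do
  now = DateTime.utc_now()
  midnight = %{now | hour: 0, minute: 0, second: 0, microsecond: {0, 0}}
  next = DateTime.add(midnight, 86_400, :second)
  DateTime.diff(next, now, :millisecond)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;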




&lt;h2&gt;
  
  
  Runtime Configuration
&lt;/h2&gt;

&lt;p&gt;All settings live in PostgreSQL, cached in ETS, editable at runtime via the admin UI. No restart required for any change.&lt;/p&gt;

&lt;p&gt;On first boot, &lt;code&gt;Config.Loader&lt;/code&gt; seeds default values from environment variables. After that, the database is the source of truth. When a value changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Written to PostgreSQL&lt;/li&gt;
&lt;li&gt;Updated in ETS cache&lt;/li&gt;
&lt;li&gt;Broadcast via Phoenix PubSub&lt;/li&gt;
&lt;li&gt;All subscribed processes see the change immediately&lt;/li&gt;
&lt;/ol&gt;
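
&lt;p&gt;The write path, sketched (schema, table, and topic names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;def put(key, value) do
  %Setting{key: key, value: value}
  |&amp;gt; Repo.insert!(on_conflict: :replace_all, conflict_target: :key)  # 1. persist
  :ets.insert(:config_cache, {key, value})                            # 2. refresh cache
  Phoenix.PubSub.broadcast(AlexClaw.PubSub, "config",                 # 3. notify subscribers
    {:config_changed, key, value})
  :ok
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;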

&lt;p&gt;Categories include: identity/persona, LLM API keys and limits, Telegram settings, GitHub tokens, Google OAuth, rate limiting thresholds, prompt templates, and skill-specific config.&lt;/p&gt;

&lt;p&gt;The system prompt is fully configurable — persona name, base prompt, and per-skill context fragments are all config keys. Zero hardcoded strings.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workflow Engine
&lt;/h2&gt;

&lt;p&gt;Workflows are multi-step pipelines stored in PostgreSQL. Each step specifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skill name&lt;/strong&gt; — which registered skill to execute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config JSON&lt;/strong&gt; — step-level overrides (different repo, different API token, different prompt)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM tier/provider&lt;/strong&gt; — override the default routing for this step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;input_from&lt;/strong&gt; — pull input from a specific earlier step (not just the previous one)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Provider routing has three levels of specificity:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Step-level &lt;code&gt;llm_tier&lt;/code&gt; and &lt;code&gt;llm_model&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Workflow-level &lt;code&gt;default_provider&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Global tier-based fallback chain&lt;/li&gt;
&lt;/ol&gt;
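
&lt;p&gt;Precedence resolution is a natural fit for Elixir's &lt;code&gt;||&lt;/code&gt; chain (field names follow the list above; &lt;code&gt;tier_fallback/1&lt;/code&gt; stands in for the global chain):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# Most specific wins: step override, then workflow default, then the tier chain.
defp resolve_provider(step, workflow) do
  step.llm_model || workflow.default_provider || tier_fallback(step.llm_tier)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;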

&lt;h3&gt;
  
  
  12 Registered Skills
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rss_collector&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch RSS feeds, batch-score relevance, notify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web_search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search DuckDuckGo, fetch results, synthesize answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web_browse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch URL, extract text, answer questions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;research&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deep research with memory context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;conversational&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Free-text LLM conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;telegram_notify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Send workflow output to Telegram&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;llm_transform&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run a prompt template through the LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;api_request&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Authenticated HTTP requests with retries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;github_security_review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PR/commit diff security analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google_calendar&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch upcoming Google Calendar events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google_tasks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List and create Google Tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;web_automation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Browser recording and headless replay&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51n24oxymvqmu74yfzio.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51n24oxymvqmu74yfzio.jpg" alt="AlexClaw Skills" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Skills implement the &lt;code&gt;AlexClaw.Skill&lt;/code&gt; behaviour — two callbacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="nv"&gt;@callback&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nv"&gt;@callback&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;term&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding a new skill is one module, one registry entry.&lt;/p&gt;
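
&lt;p&gt;A toy skill implementing that contract (the module body is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;defmodule AlexClaw.Skills.Echo do
  @behaviour AlexClaw.Skill

  @impl true
  def description, do: "Echoes its input back, for testing pipelines"

  @impl true
  def run(opts) do
    case Keyword.fetch(opts, :input) do
      {:ok, input} -&amp;gt; {:ok, input}
      :error -&amp;gt; {:error, :missing_input}
    end
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;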




&lt;h2&gt;
  
  
  Memory System
&lt;/h2&gt;

&lt;p&gt;PostgreSQL + pgvector for persistent knowledge storage.&lt;/p&gt;

&lt;p&gt;Each memory entry has a &lt;code&gt;kind&lt;/code&gt; (&lt;code&gt;:news_item&lt;/code&gt;, &lt;code&gt;:summary&lt;/code&gt;, &lt;code&gt;:conversation&lt;/code&gt;, &lt;code&gt;:security_review&lt;/code&gt;), content, optional source URL, JSONB metadata, optional vector embedding, and optional TTL.&lt;/p&gt;

&lt;p&gt;Search uses cosine similarity on pgvector when embeddings are available, with keyword (ILIKE) fallback. Deduplication by URL prevents the same article from being scored and notified twice.&lt;/p&gt;
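
&lt;p&gt;The URL deduplication check reduces to a single query (schema and repo names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;# Inside the memory context module; Memory is the Ecto schema.
import Ecto.Query

def seen?(url) do
  AlexClaw.Repo.exists?(from m in Memory, where: m.source_url == ^url)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;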

&lt;p&gt;The RSS collector stores every worthy item. The research skill stores summaries. The conversational skill stores both user messages and assistant responses for context continuity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session-based authentication&lt;/strong&gt; on all admin routes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOTP 2FA&lt;/strong&gt; — optional two-factor for sensitive workflow execution (setup via Telegram, 2-minute challenge expiry)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Login rate limiting&lt;/strong&gt; — ETS-based, configurable max attempts and block duration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HMAC-SHA256 webhook verification&lt;/strong&gt; — raw body cached before JSON parsing for correct signature verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telegram chat_id filtering&lt;/strong&gt; — rejects messages from unauthorized users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timing-safe comparison&lt;/strong&gt; — &lt;code&gt;Plug.Crypto.secure_compare&lt;/code&gt; for all secret comparisons&lt;/li&gt;
&lt;/ul&gt;
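
&lt;p&gt;Signature verification over the raw body, sketched in the GitHub webhook style (the &lt;code&gt;sha256=...&lt;/code&gt; header format is an assumption; &lt;code&gt;:crypto.mac/4&lt;/code&gt; and &lt;code&gt;Plug.Crypto.secure_compare/2&lt;/code&gt; are real APIs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;def valid_signature?(raw_body, signature_header, secret) do
  digest = :crypto.mac(:hmac, :sha256, secret, raw_body)
  expected = "sha256=" &amp;lt;&amp;gt; Base.encode16(digest, case: :lower)
  # Constant-time comparison prevents timing attacks on the secret.
  Plug.Crypto.secure_compare(expected, signature_header)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;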




&lt;h2&gt;
  
  
  Admin UI
&lt;/h2&gt;

&lt;p&gt;Phoenix LiveView — fully server-rendered, no JavaScript hooks. 12 pages covering every aspect of the system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8ro6gdzhf6rl47818in.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8ro6gdzhf6rl47818in.jpg" alt="AlexClaw Workflows" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;Single &lt;code&gt;docker compose up -d&lt;/code&gt;. The stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elixir release&lt;/strong&gt; — compiled OTP release (Alpine-based, ~125 MB runtime)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL 17 + pgvector&lt;/strong&gt; — persistent storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web automator&lt;/strong&gt; (optional) — Python/Playwright sidecar for browser automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Minimum requirements: Docker, a Telegram bot token, and at least one LLM provider (can be fully local).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/thatsme/AlexClaw.git
&lt;span class="nb"&gt;cd &lt;/span&gt;AlexClaw
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env with your credentials&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Elixir 1.19 / OTP 28 / BEAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web framework&lt;/td&gt;
&lt;td&gt;Phoenix 1.7 + LiveView&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP server&lt;/td&gt;
&lt;td&gt;Bandit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 17 + pgvector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP client&lt;/td&gt;
&lt;td&gt;Req&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron scheduler&lt;/td&gt;
&lt;td&gt;Quantum&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RSS parsing&lt;/td&gt;
&lt;td&gt;SweetXml&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTML parsing&lt;/td&gt;
&lt;td&gt;Floki&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2FA&lt;/td&gt;
&lt;td&gt;NimbleTOTP + EQRCode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser automation&lt;/td&gt;
&lt;td&gt;Playwright (Python sidecar)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/thatsme/AlexClaw/blob/main/ROADMAP.md" rel="noopener noreferrer"&gt;ROADMAP.md&lt;/a&gt; in the repository tracks planned features. Current priorities include embedding integration for semantic memory search, additional LLM providers, and workflow branching logic.&lt;/p&gt;




&lt;p&gt;AlexClaw is open source under the Apache License 2.0.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/thatsme/AlexClaw" rel="noopener noreferrer"&gt;github.com/thatsme/AlexClaw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built by &lt;a href="https://github.com/thatsme" rel="noopener noreferrer"&gt;Alessio Battistutta&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>elixir</category>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>How We Told a Stranger's Node Where Its Cache Should Be</title>
      <dc:creator>Alessio Battistutta</dc:creator>
      <pubDate>Wed, 11 Mar 2026 07:49:08 +0000</pubDate>
      <link>https://forem.com/thatsme/how-we-told-a-strangers-node-where-its-cache-should-be-209b</link>
      <guid>https://forem.com/thatsme/how-we-told-a-strangers-node-where-its-cache-should-be-209b</guid>
      <description>&lt;p&gt;We connected to a remote BEAM node we don't own. No access to its source code. No instrumentation planted beforehand. No agents installed, no probes compiled into the target. Pure black-box runtime observation over Distributed Erlang.&lt;/p&gt;

&lt;p&gt;From that, we produced a concrete architectural recommendation: &lt;em&gt;your schema registry is hitting PostgreSQL 354 times per observation window for metadata that almost never changes — that should be ETS, not a database table&lt;/em&gt;.&lt;/p&gt;
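
&lt;p&gt;The recommended shape is a read-through ETS cache in front of PostgreSQL; a sketch, with &lt;code&gt;load_from_db/1&lt;/code&gt; standing in for Nexus's existing query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;def schema_for(table) do
  case :ets.lookup(:schema_cache, table) do
    [{^table, schema}] -&amp;gt;
      schema                              # hot path: no database round trip
    [] -&amp;gt;
      schema = load_from_db(table)        # cold path: hit PostgreSQL once
      :ets.insert(:schema_cache, {table, schema})
      schema
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;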

&lt;p&gt;This article walks through how we got there — 7 observation sessions against a live Elixir application, progressing from coarse process-level metrics to function-level tracing that revealed the internal structure of code we'd never read.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Target: Nexus
&lt;/h2&gt;

&lt;p&gt;Nexus is an Elixir API gateway that manages dynamic PostgreSQL connections, runtime schema metadata, and table operations. It exposes a REST API on port 4040 and handles CRUD, aggregations, search, CSV export, batch operations, and multi-database routing — all backed by Ecto dynamic repos.&lt;/p&gt;

&lt;p&gt;We exercised it with an integration test suite: 61 tests across 8 categories including connection management, extended CRUD (JSONB, NULLs, hierarchical categories), edge cases (SQL injection, unicode, 10KB strings), concurrent operations (50 concurrent inserts, 5000+ req/s bursts), large datasets (50k rows), and a 60-second stress test with 20 concurrent workers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Giulia, the tool doing the observing, runs as two Docker containers from the same image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worker&lt;/strong&gt; (port 4000) — static analysis engine: AST indexing, Knowledge Graph, dependency topology, complexity metrics, embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; (port 4001) — runtime observer: connects to target BEAM nodes via Distributed Erlang, collects snapshots at configurable intervals, pushes data to the Worker for fusion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The observation workflow is command-driven:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;giulia-observe start nexus@192.168.10.174                           &lt;span class="c"&gt;# process-level only&lt;/span&gt;
giulia-observe start nexus@192.168.10.174 cookie 5000 Nexus.Repo    &lt;span class="c"&gt;# + function tracing&lt;/span&gt;
&amp;lt;run your workload&amp;gt;
giulia-observe stop nexus@192.168.10.174                            &lt;span class="c"&gt;# finalize fused profile&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  First Bug: &lt;code&gt;--sname&lt;/code&gt; vs &lt;code&gt;--name&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;First connection failed — Nexus uses long names (&lt;code&gt;--name nexus@192.168.10.174&lt;/code&gt;), Giulia used short names (&lt;code&gt;--sname worker&lt;/code&gt;). Erlang refuses to connect across name modes. One flag change in &lt;code&gt;docker-compose.yml&lt;/code&gt; fixed it.&lt;/p&gt;
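
&lt;p&gt;With matching name modes and the right cookie, attaching from IEx is two calls (the cookie value here is a placeholder for whatever the target node was started with):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;Node.set_cookie(:"nexus@192.168.10.174", :secret_cookie)
Node.connect(:"nexus@192.168.10.174")   # returns true on success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;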




&lt;h2&gt;
  
  
  Process-Level Profiling: Finding the Logger
&lt;/h2&gt;

&lt;p&gt;The first five sessions used process-level observation: BEAM metrics (memory, process count, scheduler run queue) and top-process rankings by CPU. This tells you &lt;em&gt;which modules own the hottest processes&lt;/em&gt; but not which functions are being called.&lt;/p&gt;

&lt;p&gt;We ran the same workload under different configurations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Session&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Run Queue&lt;/th&gt;
&lt;th&gt;Top Module&lt;/th&gt;
&lt;th&gt;#2 Module&lt;/th&gt;
&lt;th&gt;Logger CPU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. First contact (21/49 tests)&lt;/td&gt;
&lt;td&gt;180s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;17&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;:proc_lib&lt;/code&gt; 100%&lt;/td&gt;
&lt;td&gt;(below threshold)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Idle baseline&lt;/td&gt;
&lt;td&gt;21s&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;:code_server&lt;/code&gt; 38.7%&lt;/td&gt;
&lt;td&gt;(below threshold)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Full suite, info log&lt;/td&gt;
&lt;td&gt;77s&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;:proc_lib&lt;/code&gt; 54.3%&lt;/td&gt;
&lt;td&gt;&lt;code&gt;:logger_std_h_default&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;28.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Full suite, warning log&lt;/td&gt;
&lt;td&gt;77s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;:logger_std_h_default&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;:proc_lib&lt;/code&gt; 33.2%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;39.1%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Full suite, async handler&lt;/td&gt;
&lt;td&gt;71s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;:proc_lib&lt;/code&gt; 98.9%&lt;/td&gt;
&lt;td&gt;(below threshold)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gone&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Session 1 flagged scheduler contention (run queue of 17 — more tasks queued than schedulers available). Sessions 3-4 found something more interesting: Erlang's default logger handler does synchronous IO. Every log call blocks the calling process until the write completes. The overhead is per-call, not per-byte — so raising the log level to &lt;code&gt;:warning&lt;/code&gt; (Session 4) actually made it worse. Fewer messages, but each one still blocks, and now the logger is a larger proportion of total work.&lt;/p&gt;

&lt;p&gt;The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="ss"&gt;:logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update_handler_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:default&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;
  &lt;span class="ss"&gt;sync_mode_qlen:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# async until queue hits 1000&lt;/span&gt;
  &lt;span class="ss"&gt;drop_mode_qlen:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# start dropping above 2000&lt;/span&gt;
  &lt;span class="ss"&gt;flush_qlen:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;           &lt;span class="c1"&gt;# emergency flush&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session 5 confirmed: logger gone from top modules, run queue dropped from 5 to 2. Bottleneck eliminated.&lt;/p&gt;

&lt;p&gt;But process-level profiling had hit its ceiling. Session 5 showed &lt;code&gt;:proc_lib&lt;/code&gt; at 98.9% CPU — which is like saying "OTP processes are running." Every GenServer, Task, and supervised process runs through &lt;code&gt;:proc_lib&lt;/code&gt;. We needed to see inside.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stress Test: 919 ops/s, Zero Errors
&lt;/h2&gt;

&lt;p&gt;Before diving deeper, we ran the full suite including a 60-second sustained stress test. Nexus handled the load cleanly — the interesting question was &lt;em&gt;how&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Duration:       60.0s          Latency (ms):
Total ops:      55,151           min:     0.7
Errors:         0 (0.0%)        median:  5.4
Throughput:     919 ops/s        p95:    93.1
                                 p99:   156.9
Operations breakdown:            max:   287.2
  select    13,619    paginate   5,489
  insert    11,203    aggregate  5,440
  count      8,291    distinct   2,818
  search     5,553    exists     2,738
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory stayed flat at 87-88 MB and the process count held at 459. Raw capacity was never in question; the interesting findings are in &lt;em&gt;how&lt;/em&gt; the work gets done.&lt;/p&gt;




&lt;h2&gt;
  
  
  Function-Level Tracing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Remote Tracing Problem
&lt;/h3&gt;

&lt;p&gt;Erlang's &lt;code&gt;:erlang.trace&lt;/code&gt; only works on the local node. Giulia's Monitor runs on a different node than Nexus. There are three ways to run code on a remote BEAM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;:rpc.call(node, Module, :function, args)&lt;/code&gt;&lt;/strong&gt; — module must be loaded on the target&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anonymous function via RPC&lt;/strong&gt; — closures reference their defining module; fails with &lt;code&gt;:undef&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Code.eval_string&lt;/code&gt; via RPC&lt;/strong&gt; — source code as a string, compiled on the target using only its stdlib&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Options 1 and 2 fail because Nexus doesn't have Giulia's code, and Erlang closures aren't portable. Build 135 uses option 3: a self-contained Elixir code string sent via &lt;code&gt;:rpc.call(node, Code, :eval_string, [code])&lt;/code&gt;. It spawns a collector, enables tracing for up to 2 seconds (kill switch at 1,000 events), aggregates call counts, and returns. The entire lifecycle runs on the target with zero Giulia dependencies.&lt;/p&gt;
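&lt;p&gt;A minimal sketch of the pattern, illustrative rather than Build 135's actual payload: it omits the event kill switch, and the window length and traced module are taken from the sessions described here.&lt;/p&gt;

```elixir
# Assumed target node from the article; :rpc.call works from any connected node.
target_node = :"nexus@192.168.10.174"

# The payload is a plain string: it is compiled on the target by Code.eval_string,
# so it may only use modules that already exist there (stdlib + the app's own).
trace_code = """
collector =
  spawn(fn ->
    loop = fn loop, counts ->
      receive do
        {:trace, _pid, :call, {m, f, args}} ->
          loop.(loop, Map.update(counts, {m, f, length(args)}, 1, &(&1 + 1)))

        {:report, caller} ->
          send(caller, {:counts, counts})
      end
    end

    loop.(loop, %{})
  end)

# Count calls into one module for a 2-second window
# (raises :badarg if another tracer is already installed).
:erlang.trace(:all, true, [:call, {:tracer, collector}])
:erlang.trace_pattern({Nexus.Registry.TableRegistry, :_, :_}, true, [])
Process.sleep(2_000)
:erlang.trace(:all, false, [:call])

send(collector, {:report, self()})

receive do
  {:counts, counts} -> counts
after
  1_000 -> %{}
end
"""

case :rpc.call(target_node, Code, :eval_string, [trace_code]) do
  {:badrpc, reason} -> {:error, reason}
  {counts, _bindings} -> counts
end
```

Everything inside the string runs on the target node; the caller only sees the returned map of `{module, function, arity} => call count`.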

&lt;h3&gt;
  
  
  Session 6: Single Module (73 seconds, 11 snapshots)
&lt;/h3&gt;

&lt;p&gt;Tracing &lt;code&gt;Nexus.Registry.TableRegistry&lt;/code&gt; only:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Samples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TableRegistry.get/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,238&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single hottest function on Nexus. Every CRUD operation goes through &lt;code&gt;get/1&lt;/code&gt; to look up the schema definition before executing. But the call count alone doesn't tell you &lt;em&gt;how&lt;/em&gt; it's implemented.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session 7: Dual Module Trace (85 seconds, 11 snapshots)
&lt;/h3&gt;

&lt;p&gt;Tracing both &lt;code&gt;Nexus.Registry.TableRegistry&lt;/code&gt; and &lt;code&gt;Nexus.Repo&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nexus.Repo — 7,000 total calls:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prepare_opts/2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1,266&lt;/td&gt;
&lt;td&gt;18.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;default_options/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1,263&lt;/td&gt;
&lt;td&gt;18.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1,263&lt;/td&gt;
&lt;td&gt;18.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prepare_query/3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;956&lt;/td&gt;
&lt;td&gt;13.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;all/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;593&lt;/td&gt;
&lt;td&gt;8.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;one/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;353&lt;/td&gt;
&lt;td&gt;5.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;insert_all/3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;307&lt;/td&gt;
&lt;td&gt;4.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;put_dynamic_repo/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;0.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Nexus.Registry.TableRegistry — 2,004 total calls:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;354&lt;/td&gt;
&lt;td&gt;17.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prepare_opts/2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;354&lt;/td&gt;
&lt;td&gt;17.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;default_options/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;354&lt;/td&gt;
&lt;td&gt;17.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prepare_query/3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;292&lt;/td&gt;
&lt;td&gt;14.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;all/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;186&lt;/td&gt;
&lt;td&gt;9.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;one/1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;106&lt;/td&gt;
&lt;td&gt;5.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;insert_all/3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;3.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And there it is. &lt;code&gt;TableRegistry&lt;/code&gt; has &lt;code&gt;prepare_opts/2&lt;/code&gt;, &lt;code&gt;default_options/1&lt;/code&gt;, &lt;code&gt;prepare_query/3&lt;/code&gt; — the telltale signature of an Ecto Repo. &lt;strong&gt;The schema registry is backed by a database table, not ETS.&lt;/strong&gt; Every schema lookup is a round-trip to PostgreSQL.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Numbers Say
&lt;/h2&gt;

&lt;h3&gt;
  
  
  TableRegistry Should Be ETS (The Headline Finding)
&lt;/h3&gt;

&lt;p&gt;354 &lt;code&gt;get/1&lt;/code&gt; calls to PostgreSQL per observation window, for table metadata that almost never changes. Boot-time load, invalidate on DDL events, done. This eliminates ~2,000 function calls per window (the full Ecto pipeline runs on every registry query) and removes an entire class of unnecessary database round-trips from the hot path.&lt;/p&gt;
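&lt;p&gt;The shape of the fix, as a hedged sketch: the module, function names, and loader are invented stand-ins, not Nexus code.&lt;/p&gt;

```elixir
defmodule SchemaCache do
  @moduledoc "Sketch: ETS-backed schema registry, loaded at boot, reloaded on DDL events."
  use GenServer

  @table :schema_cache

  def start_link(loader), do: GenServer.start_link(__MODULE__, loader, name: __MODULE__)

  # Hot path: a lock-free ETS lookup in the calling process, no DB round-trip.
  def get(table_name) do
    case :ets.lookup(@table, table_name) do
      [{^table_name, schema}] -> {:ok, schema}
      [] -> {:error, :not_found}
    end
  end

  # Hook this to DDL events to refresh the cache.
  def invalidate, do: GenServer.call(__MODULE__, :reload)

  @impl true
  def init(loader) do
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    load(loader)
    {:ok, loader}
  end

  @impl true
  def handle_call(:reload, _from, loader) do
    :ets.delete_all_objects(@table)
    load(loader)
    {:reply, :ok, loader}
  end

  # `loader` stands in for the boot-time query that reads every schema row once.
  defp load(loader) do
    for {name, schema} <- loader.(), do: :ets.insert(@table, {name, schema})
  end
end
```

Reads never touch the GenServer; only reloads serialize through it.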

&lt;h3&gt;
  
  
  The Workload Is 87% Reads
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read operations:  2,209 (87%)    Read/Write ratio: 6.9:1
Write operations:    320 (13%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Repo.get/1&lt;/code&gt; at 1,263 calls is the hottest single operation in this window. Combined with &lt;code&gt;all/1&lt;/code&gt; (593) and &lt;code&gt;one/1&lt;/code&gt; (353), reads dominate. A read-through ETS cache with a TTL on frequently accessed records would cut database pressure significantly.&lt;/p&gt;
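&lt;p&gt;A minimal sketch of that read-through idea; the table name and default TTL are illustrative.&lt;/p&gt;

```elixir
defmodule ReadCache do
  @table :read_cache

  def init, do: :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])

  # Serve from ETS while the entry is fresh; otherwise fall through to the
  # loader (the real DB query) and cache the result with a timestamp.
  def fetch(key, loader, ttl_ms \\ 5_000) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, key) do
      [{^key, value, cached_at}] when now - cached_at < ttl_ms ->
        value

      _ ->
        value = loader.()
        :ets.insert(@table, {key, value, now})
        value
    end
  end
end
```

Stale entries are simply overwritten on the next miss, so no sweeper process is required for correctness.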

&lt;h3&gt;
  
  
  Ecto Overhead Is Structural (Don't Touch It)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ecto pipeline calls:  3,485 (prepare_opts + default_options + prepare_query)
Actual DB operations:  2,530
Overhead ratio:         1.38x
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;1.38 pipeline calls per operation. This is Ecto doing its job — query preparation, option merging, type validation. You're not going to bypass it, and the savings would be marginal. The real wins come from not hitting the database at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic Repo Switching Works Fine
&lt;/h3&gt;

&lt;p&gt;12 &lt;code&gt;put_dynamic_repo/1&lt;/code&gt; calls across 2,530 operations (1 switch per 211 ops). Connection switching is not a concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  The BEAM Has Headroom
&lt;/h3&gt;

&lt;p&gt;Run queue of 1, stable 90 MB memory, ~30 DB ops/sec during trace windows. The VM isn't stressed. If there's a throughput bottleneck, it's on the PostgreSQL side — &lt;code&gt;pg_stat_statements&lt;/code&gt; would confirm.&lt;/p&gt;
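&lt;p&gt;A typical first look on that side, assuming the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension is enabled; the database name is a placeholder, and the column names are those of PostgreSQL 13+.&lt;/p&gt;

```shell
# Top 10 statements by cumulative execution time on the Nexus database.
psql -d nexus -c "
  SELECT calls,
         round(total_exec_time::numeric, 1) AS total_ms,
         round(mean_exec_time::numeric, 2)  AS mean_ms,
         left(query, 60)                    AS query
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10;"
```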

&lt;h3&gt;
  
  
  Priority
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Cache &lt;code&gt;TableRegistry&lt;/code&gt; in ETS&lt;/td&gt;
&lt;td&gt;Eliminates ~2,000 DB calls/window&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Read cache for hot &lt;code&gt;get/1&lt;/code&gt; paths&lt;/td&gt;
&lt;td&gt;Reduces 87% read load&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Profile PostgreSQL side&lt;/td&gt;
&lt;td&gt;Confirms where latency lives&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What Each Layer Revealed
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Process-Level&lt;/th&gt;
&lt;th&gt;Function-Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Synchronous logger at 39% CPU&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logger worse at warning level (Sessions 3→4→5)&lt;/td&gt;
&lt;td&gt;Yes (comparative)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;TableRegistry.get/1&lt;/code&gt; is the #1 function&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,238 calls&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TableRegistry is database-backed&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Ecto signatures&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;87% read-heavy workload&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Ratio analysis&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.38x Ecto overhead (structural, ignore it)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic repo switching is fine&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12 switches&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BEAM has headroom, DB is the constraint&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Throughput data&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Process-level found the logger bug — a real operational fix. Function-level found the architectural issue — a schema registry that should never have been database-backed. One is a config change. The other changes how you design the system.&lt;/p&gt;

&lt;p&gt;Both came from a node we don't own, running code we've never read, with zero instrumentation on the target.&lt;/p&gt;




&lt;h2&gt;
  
  
  How This Works
&lt;/h2&gt;

&lt;p&gt;No source access. No agents installed on the target. No probes compiled in. Giulia's Monitor connects to any BEAM node via Distributed Erlang, collects process snapshots and function-level traces by sending self-contained code strings over RPC (&lt;code&gt;Code.eval_string&lt;/code&gt;), and pushes the results to the Worker for AST correlation and fused profiling. The entire observation lifecycle runs on the target using only its own stdlib — Giulia never needs to be loaded there.&lt;/p&gt;

&lt;p&gt;Connect, observe, trace, recommend.&lt;/p&gt;

</description>
      <category>elixir</category>
      <category>distributedsystems</category>
      <category>postgres</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
