Forem: Gustavo Gondim

Field Learnings with OpenClaw and WhatsApp

Gustavo Gondim — Tue, 28 Apr 2026 21:45:42 +0000

Technical notes extracted by Claude from deploying an agentic WhatsApp bot to production (OpenClaw 2026.4.23). The focus is on things that are missing from the official docs or that cost hours of debugging.

High-Level Architecture

OpenClaw is a self-hosted agentic gateway that routes messages between:

Channels (WhatsApp via Baileys, Slack, Discord, Telegram, etc.).
Agents (isolated objects, each with its own workspace, persona, and model).
Tools via MCP (Model Context Protocol, an open standard protocol).

The main process is the gateway (Node 24, listens on :18789), which maintains a Baileys session per WhatsApp account and triggers agents on demand.

Config: JSON, Not YAML

The active config is ~/.openclaw/openclaw.json (under the node user, uid 1000). The env var OPENCLAW_CONFIG=/path/to/yaml is ignored by the gateway. The schema is huge (49.5k lines), validated via JSON Schema draft-07.

Useful commands:

openclaw config schema        # Full JSON Schema (stdout)
openclaw config get <path>    # read value
openclaw config set <path> <value> --strict-json [--dry-run] [--replace]
openclaw config set --batch-file /tmp/batch.json --strict-json

config set automatically creates a .bak before overwriting. A gateway restart is required to apply changes (docker restart openclaw-gateway).

Gotcha: dots in paths. If the path contains a . (e.g., a JID like 120363406566286319@g.us), the parser treats it as a path separator. Workaround: set the entire object one level up (channels.whatsapp.groups, with the value being a dict keyed by JID).
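To see why a JID breaks the path, here is a minimal sketch that assumes the CLI splits paths on every dot (consistent with the behavior described above, but not taken from OpenClaw's parser). The workaround builds the whole groups object and passes it as one JSON value; the requireMention setting is only an example. Setting the parent may replace the whole groups object, so it is safest to include every group you want to keep.

```javascript
// Hypothetical stand-in for a dot-path parser: splits on every ".".
function splitConfigPath(path) {
  return path.split(".");
}

// A group JID contains a dot ("@g.us"), so it cannot survive as one segment.
const jid = "120363406566286319@g.us";
const naive = splitConfigPath(`channels.whatsapp.groups.${jid}.requireMention`);
// naive → [..., "120363406566286319@g", "us", "requireMention"]

// Workaround: build the parent object and set it as a single JSON value.
function buildGroupsValue(groups) {
  // groups: { [jid]: settings } → JSON string for `config set ... --strict-json`
  return JSON.stringify(groups);
}

const groupsValue = buildGroupsValue({ [jid]: { requireMention: false } });
// Shell usage (single-quote the value for the shell):
// openclaw config set channels.whatsapp.groups '<groupsValue>' --strict-json
console.log(naive);
console.log(groupsValue);
```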

Inheritance Bug: Must Set on Both Channel + Account

channels.whatsapp.<X> is not inherited by channels.whatsapp.accounts.default.<X> during resolveMergedAccountConfig. The symptom is silent: you set the key on the channel, the account still resolves to the default at runtime, and the gateway applies the fallback. Every config change must therefore be duplicated:

{
  "channels.whatsapp.groupPolicy": "allowlist",
  "channels.whatsapp.accounts.default.groupPolicy": "allowlist",
  "channels.whatsapp.groupAllowFrom": [...],
  "channels.whatsapp.accounts.default.groupAllowFrom": [...]
}
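Because both levels have to be kept in sync by hand, a small generator avoids drift. This is a minimal sketch that assumes the flat path-to-value map above is what config set --batch-file expects (the batch-file format is not confirmed here). The phone number is a placeholder.

```javascript
// Mirror every channels.whatsapp.<key> into channels.whatsapp.accounts.<account>.<key>,
// working around the missing inheritance in resolveMergedAccountConfig.
function mirrorToAccount(channelSettings, account = "default") {
  const out = {};
  for (const [key, value] of Object.entries(channelSettings)) {
    out[`channels.whatsapp.${key}`] = value;
    out[`channels.whatsapp.accounts.${account}.${key}`] = value;
  }
  return out;
}

const batch = mirrorToAccount({
  groupPolicy: "allowlist",
  groupAllowFrom: ["+5511999999999"], // placeholder E.164 sender number
});

// Write this to /tmp/batch.json (e.g. with fs.writeFileSync), then run:
// openclaw config set --batch-file /tmp/batch.json --strict-json
console.log(JSON.stringify(batch, null, 2));
```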

`groupAllowFrom` Is a Sender List, Not a Group List

The name is misleading. groupAllowFrom is validated against senderE164 (the phone number of the message sender), via isNormalizedSenderAllowed(). It is not an allowlist by group.

There is no native group JID allowlist in OpenClaw. To restrict to a specific group, the options are:

Operational: the bot is in only one group.
Per user: list authorized phone numbers in groupAllowFrom. The bot responds when ONE of them speaks in ANY group (subject to the default mention requirement).
Combined: requireMention: true (default) everywhere, with an exception via groups.<JID>.requireMention: false for the main group.
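The combined option can be modelled as a small decision function. This is a hypothetical sketch of the rules above, not OpenClaw's applyGroupGating code. The message fields (senderE164, groupJid, mentioned) and the config shape are assumptions, and the phone number is a placeholder.

```javascript
// Decide whether the bot should answer a group message.
// groupAllowFrom is checked against the SENDER's E.164 number, never the group JID.
function shouldReply(msg, cfg) {
  const allowed = cfg.groupAllowFrom ?? [];
  if (cfg.groupPolicy === "allowlist" && !allowed.includes(msg.senderE164)) {
    return false; // unknown sender: ignored in every group
  }
  // Per-group override; requireMention defaults to true.
  const groupCfg = cfg.groups?.[msg.groupJid] ?? {};
  const requireMention = groupCfg.requireMention ?? true;
  return requireMention ? msg.mentioned : true;
}

const cfg = {
  groupPolicy: "allowlist",
  groupAllowFrom: ["+5511999999999"], // placeholder sender number
  groups: { "120363406566286319@g.us": { requireMention: false } },
};

console.log(shouldReply(
  { senderE164: "+5511999999999", groupJid: "120363406566286319@g.us", mentioned: false },
  cfg,
)); // true: main group, no mention needed
```

Any other group still needs a real mention, which is where WhatsApp Web's unreliable mention marking can bite.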

WhatsApp in DinD

The <container_name> container is Docker-in-Docker. The Docker daemon that mounts volumes belongs to the outer host (<hostname>), not the inner container. Symptoms:

Bind-mounting a file creates a directory instead: the daemon looks for the file on its own filesystem, doesn't find it, and creates an empty dir. Solution: package files in a thin Dockerfile via COPY (instead of bind mounts).
localhost doesn't resolve in Alpine containers: use 127.0.0.1.
Binding 127.0.0.1:18789 on the host conflicts: use expose: instead of ports:. To access the UI externally, use docker exec or an SSH port-forward.

Baileys Pairing

docker exec -it openclaw-gateway openclaw channels add --channel whatsapp
docker exec -it openclaw-gateway openclaw channels login --channel whatsapp

The second command prints an ASCII QR code in the terminal. It needs a TTY (docker exec -it, plus ssh -t if you are connecting over SSH). Pair from the phone app under Linked Devices. The session is persisted in a Docker volume.

The session expires after a few days. Symptom: channels status --probe reports linked, connected, in:Xm ago, but messagesHandled: 0. A common root cause is error 1006 ("failed hydrating participating groups") on connect. Solution: log out and log in again (new QR).

docker exec openclaw-gateway openclaw channels logout --channel whatsapp
docker exec -it openclaw-gateway openclaw channels login --channel whatsapp

After login, verify with directory groups list: if it returns the group list, hydration completed successfully.

`append` Skip Bug After Restart

Each gateway restart triggers a Baileys reconnect. Messages arriving during this window come with upsert.type === "append" (history sync). The code path in /app/dist/extensions/whatsapp/monitor-BXydC-6q.js, around line 967, checks:

const msgTsNum = msg.messageTimestamp != null ? Number(msg.messageTimestamp) : NaN;
if ((Number.isFinite(msgTsNum) ? msgTsNum * 1e3 : 0) < connectedAtMs - 60_000) continue;

If messageTimestamp is absent, the 0 fallback makes the comparison always true, so the message is skipped. Patch: change : 0 to : Date.now(). This is a manual edit in a dist file, so it must be redone after each docker compose build. Issue tracker: openclaw/openclaw#19856.
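The effect of the fallback is easy to reproduce in isolation. The function below copies the comparison from the snippet above and takes the fallback as a parameter. connectedAtMs and the sample messages are illustrative.

```javascript
// Returns true when a history-sync ("append") message would be skipped.
function isSkipped(msg, connectedAtMs, fallbackMs) {
  const msgTsNum = msg.messageTimestamp != null ? Number(msg.messageTimestamp) : NaN;
  const tsMs = Number.isFinite(msgTsNum) ? msgTsNum * 1e3 : fallbackMs;
  return tsMs < connectedAtMs - 60_000;
}

const connectedAtMs = Date.now();
const noTimestamp = {}; // messageTimestamp missing

// Original fallback (0): always older than the window, so the message is dropped.
console.log(isSkipped(noTimestamp, connectedAtMs, 0));          // true
// Patched fallback (Date.now()): treated as fresh, so the message is processed.
console.log(isSkipped(noTimestamp, connectedAtMs, Date.now())); // false
```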

`groupAllowFrom` Bug (issue #54613)

Multiple issues report "DMs work, groups don't" or "messagesHandled stuck at 0 even when connected". The actual cause is a combination of:

groupPolicy must be set on both channel + account (inheritance bug).
groupAllowFrom must contain E.164 sender numbers, not JIDs.
groups.<JID>.requireMention defaults to true and silently blocks in applyGroupGating if the message has no real mention (and WhatsApp Web doesn't always mark mentions correctly).

Full workaround is documented in @pandeysoni's comment on issue #54613.

Sessions and Conversation Context

OpenClaw has native sessions per channel+group. SessionKey:

agent:<agentId>:whatsapp:group:<JID>

Persisted in ~/.openclaw/agents/<agentId>/sessions/<uuid>.jsonl — JSONL with each turn (user message + tool calls + agent response). Consecutive messages in the same group share context, so "add 1 more" after "how much sugar does it have?" works.
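For scripting around sessions, the key format and the JSONL layout are enough to get started. This is a minimal sketch: sessionKey follows the format above. The per-turn schema inside each line is not documented here, so the parser only splits lines and JSON-parses them. You can get the text from docker exec ... cat or from fs.readFileSync.

```javascript
// Build the session key for a WhatsApp group: agent:<agentId>:whatsapp:group:<JID>.
function sessionKey(agentId, groupJid) {
  return `agent:${agentId}:whatsapp:group:${groupJid}`;
}

// Parse a .jsonl transcript: one JSON object per non-empty line.
// No fields are assumed; inspect the resulting objects to see what a turn contains.
function parseJsonl(text) {
  const turns = [];
  text.split("\n").forEach((line, i) => {
    if (line.trim() === "") return; // skip blank lines (e.g. the trailing newline)
    try {
      turns.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`invalid JSON on line ${i + 1}: ${err.message}`);
    }
  });
  return turns;
}

console.log(sessionKey("main", "120363406566286319@g.us"));
// agent:main:whatsapp:group:120363406566286319@g.us
```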

To debug a conversation:

docker exec openclaw-gateway sh -c 'cat ~/.openclaw/agents/main/sessions/<uuid>.jsonl'

MCP Transport: Legacy SSE, Not Streamable HTTP

OpenClaw 2026.4.x uses legacy HTTP+SSE for MCP, not Streamable HTTP (the newer protocol). The client sends a GET to the configured URL and expects a text/event-stream response with an event: endpoint handshake. Servers that return 405 on GET (common with Streamable HTTP-only stateless servers) fail with SSE error: Non-200 status code (405).

Solution: implement SSEServerTransport from @modelcontextprotocol/sdk alongside Streamable HTTP, exposing GET /sse (handshake) + POST /messages?sessionId=<id> (client messages). Configure MCP in OpenClaw pointing to /sse:

openclaw mcp set <name> '{"url":"http://host:port/sse"}'
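The difference between the two transports comes down to which HTTP methods and paths the server answers. The sketch below models only the routing and the handshake frame of the legacy HTTP+SSE transport, with no network code. In a real server, SSEServerTransport from @modelcontextprotocol/sdk is meant to write these frames for you. The /sse and /messages paths follow the setup above. The 400 for unknown sessions is a choice made for this sketch.

```javascript
// Format one Server-Sent Events frame.
function sseFrame(event, data) {
  return `event: ${event}\ndata: ${data}\n\n`;
}

// Legacy HTTP+SSE handshake: the first frame tells the client where to POST.
function endpointFrame(sessionId) {
  return sseFrame("endpoint", `/messages?sessionId=${encodeURIComponent(sessionId)}`);
}

// Route a request the way a server supporting the legacy transport should.
// `sessions` is a Set of open SSE session ids.
function route(method, rawUrl, sessions) {
  const url = new URL(rawUrl, "http://localhost");
  if (method === "GET" && url.pathname === "/sse") {
    // Keep the response open and write endpointFrame(id) as the first event.
    return { status: 200, contentType: "text/event-stream" };
  }
  if (method === "POST" && url.pathname === "/messages") {
    const id = url.searchParams.get("sessionId");
    // Replies travel back over the SSE stream; the POST is only acknowledged.
    return sessions.has(id) ? { status: 202 } : { status: 400 }; // 400: unknown session
  }
  // Everything else gets 405 here, mirroring what a Streamable-HTTP-only server
  // returns to OpenClaw's GET ("SSE error: Non-200 status code (405)").
  return { status: 405 };
}
```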

Gateway Mode Is Required

gateway.mode must be set (local or remote). Without it, the gateway refuses to start and openclaw doctor complains. No default is populated; missing this once cost a full day of debugging.
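Both this gotcha and the inheritance bug are silent, so a preflight check over the parsed openclaw.json can catch them before a restart. This is a hypothetical sketch. It assumes the file nests keys as objects (gateway.mode as config.gateway.mode, account settings under channels.whatsapp.accounts.default). That matches the dotted paths used by config get/set, but it has not been checked against the schema.

```javascript
// Preflight checks for two silent misconfigurations described in these notes.
function preflight(config) {
  const problems = [];

  const mode = config.gateway?.mode;
  if (mode !== "local" && mode !== "remote") {
    problems.push(`gateway.mode must be "local" or "remote", got ${JSON.stringify(mode)}`);
  }

  // Channel-level keys are not inherited by the account, so compare them.
  const wa = config.channels?.whatsapp ?? {};
  const account = wa.accounts?.default ?? {};
  for (const key of ["groupPolicy", "groupAllowFrom"]) {
    if (key in wa && JSON.stringify(wa[key]) !== JSON.stringify(account[key])) {
      problems.push(`channels.whatsapp.${key} is not mirrored on accounts.default`);
    }
  }
  return problems;
}

// In practice: JSON.parse(fs.readFileSync(<path to openclaw.json>, "utf8")).
const sample = { channels: { whatsapp: { groupPolicy: "allowlist", accounts: { default: {} } } } };
console.log(preflight(sample)); // two problems: missing gateway.mode, groupPolicy not mirrored
```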

Useful Debug Commands

# General status
openclaw doctor
openclaw channels status --probe

# View recent inbound/outbound
docker exec openclaw-gateway cat /tmp/openclaw/openclaw-$(date +%F).log | grep web-inbound | tail -5
docker exec openclaw-gateway cat /tmp/openclaw/openclaw-$(date +%F).log | grep web-auto-reply | tail -5

# MCP discovery
docker logs openclaw-gateway | grep bundle-mcp

# Heartbeat (check messagesHandled and lastInboundAt)
docker exec openclaw-gateway tail -3 /tmp/openclaw/openclaw-$(date +%F).log

# List groups visible to Baileys (after hydration)
openclaw directory groups list

Strategy for Debugging Silent Symptoms

When "messages aren't arriving" and no log explains it, patch the dist file with console.error at critical points. The path is /app/dist/extensions/whatsapp/monitor-BXydC-6q.js (the hash in the filename changes with each release). Useful points to instrument:

handleMessagesUpsert (entry) — confirms Baileys delivers
normalizeInboundMessage null returns — ACL, recent outbound echo, status broadcast
enrichInboundMessage null return — unsupported format
claimRecentInboundMessage — dedup
enqueueInboundMessage — confirms entry into the pipeline

Back up first (cp file file.bak), then cp file.bak file to restore.
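Instead of adding console.error calls by hand, each instrumented function can be wrapped once. This is a minimal sketch that assumes you edit the dist file so the original function is reassigned through the wrapper. The names come from the list above, but their signatures are not known here, so the wrapper passes arguments through untouched and only logs entry and null results. The fromMe field in the stand-in is hypothetical.

```javascript
// Wrap a function so every call logs entry and flags a null/undefined result.
// Works for sync and async functions (async results are awaited before checking).
function traced(name, fn) {
  return function (...args) {
    console.error(`[trace] ${name} called`);
    const report = (result) => {
      if (result == null) console.error(`[trace] ${name} returned ${result}`);
      return result;
    };
    const out = fn.apply(this, args);
    return out instanceof Promise ? out.then(report) : report(out);
  };
}

// Hypothetical stand-in for one of the functions listed above.
const normalizeInboundMessage = traced("normalizeInboundMessage", (msg) =>
  msg.fromMe ? null : msg,
);

normalizeInboundMessage({ fromMe: true });  // logs "called" and "returned null"
normalizeInboundMessage({ fromMe: false }); // logs "called" only
```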

Recommendations for Other Deployments

Avoid DinD if possible. The outer host's Docker daemon and the container resolve paths against different filesystems, so bind mounts break silently.
Use mcpServers from the start instead of proprietary TS plugins. MCP is the official path and is reusable (Claude Desktop, Cowork, etc. consume the same server).
Patch dist files carefully. Keep versioned backups; an OpenClaw upgrade overwrites the files and the patches are lost. Consider forking the image with a multi-stage Dockerfile.
Baileys sessions are fragile. Schedule a preventive weekly restart (cron) and monitor messagesHandled in the heartbeat: if it stays at 0 for more than 1h while connected, it's time to re-login.
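The heartbeat rule can be automated with a small state machine fed by periodic status snapshots. This is a minimal sketch. It assumes you can extract connected, messagesHandled, and a timestamp from channels status --probe or the heartbeat log line; that snapshot shape is an assumption. The one-hour threshold comes from the recommendation above.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Tracks how long the session has looked "zombie": connected but handling nothing.
function createZombieDetector(thresholdMs = ONE_HOUR_MS) {
  let zeroSince = null; // timestamp when the zombie condition started

  return function check({ connected, messagesHandled, atMs }) {
    if (connected && messagesHandled === 0) {
      if (zeroSince === null) zeroSince = atMs;
      return atMs - zeroSince > thresholdMs ? "relogin" : "ok";
    }
    zeroSince = null; // condition broken: reset the clock
    return "ok";
  };
}

const check = createZombieDetector();
const t0 = Date.parse("2026-04-28T12:00:00Z");
console.log(check({ connected: true, messagesHandled: 0, atMs: t0 }));             // ok
console.log(check({ connected: true, messagesHandled: 0, atMs: t0 + 61 * 60e3 })); // relogin
```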

Claude Broke Up with OpenClaw… Or Did It?

Gustavo Gondim — Sun, 05 Apr 2026 21:50:28 +0000

The internet recently lost its mind after Anthropic decided to ban third-party tools used together with Claude Code, especially OpenClaw. But how is this going to work in practice?

The official bad news

If you're a Claude subscriber, you surely got an email from Anthropic this Saturday, April 4, saying that Claude Code would no longer work with third-party tools that run on your subscription.

Specifically mentioning OpenClaw, which was recently bought by rival OpenAI, the owner of our dear Claude adds that you will still be able to use these tools, but their usage will be billed as extra usage rather than against your subscription.

To get people on board with the extra cost, Anthropic says it is handing out a generous ~~hush money~~ bonus of extra-usage credits, plus a 30% discount on that kind of credit.

That email was all it took for Linkedisney and Twitter to erupt, with some people supporting the move, others criticizing it, some suffering, and many, like me, confused.

Anthropic's own contradiction

What isn't clear to users, in any of the company's official channels, is when and how third-party tools will be identified so that their usage gets charged as extra usage. On its website, Anthropic states categorically that it “reserves the right to charge these tools to your extra usage instead of your subscription”, but doesn't specify which methods it uses to do so.

Logging in to your Claude account | Claude Help Center

Other ways of reusing the subscription aren't clear either, such as token-based logins on other machines, agent sessions, and the Agent SDK, as this Twitter user pointed out very well:

https://x.com/mattpocockuk/status/2040536403289764275

The current landscape

OpenClaw's creator, Peter Steinberger, like a good employee wearing the OpenAI jersey, didn't miss the chance to dish out criticism and jabs at Anthropic on Twitter, defending open-source software.

The best news, though, is that the mechanism that checks whether Claude Code is running a prompt coming from OpenClaw is apparently a mere check for the word “OpenClaw” in the system prompt, as @steipete himself pointed out on Twitter.

Alarmed by this new move, I tested Claude Code in GitHub Actions myself. It uses an official integration, but runs on my own Claude Code subscription (through the OAuth token). And, phew, it still works and hasn't charged anything to extra usage, at least for now.

Coming up next

Since January of this year, Anthropic has been making sudden changes to its subscription model and terms of use, and has even been accused of gradually “degrading” the Opus model in order to (maybe) launch a new one.

The fact is that, in this cold war of LLMs and agents, we can't be dogmatic or loyal to any of these providers.

At least not until one of them has a real differentiator and a profitable, competitive business model.

Migrating from Claude Sub-agents to duckflux

Gustavo Gondim — Fri, 03 Apr 2026 15:15:10 +0000

Claude Code's sub-agent system is powerful. You define specialized agents with focused prompts, restricted tools, and independent contexts. Claude decides when to delegate, spawns sub-agents in foreground or background, and synthesizes results. It works.

But there's a design choice buried in the architecture that matters more than any individual feature: who decides what happens next? In Claude Sub-agents, the answer is the LLM. The parent agent reads your request, evaluates sub-agent descriptions, and decides which one to spawn. The routing logic lives in inference, not in config.

This article explores why that matters, when it becomes a problem, and how duckflux offers an alternative where the orchestration is deterministic while the work inside each step stays as creative as the LLM needs to be.

How Claude Sub-agents work

Claude Sub-agents are markdown files with YAML frontmatter that define specialized AI assistants. Each sub-agent has its own system prompt, tool restrictions, model choice, and permission mode.

---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a senior code reviewer. When invoked, analyze the code
and provide specific, actionable feedback on quality, security,
and best practices.

At runtime, Claude reads the description field of each available sub-agent and decides whether to delegate. You can nudge this with natural language ("use the code-reviewer agent") or force it with @-mentions, but the routing is fundamentally an LLM decision.

Sub-agents run in their own context window. They can't spawn other sub-agents. Results return to the parent, which synthesizes them. For parallel work, you can run multiple sub-agents in the background, or use agent teams for cross-session coordination.

Key capabilities:

Isolation. Each sub-agent has its own context, tools, and permissions.
Model routing. Haiku for cheap exploration, Opus for complex reasoning, Sonnet as default.
Worktree isolation. isolation: worktree gives a sub-agent a temporary git worktree.
Persistent memory. Sub-agents can accumulate learnings across sessions.
Hooks. PreToolUse, PostToolUse, Stop hooks for lifecycle control.
Background execution. Sub-agents run concurrently while you keep working.

The non-determinism problem

Here's the thing: Claude Sub-agents are orchestrated by inference. The LLM decides:

Whether to delegate at all.
Which sub-agent to spawn.
What prompt to write for the sub-agent.
When to synthesize results vs. spawn more agents.
Whether to chain sub-agents or return to you.

Each of these decisions is a probabilistic inference. On a good day, Claude makes the right calls. On a bad day, it forgets to delegate, picks the wrong sub-agent, writes a vague task prompt, or synthesizes prematurely.

This is fine for interactive, exploratory work. You're in the loop, you can redirect, you can say "no, use the reviewer agent." But the moment you want a repeatable pipeline (plan, code, test, review, deploy), you're asking the LLM to be a reliable router. And LLMs are unreliable routers. They forget steps, miscount iterations, and silently skip transitions.

The sub-agent docs themselves acknowledge this: sub-agents cannot spawn other sub-agents, so chaining requires the parent to orchestrate. But the parent's orchestration logic is just... its next-token prediction.

Compare this to how we treat human workflows. Nobody says "here are five specialists, figure out the order." We define processes, assign roles to steps, and execute deterministically. The specialists bring creativity; the process brings structure.

What is duckflux?

duckflux is a declarative, YAML-based workflow DSL. The execution order is defined in config, not inferred by an LLM. The runtime handles sequencing, loops, parallelism, retries, events, and tracing.

flow:
  - type: exec
    run: npm test

The key difference: duckflux separates orchestration from execution. The workflow file defines what happens in what order. Each step can invoke an LLM, run a shell command, call an HTTP API, or trigger a sub-workflow. The LLM does creative work inside each step. The workflow DSL handles the plumbing between steps.

The determinism spectrum

It's not binary. Different parts of a pipeline need different levels of determinism.

Concern	Needs determinism?	Why
Step ordering	Yes	Plan before code, test before deploy. Not negotiable.
Retry logic	Yes	"Retry 3 times with backoff" is a policy, not a creative decision.
Quality gates	Yes	Tests pass or they don't. Exit codes, not vibes.
Error handling	Yes	"If deploy fails, notify Slack" is a business rule.
Code generation	No	The LLM should be creative here.
Code review	No	The LLM should reason freely about quality.
Planning	No	Breaking tasks into subtasks is inherently creative.

Claude Sub-agents put everything on the non-deterministic side. duckflux lets you draw the line where it makes sense: deterministic orchestration, non-deterministic execution.
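To make the claim that retry is a policy concrete: a retry loop with backoff is a few lines of ordinary code and involves no judgement at all. This is a generic sketch, not duckflux's implementation. The docs excerpted here don't say whether backoff is fixed or exponential, so the sketch doubles the delay on each attempt and treats max as the total number of attempts; both are assumptions. sleep is injectable so the logic can be tested without waiting.

```javascript
// Run `task` up to `max` attempts, waiting baseMs, 2*baseMs, 4*baseMs... between failures.
async function withRetry(task, { max = 3, baseMs = 2000, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 1; attempt <= max; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < max) await wait(baseMs * 2 ** (attempt - 1));
    }
  }
  throw lastError; // attempts exhausted: surface the last failure
}

// Example: a flaky step that succeeds on the third attempt.
const delays = [];
withRetry(
  async (attempt) => { if (attempt < 3) throw new Error("flaky"); return "ok"; },
  { max: 3, baseMs: 2000, sleep: async (ms) => { delays.push(ms); } },
).then((result) => console.log(result, delays)); // ok [ 2000, 4000 ]
```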

Concepts side by side

Claude Sub-agents	duckflux	Notes
Sub-agent (markdown file)	Participant	A unit of work. In duckflux, not limited to LLM invocations.
`description` (LLM-routed)	Flow position / `when` guard	Explicit placement replaces LLM routing decisions.
Parent decides delegation	`flow` array	Ordering is declared, not inferred.
`maxTurns`	`retry.max` / `loop.max`	Iteration caps per step, not per agent context.
`isolation: worktree`	`cwd` per participant	Working directory isolation per step.
Background sub-agents	`parallel:` construct	Concurrent execution declared in config.
Chained sub-agents	Sequential flow	No LLM needed to decide "run B after A."
Sub-agent hooks	`onError`, `when`, `emit`/`wait`	Lifecycle control in the DSL, not in hook scripts.
Persistent memory	`execution.context` / `set`	Workflow-scoped state. Cross-session memory is outside duckflux scope.
Model routing	N/A (bring your own agent CLI)	duckflux orchestrates commands; model choice is per-agent.

Migration patterns

Chained sub-agents

In Claude Sub-agents, chaining requires the parent to decide the sequence:

Use the code-reviewer subagent to find performance issues,
then use the optimizer subagent to fix them

The parent LLM interprets "then" and decides to spawn the optimizer after the reviewer. If it misunderstands, it might run them in parallel, skip the optimizer, or synthesize prematurely.

duckflux:

participants:
  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT

  optimize:
    type: exec
    run: cat PROMPT_OPTIMIZE.md | $AGENT

flow:
  - review
  - optimize

"Then" is a line break in the YAML. No inference needed.

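A toy runner shows what it means for ordering to live in data rather than in inference. This is an illustrative sketch, not duckflux's runtime. Participants are plain async functions standing in for exec steps, and each step receives the outputs of earlier steps by name.

```javascript
// Toy runner: execute named participants strictly in the order given by `flow`.
async function runSequential(flow, participants) {
  const outputs = {};
  for (const name of flow) {
    const step = participants[name];
    if (!step) throw new Error(`unknown participant: ${name}`);
    outputs[name] = await step(outputs); // a throw stops the flow here
  }
  return outputs;
}

// Stand-ins for `cat PROMPT_REVIEW.md | $AGENT` and `cat PROMPT_OPTIMIZE.md | $AGENT`.
const participants = {
  review: async () => "found 2 slow queries",
  optimize: async (prev) => `fixed: ${prev.review}`,
};

runSequential(["review", "optimize"], participants).then((out) =>
  console.log(out.optimize), // fixed: found 2 slow queries
);
```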
Parallel research

Claude Sub-agents can run research in parallel via background tasks:

Research the authentication, database, and API modules
in parallel using separate subagents

Again, the parent decides whether to actually parallelize, which sub-agents to use, and how to synthesize.

duckflux:

flow:
  - parallel:
      - as: auth-research
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT

      - as: db-research
        type: exec
        run: cat PROMPT_DB.md | $AGENT

      - as: api-research
        type: exec
        run: cat PROMPT_API.md | $AGENT

Parallelism is declared. All three run concurrently. The outputs are collected in an array for the next step. No LLM routing decision required.
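Declared parallelism maps naturally onto Promise.all, which also keeps outputs in declaration order regardless of which branch finishes first. This is again a toy sketch rather than duckflux's runtime. The branch names mirror the YAML above, and the delays are invented to show out-of-order completion.

```javascript
// Run all branches concurrently; results come back in declaration order.
async function runParallel(branches) {
  return Promise.all(branches.map(({ as, run }) => run().then((output) => ({ as, output }))));
}

const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

runParallel([
  { as: "auth-research", run: () => delay(30, "auth notes") },
  { as: "db-research", run: () => delay(10, "db notes") },
  { as: "api-research", run: () => delay(20, "api notes") },
]).then((results) => console.log(results.map((r) => r.as)));
// [ 'auth-research', 'db-research', 'api-research' ]
```

Promise.all rejects as soon as one branch fails. If the next step should see every branch's outcome, Promise.allSettled is the alternative.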

Review loop with quality gates

A common Claude Sub-agents pattern: code, then review, then fix if needed. The parent decides when to stop.

Use the coder subagent to implement the feature,
then use the reviewer subagent to check it.
If there are issues, have the coder fix them.
Repeat until the reviewer approves.

The parent LLM manages the iteration. It decides whether to loop, how many times, and when to stop. If it loses track, the loop might run forever (capped by maxTurns) or stop too early.

duckflux:

participants:
  code:
    type: exec
    run: cat PROMPT_CODE.md | $AGENT
    onError: retry
    retry:
      max: 3
      backoff: 2s

  test:
    type: exec
    run: npm test

  lint:
    type: exec
    run: npm run lint

  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT

flow:
  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review

The loop condition, iteration cap, and quality gates are all in the config. The LLM does creative work inside code and review. The DSL handles the loop, the exit condition, and the gates. test and lint are real commands with real exit codes, not prompt instructions asking the agent to self-report.
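The loop semantics can be pinned down in the same toy style. This sketch assumes until is evaluated after each full pass over the steps and that reaching max without satisfying it counts as failure; duckflux's exact behavior on exhaustion isn't stated here. The until predicate is a JS function standing in for the review.output.approved == true expression.

```javascript
// Toy loop: run `steps` in order, then test `until`; give up after `max` passes.
async function runLoop({ steps, until, max }) {
  const outputs = {};
  for (let pass = 1; pass <= max; pass++) {
    for (const [name, step] of steps) {
      // In this sketch a gate that throws aborts the whole loop.
      outputs[name] = await step(outputs, pass);
    }
    if (until(outputs)) return { passes: pass, outputs };
  }
  throw new Error(`loop did not converge within ${max} passes`);
}

// Stand-ins: the reviewer approves on the second pass.
runLoop({
  steps: [
    ["code", async (_, pass) => `patch v${pass}`],
    ["test", async () => "ok"], // imagine `npm test` exiting 0
    ["review", async (out, pass) => ({ approved: pass >= 2, saw: out.code })],
  ],
  until: (out) => out.review.approved === true,
  max: 5,
}).then(({ passes }) => console.log(passes)); // 2
```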

Event-driven coordination

Claude Sub-agents have no event system. If sub-agent A needs to signal sub-agent B, the parent synthesizes A's output and writes B's prompt. The coordination happens in the parent's inference.

duckflux has native emit + wait for cases where steps genuinely need to signal each other:

flow:
  - parallel:
      - as: data-prep
        type: exec
        run: ./prepare-data.sh

      - as: wait-for-data
        type: exec
        run: |
          # This branch waits for data-prep to signal readiness

  - wait:
      event: "data.ready"
      timeout: 5m

  - as: process
    type: exec
    run: ./process.sh

Events work across parallel branches, across parent/child workflows, and with external event hubs (NATS, Redis). This is coordination infrastructure that the sub-agent model lacks entirely.
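At its core, emit plus wait is a publish/subscribe pair with a timeout. The sketch below is an in-process bus for illustration only. It is not duckflux's implementation and does not cover cross-workflow or NATS/Redis delivery. The event name data.ready matches the YAML above; the payload is invented.

```javascript
// Minimal in-process event bus: emit() plus a promise-based wait() with timeout.
function createBus() {
  const waiters = new Map(); // event name -> array of pending callbacks

  function emit(event, payload) {
    const list = waiters.get(event) ?? [];
    waiters.delete(event);
    for (const notify of list) notify(payload);
  }

  function wait(event, timeoutMs) {
    return new Promise((resolve, reject) => {
      const onEvent = (payload) => {
        clearTimeout(timer);
        resolve(payload);
      };
      const timer = setTimeout(() => {
        const remaining = (waiters.get(event) ?? []).filter((fn) => fn !== onEvent);
        waiters.set(event, remaining); // a late emit must not call this waiter
        reject(new Error(`timeout waiting for ${event}`));
      }, timeoutMs);
      waiters.set(event, [...(waiters.get(event) ?? []), onEvent]);
    });
  }

  return { emit, wait };
}

const bus = createBus();
bus.wait("data.ready", 5 * 60_000).then((p) => console.log("processing", p.rows));
setTimeout(() => bus.emit("data.ready", { rows: 42 }), 10); // stand-in for prepare-data.sh
```

This bus drops events that nobody is waiting for yet. In the YAML above, the wait step only starts after the parallel block, so a real runtime has to retain an earlier data.ready for that flow to work.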

When to keep sub-agents

Sub-agents are the right tool when:

You're working interactively. Typing in Claude Code, exploring a codebase, asking questions. The LLM-routed delegation is exactly right here because you're in the loop.
The workflow is genuinely emergent. You don't know the steps upfront. The agent needs to figure out what to do based on what it finds.
Context preservation matters. Each sub-agent's isolated context window prevents pollution of the main conversation. This is a real advantage for high-volume operations.
You need model routing. Sending cheap tasks to Haiku and expensive tasks to Opus within a single session is built into the sub-agent model.

When to switch to duckflux

Switch when:

The workflow is repeatable. If you've typed the same chaining instructions more than twice, it should be a config file.
You need guaranteed step ordering. Plan, code, test, review, deploy. Always in that order. No exceptions.
You need real quality gates. Not "please run the tests", but npm test as an actual step with an exit code.
You need audit trails. Structured JSON traces per step, visible in the web server UI.
You need cross-agent events. Steps signaling each other, waiting for external events, publishing to message queues.
You want provider independence. duckflux orchestrates $AGENT, not Claude specifically. Swap agents per step.

What you gain

Concern	Claude Sub-agents	duckflux
Routing	LLM decides (probabilistic)	Config declares (deterministic)
Step ordering	Parent LLM inference	`flow` array, top to bottom
Quality gates	Prompt instructions	Real commands with exit codes
Retry	`maxTurns` (global per agent)	`retry.max` with backoff (per step)
Parallel	Background sub-agents (LLM decides)	`parallel:` construct (declared)
Events	None	`emit` + `wait` (cross-branch, cross-workflow)
Tracing	Transcript files	Structured JSON + web server UI
Provider lock-in	Claude Code only	Any agent CLI, any runtime

What you lose

Interactive delegation. The natural "use the reviewer agent" UX in Claude Code. duckflux is a runner, not an interactive assistant.
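Context isolation. Sub-agents protect the parent's context window and hand their results back for synthesis. duckflux steps are independent commands, so they are isolated by default, but nothing carries a conversation context across steps; whatever a later step needs must be passed explicitly.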
Tool restriction enforcement. Sub-agents have framework-level tool control. In duckflux, that's the agent's responsibility.
Model routing within the workflow. Sub-agents can use different models per agent. In duckflux, each exec step invokes whatever CLI you point it at.
Persistent memory. Sub-agents accumulate learnings across sessions. duckflux has execution.context for within-workflow state, but cross-session memory is outside scope.

A hybrid approach

You don't have to choose one or the other. The most practical architecture uses both:

# ci-pipeline.flux.yaml
participants:
  plan:
    type: exec
    run: claude --agent planner --print "$(cat SPEC.md)"

  code:
    type: exec
    run: claude --agent coder --print "Implement the plan in PLAN.md"
    onError: retry
    retry:
      max: 3

  test:
    type: exec
    run: npm test

  lint:
    type: exec
    run: npm run lint

  review:
    type: exec
    run: claude --agent reviewer --print "Review the implementation"

flow:
  - plan

  - loop:
      until: review.output.approved == true
      max: 5
      steps:
        - code
        - test
        - lint
        - review

Each claude --agent step is a Claude Sub-agent invocation. The sub-agent gets its isolated context, restricted tools, and specialized prompt. But the orchestration (ordering, looping, gating, retrying) is declarative. The LLM does creative work. The YAML handles plumbing.

This is the core argument: decouple what the LLM is good at (reasoning, generation, analysis) from what config files are good at (sequencing, retrying, branching, gating). Don't ask the LLM to be a router when you already know the route.

Getting started

Install the runtime:

bun add -g @duckflux/runner

Identify your repeatable workflows. Which sub-agent chains do you run the same way every time?
Extract the ordering into a .flux.yaml. Each sub-agent becomes a participant. The chain becomes the flow.
Add real quality gates. Replace "please run the tests" with actual npm test steps.
Run it:

duckflux run my-pipeline.flux.yaml

Observe it with duckflux server --trace-dir ./traces to get a visual trace of every step.

Tip: Keep using Claude Sub-agents for interactive exploration and ad-hoc tasks. Use duckflux for the workflows you've already figured out and want to run reliably, repeatably, and without babysitting.

Final thoughts

Claude Sub-agents represent a real step forward in AI-assisted development. The isolated contexts, tool restrictions, and model routing are well-designed primitives.

But the orchestration layer, where the LLM decides what to delegate, when, and in what order, is the weak link. Not because Claude is bad at it, but because orchestration is fundamentally a deterministic problem being solved with a probabilistic tool.

duckflux doesn't replace the agents. It replaces the part of the system that shouldn't be guessing.

Check the duckflux docs for the full DSL reference, or jump straight to the spec.

Migrating from Ralph Orchestrator to duckflux

Gustavo Gondim — Fri, 03 Apr 2026 15:04:54 +0000

If you've been using Ralph Orchestrator to coordinate coding agents through hats and events, you already understand that complex AI tasks need structure beyond a bash loop. You've invested in event topologies, backpressure gates, and persona-based coordination.

This guide shows how to express those same patterns in duckflux, where you get explicit flow control, native events (emit/wait), and the choice between deterministic sequencing and event-driven decoupling depending on what the problem actually needs.

What is Ralph Orchestrator?

Ralph Orchestrator is a hat-based orchestration framework for AI coding agents. It builds on the Ralph Wiggum iteration technique but adds a coordination layer: specialized personas called hats communicate through typed events to break complex tasks into phases.

The core model is an event loop. You define hats with triggers (events they react to) and publishes (events they can emit). Ralph routes events between hats based on pattern matching. The AI agent inside each hat decides which event to emit based on its reasoning, making the orchestration partially emergent from the agent's behavior.

# ralph.yml
event_loop:
  starting_event: "task.start"
  completion_promise: "LOOP_COMPLETE"
  max_iterations: 50

cli:
  backend: claude

hats:
  planner:
    triggers: ["task.start"]
    publishes: ["tasks.ready"]
    instructions: |
      Break the task into subtasks.

  builder:
    triggers: ["tasks.ready", "review.rejected"]
    publishes: ["review.ready"]
    instructions: |
      Implement the planned tasks.
      Run tests before emitting review.ready.

  critic:
    triggers: ["review.ready"]
    publishes: ["review.passed", "review.rejected"]
    instructions: |
      Review the implementation. Reject if tests fail.

  finalizer:
    triggers: ["review.passed"]
    publishes: ["LOOP_COMPLETE"]
    instructions: |
      Verify everything passes and emit LOOP_COMPLETE.

In this example, the critic hat can publish either review.passed or review.rejected. Ralph doesn't decide which one fires. The agent does, based on what it sees in the code. If it emits review.rejected, the builder hat re-activates (because it triggers on review.rejected). If it emits review.passed, the finalizer activates. The workflow topology is static, but the execution path through it is dynamic.
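A toy re-creation of the routing loop makes the difference between a static topology and a dynamic path concrete. It is not Ralph's code. Hats are plain functions that return the event they choose to publish, routing is an exact-match lookup on triggers (no glob patterns), and the loop stops at the completion promise or at max_iterations.

```javascript
// Toy hat router: each hat reacts to its triggers and returns the event it publishes.
function runHats({ startingEvent, completionPromise, maxIterations, hats }) {
  const byTrigger = new Map();
  for (const [name, hat] of Object.entries(hats)) {
    for (const t of hat.triggers) byTrigger.set(t, name);
  }

  const path = [];
  let event = startingEvent;
  for (let i = 0; i < maxIterations; i++) {
    const name = byTrigger.get(event);
    if (!name) return { done: false, path, stuck: event }; // nobody handles this event
    path.push(name);
    event = hats[name].act(); // the "agent" decides which event to emit
    if (event === completionPromise) return { done: true, path };
  }
  return { done: false, path }; // iteration cap reached
}

let reviews = 0;
const result = runHats({
  startingEvent: "task.start",
  completionPromise: "LOOP_COMPLETE",
  maxIterations: 50,
  hats: {
    planner: { triggers: ["task.start"], act: () => "tasks.ready" },
    builder: { triggers: ["tasks.ready", "review.rejected"], act: () => "review.ready" },
    // The critic's choice is what makes the path dynamic: reject once, then pass.
    critic: { triggers: ["review.ready"], act: () => (++reviews < 2 ? "review.rejected" : "review.passed") },
    finalizer: { triggers: ["review.passed"], act: () => "LOOP_COMPLETE" },
  },
});
console.log(result.path.join(" -> "));
// planner -> builder -> critic -> builder -> critic -> finalizer
```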

Where the event model creates friction

Ralph's event-driven approach is genuinely powerful for multi-agent coordination. But it comes with tradeoffs that get harder to manage as workflows grow:

Implicit execution order. You can't read the ralph.yml top to bottom and know what happens. The actual execution path depends on which events each hat's agent chooses to emit at runtime. To understand the workflow, you have to mentally trace the event graph across all hat definitions.
Agent-decided routing. When a hat can publish review.passed OR review.rejected, the orchestration logic lives partly inside the agent's prompt, not in the config file. If the agent misunderstands its instructions and emits the wrong event, the entire flow goes off-track in ways that are hard to debug from the topology alone.
Events as implicit state. Hats share context through event payloads and the file system ("Disk Is State"). But event payloads are intentionally kept small (routing signals, not data transport), so the real state transfer happens via files that aren't declared anywhere in the config. The workflow's data flow is invisible.
Topology validation gaps. Ralph validates that each trigger maps to exactly one hat (no ambiguity), but it can't validate that the agent will actually emit sensible events. A hat with publishes: ["build.done", "build.blocked"] might always emit build.blocked if the prompt is unclear, creating an infinite rejection loop that only max_activations can break.
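The static half of the problem fits in a few lines. The hypothetical validator below flags triggers claimed by more than one hat, plus published events that no hat consumes (other than the completion promise). The hats map is an invented example. Note that the builder re-triggering itself on build.blocked passes both checks: the rejection loop described above is invisible to any topology check, and only a runtime cap like max_activations stops it.

```javascript
// Static topology checks over a Ralph-style hats map.
function validateTopology(hats, completionPromise) {
  const errors = [];
  const owners = new Map(); // trigger -> names of hats that react to it
  for (const [name, hat] of Object.entries(hats)) {
    for (const t of hat.triggers) owners.set(t, [...(owners.get(t) ?? []), name]);
  }
  for (const [trigger, names] of owners) {
    if (names.length > 1) errors.push(`trigger ${trigger} is claimed by ${names.join(", ")}`);
  }
  for (const [name, hat] of Object.entries(hats)) {
    for (const e of hat.publishes) {
      if (e !== completionPromise && !owners.has(e)) {
        errors.push(`${name} publishes ${e}, which no hat handles`);
      }
    }
  }
  return errors;
}

const hats = {
  builder: { triggers: ["tasks.ready", "build.blocked"], publishes: ["build.done", "build.blocked"] },
  critic: { triggers: ["build.done"], publishes: ["review.passed"] },
};
console.log(validateTopology(hats, "LOOP_COMPLETE"));
// [ 'critic publishes review.passed, which no hat handles' ]
```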

These aren't bugs in Ralph. They're inherent to the model: event-driven orchestration with LLM-decided routing trades predictability for flexibility. The question is whether your workflow actually needs that flexibility, or whether explicit flow control would make it easier to reason about, debug, and maintain.

What is duckflux?

duckflux is a declarative, YAML-based workflow DSL. You describe what should happen and in what order. The runtime handles execution, retries, parallelism, error handling, and tracing.

flow:
  - type: exec
    run: npm test

Crucially, duckflux also has a native event system (emit + wait) for cases where decoupled communication genuinely helps. The difference from Ralph is that events are opt-in per step, not the entire coordination model. You use explicit flow for the deterministic parts and events for the parts that need asynchronous signaling.

Two models of coordination

Before diving into migration patterns, it's worth understanding the fundamental difference.

Ralph Orchestrator: event-driven, emergent flow. You define a topology of hats and events. The execution path emerges at runtime from which events agents choose to emit. The config declares relationships, not order.

duckflux: explicit flow with optional events. You write a top-to-bottom sequence. Control flow constructs (loop, parallel, if, when) handle branching and iteration. Events (emit/wait) handle cross-branch and cross-workflow signaling. The config declares order, with events for decoupled coordination.

Neither model is universally better. But for most iterative coding workflows, the execution path is predictable enough that explicit flow gives you better debuggability without losing expressiveness.

Concepts side by side

Ralph Orchestrator	duckflux	Notes
Hat	Participant	A named unit of work. In duckflux, not tied to agent personas.
Event (routing)	Flow order / `when` guard / `if`	Explicit sequencing replaces event routing for deterministic paths.
Event (signaling)	`emit` + `wait`	For async communication across branches or workflows.
`starting_event`	First step in `flow`	The flow starts at the top.
`completion_promise`	Workflow ends when `flow` completes	No magic string needed.
`max_iterations`	`loop.max` / `retry.max`	Scoped per-loop or per-step, not global.
`triggers` + `publishes`	`loop` + `until` condition	Feedback loops are explicit, not event-inferred.
Backpressure (guardrails)	Real steps with exit codes	`npm test` as a flow step vs. prompt instruction.
Memories	`execution.context` / `set`	Workflow-scoped state. Cross-session memory is outside duckflux scope.
Glob patterns (`build.*`)	`wait` with `match` expression	CEL expressions instead of glob matching.
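The last row of the table is easiest to see side by side. In this hypothetical sketch, globMatches approximates a Ralph-style pattern, assuming * matches any run of characters, dots included. A plain TypeScript predicate stands in for a CEL match expression. Neither is either tool's real matcher. The glob can only look at the event name, while the predicate can also inspect the payload, for example a request ID.

```typescript
// Hypothetical comparison: a glob sees only the event name, while a match
// predicate (standing in for a CEL expression) can also inspect the payload.
type HubEvent = { name: string; payload: Record<string, unknown> };

// Assumption: "*" matches any run of characters, including dots.
function globMatches(pattern: string, eventName: string): boolean {
  const source = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${source}$`).test(eventName);
}

const sampleEvents: HubEvent[] = [
  { name: "build.done", payload: { requestId: "r-1" } },
  { name: "build.blocked", payload: { requestId: "r-2" } },
  { name: "review.ready", payload: { requestId: "r-2" } },
];

// Stand-in for a CEL match such as: event.requestId == "r-2" (hypothetical field).
const matchRequest = (e: HubEvent): boolean => e.payload["requestId"] === "r-2";

console.log(sampleEvents.filter((e) => globMatches("build.*", e.name)).map((e) => e.name));
// [ 'build.done', 'build.blocked' ]
console.log(sampleEvents.filter(matchRequest).map((e) => e.name));
// [ 'build.blocked', 'review.ready' ]
```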

Migrating the code-assist pipeline

The built-in code-assist preset defines four hats with a feedback loop: planner, builder, critic, finalizer. The critic can reject, sending the builder back to retry.

Ralph Orchestrator

event_loop:
  starting_event: "build.start"
  completion_promise: "LOOP_COMPLETE"
  max_iterations: 30

hats:
  planner:
    triggers: ["build.start", "task.complete"]
    publishes: ["tasks.ready"]
    instructions: |
      Read the spec. Break work into small tasks.

  builder:
    triggers: ["tasks.ready", "review.rejected"]
    publishes: ["review.ready"]
    instructions: |
      Implement tasks. Before emitting review.ready:
      - tests: pass
      - lint: pass

  critic:
    triggers: ["review.ready"]
    publishes: ["review.passed", "review.rejected"]
    instructions: |
      Review implementation. Reject if quality gates fail.

  finalizer:
    triggers: ["review.passed"]
    publishes: ["task.complete", "LOOP_COMPLETE"]
    instructions: |
      Run final validation. Emit LOOP_COMPLETE if all checks pass.

The execution path here depends on the critic. If it emits review.rejected, the builder re-activates. If it emits review.passed, the finalizer runs. This feedback loop is implicit in the event topology.

duckflux

# code-assist.flux.yaml
participants:
  plan:
    type: exec
    run: cat PROMPT_PLAN.md | $AGENT

  build:
    type: exec
    run: cat PROMPT_BUILD.md | $AGENT
    onError: retry
    retry:
      max: 5
      backoff: 2s

  test:
    type: exec
    run: npm test

  lint:
    type: exec
    run: npm run lint

  review:
    type: exec
    run: cat PROMPT_REVIEW.md | $AGENT
    output:
      approved:
        type: boolean
        required: true

flow:
  - plan

  - loop:
      until: review.output.approved == true
      max: 10
      steps:
        - build
        - test
        - lint
        - review

The feedback loop is explicit: loop repeats until the review approves or the cap is hit. Quality gates (test, lint) are real commands that fail the flow if they exit non-zero. The builder doesn't need to self-report "tests: pass" in an event payload; the runtime knows because it ran npm test and saw the exit code.

In Ralph, the critic decides whether to reject. In duckflux, the loop + until condition decides. You can still use an agent-based review step, but the loop control lives in the DSL, not in the agent's reasoning. This means a confused agent can't accidentally skip the rejection loop by emitting the wrong event.
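Here is a rough TypeScript model of what the loop above asks the runtime to do, under stated assumptions. Steps are stubs returning an exit code plus a structured output. A non-zero exit aborts the flow. until is evaluated after every step in the iteration has run. runLoop and the stubs are hypothetical, not the @duckflux/core API.

```typescript
// Hypothetical model of `loop` + `until` with gates that fail the flow.
// Not duckflux's runtime: steps are stubs that return an exit code and an output.
type StepResult = { exitCode: number; output: Record<string, unknown> };
type Step = { name: string; run: (iteration: number) => StepResult };
type Outputs = Map<string, Record<string, unknown>>;

function runLoop(steps: Step[], until: (outputs: Outputs) => boolean, max: number) {
  const outputs: Outputs = new Map();
  for (let i = 1; i <= max; i++) {
    for (const step of steps) {
      const result = step.run(i);
      // A non-zero exit code (e.g. from `npm test`) aborts the whole flow.
      if (result.exitCode !== 0) {
        throw new Error(`${step.name} failed with exit code ${result.exitCode}`);
      }
      outputs.set(step.name, result.output);
    }
    // `until` is evaluated after every step of the iteration has run.
    if (until(outputs)) return { iterations: i, approved: true };
  }
  return { iterations: max, approved: false };
}

const ok = (output: Record<string, unknown> = {}): StepResult => ({ exitCode: 0, output });
const steps: Step[] = [
  { name: "build", run: () => ok() },
  { name: "test", run: () => ok() },
  { name: "lint", run: () => ok() },
  { name: "review", run: (i) => ok({ approved: i >= 3 }) }, // stub reviewer approves on pass 3
];
const outcome = runLoop(steps, (o) => o.get("review")?.["approved"] === true, 10);
console.log(outcome); // { iterations: 3, approved: true }
```

The stubbed reviewer only supplies data; runLoop, not the agent, decides whether another iteration happens.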

Migrating coordination patterns

Pipeline (linear handoff via events)

In Ralph, even a simple pipeline uses events to chain hats:

Ralph Orchestrator:

hats:
  test_writer:
    triggers: ["tdd.start"]
    publishes: ["test.written"]
  implementer:
    triggers: ["test.written"]
    publishes: ["test.passing"]
  refactorer:
    triggers: ["test.passing"]
    publishes: ["refactor.done"]

Three hats, three event hops. The ordering is deterministic (each hat publishes exactly one event), but you have to trace the graph to see it.
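"Tracing the graph" can itself be written down. This hypothetical helper starts from the starting event, finds the hat that triggers on it, and follows that hat's single published event. It refuses hats that may publish more than one event, because then the order depends on what the agent decides at runtime.

```typescript
// Hypothetical helper that recovers the order implied by a trigger/publish graph.
type HatSpec = { name: string; triggers: string[]; publishes: string[] };

function traceLinearOrder(hats: HatSpec[], startingEvent: string): string[] {
  const order: string[] = [];
  let event: string | undefined = startingEvent;
  while (event !== undefined) {
    const current: string = event;
    const hat = hats.find((h) => h.triggers.includes(current));
    if (hat === undefined) break; // no hat listens: the chain ends here
    if (order.includes(hat.name)) throw new Error(`cycle back to ${hat.name}`);
    if (hat.publishes.length !== 1) {
      throw new Error(`${hat.name} may publish several events; order depends on the agent`);
    }
    order.push(hat.name);
    event = hat.publishes[0];
  }
  return order;
}

const tdd: HatSpec[] = [
  { name: "test_writer", triggers: ["tdd.start"], publishes: ["test.written"] },
  { name: "implementer", triggers: ["test.written"], publishes: ["test.passing"] },
  { name: "refactorer", triggers: ["test.passing"], publishes: ["refactor.done"] },
];
console.log(traceLinearOrder(tdd, "tdd.start")); // [ 'test_writer', 'implementer', 'refactorer' ]
```

The helper's output is exactly the list you would write as duckflux flow steps.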

duckflux:

flow:
  - as: write-tests
    type: exec
    run: cat PROMPT_TESTS.md | $AGENT

  - as: implement
    type: exec
    run: cat PROMPT_IMPL.md | $AGENT

  - as: refactor
    type: exec
    run: cat PROMPT_REFACTOR.md | $AGENT

When the execution path is deterministic, sequential steps are simpler than events. You read the order directly.

Adversarial review (cyclic event routing)

This is where Ralph's event model shines: the red team either approves or finds vulnerabilities, and the fixer loops back.

Ralph Orchestrator:

hats:
  builder:
    triggers: ["security.review", "fix.applied"]
    publishes: ["build.ready"]
  red_team:
    triggers: ["build.ready"]
    publishes: ["vulnerability.found", "security.approved"]
  fixer:
    triggers: ["vulnerability.found"]
    publishes: ["fix.applied"]

The cycle: builder -> red_team -> (vulnerability.found -> fixer -> fix.applied -> builder -> red_team) until security.approved.

duckflux:

participants:
  build:
    type: exec
    run: cat PROMPT_BUILD.md | $AGENT

  security-scan:
    type: exec
    run: cat PROMPT_SECURITY.md | $AGENT
    output:
      approved:
        type: boolean
        required: true

  fix:
    type: exec
    run: cat PROMPT_FIX.md | $AGENT

flow:
  - build

  - loop:
      until: security-scan.output.approved == true
      max: 5
      steps:
        - security-scan
        - fix:
            when: security-scan.output.approved == false

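The when guard on fix replaces the conditional event routing, and loop + until replaces the implicit cycle. The flow reads linearly, and the exit condition is declared, not inferred from event topology. One simplification: in Ralph, fix.applied re-triggers the builder before the next red-team pass, whereas here the next security-scan runs directly on the fixer's changes. If you need a rebuild between fixes, move build inside the loop.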
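A small TypeScript model of the adversarial loop, assuming that until is checked after both steps of an iteration have been considered, so the guarded fix step is recorded as skipped on the approving pass. adversarialLoop and the stubbed scanner are hypothetical.

```typescript
// Hypothetical model of the adversarial loop: `fix` runs only when the scan
// did not approve, and `until` is checked after both steps (an assumption).
type ScanOutput = { approved: boolean };

function adversarialLoop(scan: (round: number) => ScanOutput, max: number): string[] {
  const trace: string[] = [];
  for (let round = 1; round <= max; round++) {
    const result = scan(round);
    trace.push(`security-scan#${round}:${result.approved ? "approved" : "findings"}`);
    // when: security-scan.output.approved == false
    trace.push(result.approved ? `fix#${round}:skipped` : `fix#${round}`);
    // until: security-scan.output.approved == true
    if (result.approved) break;
  }
  return trace;
}

// Stub scanner that approves on the second round: scan, fix, scan (approved), fix skipped.
console.log(adversarialLoop((round) => ({ approved: round === 2 }), 5));
```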

Coordinator-specialist with event signaling

Ralph's coordinator-specialist pattern fans out work to multiple specialists. This is one case where duckflux events (emit/wait) are the right tool, because the branches genuinely need to signal each other.

Ralph Orchestrator:

hats:
  analyzer:
    triggers: ["gap.start", "verify.complete", "report.complete"]
    publishes: ["analyze.spec", "verify.request", "report.request"]
  verifier:
    triggers: ["analyze.spec", "verify.request"]
    publishes: ["verify.complete"]
  reporter:
    triggers: ["report.request"]
    publishes: ["report.complete"]

duckflux:

participants:
  analyze:
    type: exec
    run: cat PROMPT_ANALYZE.md | $AGENT

  verify:
    type: exec
    run: cat PROMPT_VERIFY.md | $AGENT

  report:
    type: exec
    run: cat PROMPT_REPORT.md | $AGENT

  signal-verified:
    type: emit
    event: "verify.complete"
    payload: verify.output

  signal-reported:
    type: emit
    event: "report.complete"
    payload: report.output

flow:
  - analyze

  - parallel:
      - verify
      - report

  - as: notify
    type: emit
    event: "analysis.done"
    payload:
      verified: verify.output
      reported: report.output

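Here, parallel: runs verify and report concurrently (replacing the fan-out). The emit at the end publishes a completion event that other workflows or external systems can consume. The signal-verified and signal-reported participants are declared but not referenced in this flow; they show how each branch could announce its own completion. If the branches needed to coordinate mid-execution, you could add wait steps inside each branch, with per-branch emits like these as the other half of the handshake.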
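As a hedged sketch of the same shape in TypeScript: Promise.all plays the role of parallel:, and InMemoryHub is a hypothetical stand-in for an event hub (the real runtime's hub interface may differ). The single analysis.done event carries both branch outputs, mirroring the payload mapping in the YAML.

```typescript
// Hypothetical model of analyze -> parallel(verify, report) -> emit.
// InMemoryHub is a stand-in for an event hub, not duckflux's API.
type Emitted = { event: string; payload: unknown };

class InMemoryHub {
  readonly log: Emitted[] = [];
  emit(event: string, payload: unknown): void {
    this.log.push({ event, payload });
  }
}

async function gapAnalysis(
  hub: InMemoryHub,
  analyze: () => Promise<string>,
  verify: () => Promise<string>,
  report: () => Promise<string>,
): Promise<void> {
  await analyze();
  // parallel: both branches start together and the step ends when both finish.
  const [verified, reported] = await Promise.all([verify(), report()]);
  hub.emit("analysis.done", { verified, reported });
}

const hub = new InMemoryHub();
gapAnalysis(hub, async () => "gaps found", async () => "spec verified", async () => "report written")
  .then(() => console.log(hub.log));
```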

Cyclic rotation with I/O chaining

Ralph's mob-programming pattern (navigator, driver, observer) rotates through roles with events carrying feedback between them. This pattern benefits from the I/O chain in duckflux, where each step's output automatically feeds the next.

Ralph Orchestrator:

hats:
  navigator:
    triggers: ["mob.start", "observation.noted"]
    publishes: ["direction.set"]
  driver:
    triggers: ["direction.set"]
    publishes: ["code.written"]
  observer:
    triggers: ["code.written"]
    publishes: ["observation.noted", "mob.complete"]

duckflux:

participants:
  navigate:
    type: exec
    run: cat PROMPT_NAVIGATE.md | $AGENT

  drive:
    type: exec
    run: cat PROMPT_DRIVE.md | $AGENT

  observe:
    type: exec
    run: cat PROMPT_OBSERVE.md | $AGENT
    output:
      complete:
        type: boolean
        required: true

flow:
  - loop:
      until: observe.output.complete == true
      max: 10
      steps:
        - navigate
        - drive
        - observe

The I/O chain passes each step's output as input to the next. The observer's output feeds back to the navigator on the next iteration via the chain. No events needed here because the data flows linearly within the loop.
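A minimal model of that rotation, assuming, as the paragraph above describes, that the chain carries the observer's output into the navigator on the next iteration. The stubbed roles are hypothetical. The until check is simplified to a string comparison instead of observe.output.complete == true.

```typescript
// Hypothetical model of the I/O chain inside a loop. Assumption: the last
// step's output (observe) becomes the first step's input (navigate) next pass.
type Stage = (input: string) => string;

function chainLoop(stages: Stage[], until: (out: string) => boolean, max: number) {
  let value = "";
  for (let i = 1; i <= max; i++) {
    for (const stage of stages) value = stage(value);
    if (until(value)) return { output: value, iterations: i };
  }
  return { output: value, iterations: max };
}

// Stubbed roles; the observer declares completion once the third revision exists.
let revision = 0;
const navigate: Stage = (feedback) => `direction(${feedback || "start"})`;
const drive: Stage = (direction) => {
  revision += 1;
  return `code v${revision} for ${direction}`;
};
const observe: Stage = (code) =>
  code.startsWith("code v3") ? "complete" : `note on ${code.split(" for ")[0]}`;

const mob = chainLoop([navigate, drive, observe], (out) => out === "complete", 10);
console.log(mob); // { output: 'complete', iterations: 3 }
```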

Backpressure: prompt-injected vs. real gates

This is the biggest philosophical difference between the two systems.

Ralph Orchestrator enforces quality through prompt instructions and guardrails. The agent is told to run tests, and the hat instructions say "Before emitting build.done, you MUST have: tests: pass, lint: pass." But the agent could emit build.done anyway. The backpressure is advisory.

core:
  guardrails:
    - "Tests must pass before declaring done"
    - "Never skip linting"

hats:
  builder:
    instructions: |
      Before emitting build.done:
      - tests: pass
      - lint: pass
      - typecheck: pass

duckflux enforces quality through actual steps. If npm test returns a non-zero exit code, the flow fails. The agent can't bypass the gate because the gate isn't a prompt instruction. It's a real command.

flow:
  - as: build
    type: exec
    run: cat PROMPT_BUILD.md | $AGENT

  - as: test
    type: exec
    run: npm test

  - as: lint
    type: exec
    run: npm run lint

  - as: typecheck
    type: exec
    run: npx tsc --noEmit

You can combine both approaches: let the agent run tests during its iteration (for fast feedback), and then verify with a dedicated step in the flow (for guaranteed enforcement).
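The difference between the two gates fits in a few lines. This hypothetical TypeScript sketch stubs the agent's self-report and the test command's exit code. The advisory gate believes the report, while the enforced gate only believes the exit code.

```typescript
// Hypothetical contrast between an advisory gate and an enforced gate.
type AgentTurn = { claimsTestsPass: boolean };

// Advisory: the flow takes the agent's word for it.
const advisoryGate = (turn: AgentTurn): boolean => turn.claimsTestsPass;

// Enforced: the flow runs the check itself and trusts only the exit code.
const enforcedGate = (runTests: () => number): boolean => runTests() === 0;

const overconfident: AgentTurn = { claimsTestsPass: true };
const npmTestStub = (): number => 1; // stand-in for `npm test` exiting non-zero

console.log(advisoryGate(overconfident)); // true
console.log(enforcedGate(npmTestStub)); // false
```

Combining both approaches keeps the agent's own test runs for fast feedback inside its turn, while only the enforced check decides whether the flow moves on.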

When to use events vs. explicit flow

Not every Ralph event pattern needs to become a duckflux event. Here's a rule of thumb:

Pattern in Ralph	duckflux equivalent	Why
Linear pipeline (`A -> B -> C`)	Sequential flow	Deterministic order doesn't need events.
Feedback loop (`critic -> builder -> critic`)	`loop` + `until`	Exit condition is declared, not event-inferred.
Conditional routing (`passed` vs `rejected`)	`if` / `when`	Flow constructs handle branching explicitly.
Cross-branch signaling	`emit` + `wait`	Parallel branches that need to coordinate.
Cross-workflow communication	`emit` + `wait` (shared event hub)	Parent/child workflows exchanging signals.
External system notifications	`emit` (with event hub provider)	Fire-and-forget or acknowledged delivery to Kafka, NATS, etc.

What you gain

Concern	Ralph Orchestrator	duckflux
Flow readability	Trace event graph mentally	Read YAML top to bottom
Routing control	Agent decides which event to emit	Flow constructs (`if`, `when`, `loop`)
Quality gates	Prompt instructions (advisory)	Real steps with exit codes (enforced)
Retry	Global `max_iterations`	Per-step `retry.max` with backoff
Parallel	Git worktrees + `features.parallel`	`parallel:` construct, single trace
Event system	Core coordination model	Opt-in for cross-branch/cross-workflow
Agent coupling	Every hat invokes an agent	Mix agents, shell, HTTP, sub-workflows
State passing	Event payloads + filesystem	I/O chain + `execution.context`
Observation	TUI + web dashboard	Web server UI with trace viewer + real-time SSE

What you lose

To be fair about the tradeoffs:

Memory system. Ralph's memories persist learnings across sessions. duckflux doesn't manage agent memory; you'd handle that in your prompts or agent configuration.
TUI. Ralph provides a real-time terminal UI for monitoring loops. duckflux has a web server UI with a trace viewer, execution history, and real-time SSE updates, but no embedded TUI.
Preset library. Ralph ships 31 presets for common patterns. duckflux workflows are written from scratch (or copied from docs like this one).
Agent-aware prompting. Ralph injects hat instructions, guardrails, and memory context into each agent invocation. duckflux orchestrates commands; what goes into the prompt is up to you.

Getting started

Install the runtime:

bun add -g @duckflux/runner

Map your ralph.yml hats to duckflux participants. Each hat becomes a participant with type: exec (or http, workflow, etc.).
Replace event topology with flow constructs. Linear chains become sequential steps. Feedback loops become loop + until. Conditional routing becomes if or when.
Keep events where they add value. Cross-branch signaling, external notifications, and workflow-to-workflow communication use emit + wait.
Move backpressure from prompts to steps. Add npm test, npm run lint, etc. as explicit participants in the flow.
Run it:

duckflux run my-workflow.flux.yaml

Tip: Start by migrating a pipeline preset (linear, no feedback loops). Once the basic flow works, add loop + until for the feedback patterns. Save event-based patterns (emit/wait) for last.

Final thoughts

Ralph Orchestrator's event-driven model is a real innovation for multi-agent coordination. The hat system, the typed events, the backpressure philosophy (Tenet #2: "create gates that reject bad work") are solid ideas.

duckflux takes a different bet: most of the time, you know the execution order. When you do, explicit flow is easier to read, debug, and maintain than event topology. And when you genuinely need decoupled coordination, emit + wait are there.

The question isn't "which tool is more powerful." It's "how much of your workflow is actually non-deterministic?" If the answer is "not much," explicit flow wins. If you're orchestrating five agents that genuinely need to signal each other asynchronously, Ralph's model has merit. duckflux gives you the choice.

Check the duckflux docs for the full DSL reference, or jump straight to the spec.

Migrating from Ralph Loops to duckflux

Gustavo Gondim — Fri, 03 Apr 2026 00:06:09 +0000

If you've been running coding agent tasks inside Ralph Loops, you already understand the core insight: iteration beats perfection. You've seen what happens when you hand a well-written prompt to an AI agent and let it grind until the job is done.

This guide shows how to take that same philosophy and express it as a declarative, reproducible workflow in duckflux. You gain structure, observability, and composability without giving up the power of iterative automation.

What are Ralph Loops?

Ralph Wiggum is an iterative AI development methodology built on a deceptively simple idea: feed a prompt to a coding agent in a loop until the task is complete. Named after the Simpsons character (who stumbles forward until he accidentally succeeds), the technique treats failures as data points and bets on persistence.

Although it originated in the Claude Code ecosystem, the pattern is agent-agnostic. It works with any CLI-based coding agent (Codex, Gemini CLI, aider, etc.). The canonical form is a bash one-liner:

while :; do cat PROMPT.md | <your-agent-cli> ; done

Some agent plugins offer structured commands for it. For example, with the Claude Code Ralph plugin:

/ralph-loop:ralph-loop "Build the auth module" --max-iterations 15 --completion-promise "ALL_TESTS_PASS"

Ralph works. It has shipped hackathon projects overnight, completed $50k contracts for under $300 in API costs, and built entire programming languages. The methodology rests on four principles:

Iteration over perfection: refinement through repetition, not first-pass accuracy.
Failures are data: deterministic failures give you predictable, actionable feedback.
Operator skill matters: prompt quality determines outcomes, not just model capability.
Persistence wins: retry logic continues until the task is done.

Where Ralph starts to hurt

Ralph Loops excel at greenfield, single-agent tasks with clear completion criteria. But as your automation grows, the cracks show:

No visibility. A bash loop gives you no structured trace of what happened, which iteration failed, or why.
No composition. Chaining multiple Ralph Loops means writing more bash glue (conditionals, file watchers, error handling), all imperatively.
No reuse. Each loop is a bespoke script. There's no shared vocabulary for "retry 3 times", "run these in parallel", or "skip this step if X".
No portability. The loop is tied to your shell, your machine, your specific agent CLI setup.

These aren't flaws in Ralph. They're the natural ceiling of an imperative approach. Once you need orchestration, you need a DSL.

What is duckflux?

duckflux is a declarative, YAML-based workflow DSL. You describe what should happen and in what order. The runtime handles execution, retries, parallelism, error handling, and tracing.

flow:
  - type: exec
    run: npm test

No SDK. No boilerplate. No vendor lock-in. A workflow is a .flux.yaml file that any conforming runtime can execute.

Key features that matter for this migration:

Retry & error handling: built into the spec, not bolted on with bash.
Loops: native loop construct with until conditions and max caps, using CEL expressions.
Parallel execution: declare concurrent steps without & and wait.
I/O chaining: output from one step flows as input to the next, automatically.
Execution tracing: structured logs of every step, input, output, and error.

Side-by-side comparison

Let's look at a real pattern: running a code generation prompt iteratively until tests pass, with a maximum number of retries.

The Ralph way

# PROMPT.md contains the generation instructions
# $AGENT is your coding agent CLI (claude, codex, aider, etc.)
MAX=10
i=0
while [ $i -lt $MAX ]; do
  cat PROMPT.md | $AGENT
  if npm test 2>/dev/null; then
    echo "Tests pass. Done."
    exit 0
  fi
  i=$((i + 1))
  echo "Iteration $i/$MAX — tests failed, retrying..."
done
echo "Gave up after $MAX iterations."
exit 1

What's happening here:

Feed the prompt to the coding agent.
Run the test suite.
If tests pass, stop. Otherwise, loop.
Give up after 10 iterations.

This works, but the logic is scattered across bash control flow, there's no structured output, and extending it (add a lint step? run two agents in parallel?) means rewriting the script.

The duckflux way

# codegen-loop.flux.yaml
flow:
  - as: generate-and-test
    type: exec
    run: cat PROMPT.md | $AGENT && npm test
    onError: retry
    retry:
      max: 10

That's it. The same behavior (iterative execution with a retry ceiling) expressed in 7 lines of YAML.
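To make the retry semantics explicit, here is a hypothetical sketch of what onError: retry with retry.max could mean. Two assumptions are worth checking against the spec. First, it treats max as the number of retries after the first attempt; if max counts total attempts, the numbers shift by one compared with the bash loop's 10 iterations. Second, it treats backoff (like the 2s used earlier) as a fixed delay. sleep is injected so the sketch runs instantly.

```typescript
// Hypothetical retry wrapper. Assumptions: `max` counts retries after the first
// attempt, and `backoffMs` is a fixed delay between attempts.
function runWithRetry(
  attempt: (n: number) => boolean, // true means the step exited 0
  max: number,
  backoffMs: number,
  sleep: (ms: number) => void,
): { ok: boolean; attempts: number } {
  for (let n = 1; n <= max + 1; n++) {
    if (attempt(n)) return { ok: true, attempts: n };
    if (n <= max) sleep(backoffMs); // no delay after the final failure
  }
  return { ok: false, attempts: max + 1 };
}

const waits: number[] = [];
// Stand-in for `cat PROMPT.md | $AGENT && npm test`, succeeding on the 4th attempt.
const retried = runWithRetry((n) => n === 4, 10, 2000, (ms) => waits.push(ms));
console.log(retried, waits); // { ok: true, attempts: 4 } [ 2000, 2000, 2000 ]
```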

But duckflux lets you go further. Let's decompose the steps and add observability:

# codegen-loop-v2.flux.yaml
participants:
  generate:
    type: exec
    run: cat PROMPT.md | $AGENT

flow:
  - loop:
      until: run-tests.status == "success"
      max: 10
      steps:
        - generate
        - as: run-tests
          type: exec
          run: npm test
          onError: skip

Now each iteration is traced individually. You can see exactly which iteration failed, what the test output was, and how many cycles it took. The loop construct replaces the bash while loop; onError: skip replaces the silenced if npm test 2>/dev/null check, so a failing run is recorded in the trace instead of being discarded and the loop moves on; and until replaces the implicit exit condition.
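A hypothetical model of the v2 loop shows why onError: skip matters. A failing test run becomes a recorded failure status, with its output kept for the trace, instead of aborting the flow, and until reads that status. codegenLoop and its stubs are illustrative only.

```typescript
// Hypothetical model of codegen-loop-v2: `onError: skip` turns a failing test
// run into a recorded status instead of aborting the flow.
type Status = "success" | "failure";
type IterationTrace = { iteration: number; runTests: Status; testOutput: string };

function codegenLoop(
  generate: (iteration: number) => void,
  runTests: (iteration: number) => { exitCode: number; output: string },
  max: number,
): IterationTrace[] {
  const trace: IterationTrace[] = [];
  for (let i = 1; i <= max; i++) {
    generate(i);
    const tests = runTests(i);
    // onError: skip -> a non-zero exit is recorded and the loop keeps going.
    const status: Status = tests.exitCode === 0 ? "success" : "failure";
    trace.push({ iteration: i, runTests: status, testOutput: tests.output });
    // until: run-tests.status == "success"
    if (status === "success") break;
  }
  return trace;
}

const trace = codegenLoop(
  () => {}, // stand-in for `cat PROMPT.md | $AGENT`
  (i) => (i < 2 ? { exitCode: 1, output: "1 failing" } : { exitCode: 0, output: "all passing" }),
  10,
);
console.log(trace.map((t) => `${t.iteration}:${t.runTests}`)); // [ '1:failure', '2:success' ]
```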

Migration cookbook

Below are common Ralph patterns and their duckflux equivalents.

Simple loop until completion

Ralph:

while :; do cat PROMPT.md | $AGENT ; done

duckflux:

flow:
  - type: exec
    run: cat PROMPT.md | $AGENT
    onError: retry
    retry:
      max: 50

Phased loops (multi-step)

Ralph:

# Phase 1
/ralph-loop:ralph-loop "Build the API" --max-iterations 20 --completion-promise "API_DONE"
# Phase 2
/ralph-loop:ralph-loop "Build the UI" --max-iterations 20 --completion-promise "UI_DONE"

duckflux:

participants:
  build-api:
    type: exec
    run: cat PROMPT_API.md | $AGENT
    onError: retry
    retry:
      max: 20

  build-ui:
    type: exec
    run: cat PROMPT_UI.md | $AGENT
    onError: retry
    retry:
      max: 20

flow:
  - build-api
  - build-ui

Each phase is a named participant. Execution is sequential by default, so phase 2 only starts after phase 1 succeeds.

Parallel worktrees

Ralph:

git worktree add ../project-auth -b feature/auth
git worktree add ../project-api -b feature/api

cd ../project-auth
/ralph-loop:ralph-loop "Build auth" --max-iterations 30 &

cd ../project-api
/ralph-loop:ralph-loop "Build API" --max-iterations 30 &

wait

duckflux:

flow:
  - parallel:
      - as: auth
        type: exec
        run: cat PROMPT_AUTH.md | $AGENT
        cwd: ../project-auth
        onError: retry
        retry:
          max: 30
      - as: api
        type: exec
        run: cat PROMPT_API.md | $AGENT
        cwd: ../project-api
        onError: retry
        retry:
          max: 30

No &, no wait, no PID management. The runtime handles concurrency, and the trace shows both branches side by side.

Conditional continuation

Ralph:

cat PROMPT.md | $AGENT
if [ -f "output.json" ]; then
  cat PROMPT_PHASE2.md | $AGENT
fi

duckflux:

participants:
  phase1:
    type: exec
    run: cat PROMPT.md | $AGENT

  phase2:
    type: exec
    run: cat PROMPT_PHASE2.md | $AGENT

flow:
  - phase1
  - phase2:
      when: phase1.status == "success"

What you gain

Concern	Ralph Loop	duckflux
Retry logic	Hand-rolled bash	`onError: retry` + `retry.max`
Parallel execution	`&` + `wait` + PID tracking	`parallel:` with named branches
Error handling	`set -e` / `trap` / `if` chains	`onError: fail` / `skip` / `retry`
Execution trace	Terminal scrollback	Structured JSON trace with step-level detail
Composition	Copy-paste scripts	Named participants + nested workflows
Portability	Bash + your machine	Any duckflux-conforming runtime
Readability	Grows linearly with complexity	Declarative: complexity stays flat

Getting started

Install the runtime:

bun add -g @duckflux/runner

Write your workflow as a .flux.yaml file using the patterns above.
Run it:

duckflux run codegen-loop.flux.yaml

Inspect the trace to see exactly what happened at each step.

Tip: You don't have to migrate everything at once. Start with your most painful Ralph loop (the one with the most bash glue around it) and express it as a duckflux workflow. Keep your simpler loops as-is until you feel the benefit.

Final thoughts

Ralph Loops proved that iterative AI automation works. duckflux takes that insight and gives it structure. The philosophy stays the same (iteration over perfection, persistence wins), but you trade bash glue for a declarative spec that's reproducible, traceable, and composable.

The best prompt in the world still needs an orchestrator. That's what duckflux is for.

Check the duckflux docs for the full DSL reference, or jump straight to the spec.

duckflux: A Declarative Workflow DSL Born from the Multi-Agent Orchestration Gap

Gustavo Gondim — Wed, 11 Mar 2026 00:01:02 +0000

TL;DR: After months exploring multi-agent orchestration with OpenClaw and Lobster, I hit a wall: no existing tool offered a simple declarative spec + runtime-agnostic execution + first-class control flow. So I designed duckflux, a minimal YAML-based workflow DSL with loops, conditionals, parallelism, and events built in. The spec is now at v0.7, and the TypeScript runtime ships as a CLI (quack) and an embeddable library (@duckflux/core), with pluggable event hub backends (in-memory, NATS, Redis) and built-in execution tracing. Full docs at duckflux.openvibes.tech.

Previously, on this series
The gap that remained
What is duckflux
Alternatives considered
The spec at a glance
The TypeScript runtime
What's next

Previously, on this series

This article is the third in a series about building deterministic multi-agent development pipelines. If you're joining now, here's the short version.

In the first article, I documented two months of trial and error trying to build a code -> review -> test pipeline with autonomous AI agents. The core thesis: LLMs are unreliable routers; they forget steps, miscount iterations, and skip transitions. Orchestration must be deterministic and implemented in code, not delegated to inference. After five failed attempts (Ralph Orchestrator, OpenClaw sub-agents, a custom event bus, skill-driven self-orchestration, and plugin hooks), I found Lobster, OpenClaw's built-in workflow engine. It was close but lacked native loop support, so I contributed a pull request adding sub-workflow steps with loops.

In the second article, I zoomed out. The problem wasn't just orchestration; it was multi-agents x multi-projects x multi-providers x multi-channels. I compiled a dataset of agent configuration formats across providers, proposed the Monoswarm pattern (a monorepo layout for managing agent swarms), and identified the still-missing piece: an orchestration layer that ties agent events to workflow transitions across projects.

Both articles ended with the same conclusion: we need a proper workflow DSL.

The gap that remained

Lobster was the closest thing to what I needed, but it was designed for linear pipelines with approval gates. My pull request added loops, but the deeper issues remained:

No conditional branching (if/then/else).
No parallel execution of multiple agents.
No event system for inter-agent coordination.
No typed expressions; conditions were shell commands returning exit codes.
Tied to OpenClaw's runtime, not portable to other environments.

I looked at the broader landscape:

Tool	Where it falls short
Argo Workflows	Turing-complete YAML disguised as config. A conditional loop requires template recursion, manual iteration counters, and string-interpolated type casting.
GitHub Actions	No conditional loops. Workarounds require unrolling or recursive reusable workflows.
Temporal / Inngest	Code-first (Go/TS/Python SDKs). The code IS the spec. No declarative layer.
Airflow / Prefect	DAGs are acyclic by definition, so conditional loops are architecturally impossible.
n8n / Make	Visual-first, JSON-heavy specs. Loop constructs require JavaScript function nodes. Specs are unreadable as text.
Lobster	Linear pipelines with approval gates. No native loops, no parallelism, no conditionals.

The gap was clear: no existing tool combines a simple declarative spec + runtime-agnostic execution + first-class control flow (loops, conditionals, parallelism) + events.

So I built one.

What is duckflux

duckflux is a minimal, deterministic, runtime-agnostic DSL for orchestrating workflows through declarative YAML. The spec is at v0.7, with a complete TypeScript runtime and a documentation site at duckflux.openvibes.tech.

The design principles are deliberate:

Readable in 5 seconds -- any developer understands the flow by glancing at the YAML.
Minimal by default -- features are only added when absolutely necessary.
Convention over configuration -- sensible defaults everywhere.
Runtime-agnostic -- the DSL defines WHAT happens and in WHAT ORDER. The runtime decides HOW.
String by default -- unless a schema is explicitly defined, every participant receives and returns strings, just like stdin/stdout, the universal interface.
Reuse proven standards -- expressions use Google CEL (used in Kubernetes, Firebase, Envoy), schemas use JSON Schema, format is YAML.

The simplest possible workflow:

flow:
  - type: exec
    run: echo "Hello, duckflux!"

That's it. One flow, one step. No boilerplate, no mandatory fields beyond what's needed.

A more realistic example: an agentic coding pipeline where a planner breaks work into tasks, then a loop fetches each task, a coder implements it, and a reviewer checks it:

id: agentic-coding-pipeline
name: Agentic Coding Pipeline
version: "0.7"

defaults:
  timeout: 10m
  cwd: ./repo

inputs:
  goal:
    type: string
    required: true
    description: "High-level description of what needs to be built"
  taskQueueUrl:
    type: string
    required: true
  maxRounds:
    type: integer
    default: 3
    minimum: 1
    maximum: 10

participants:
  planner:
    type: exec
    run: >
      claude -p
      "Break the following goal into discrete coding tasks.
      Return a JSON array of {id, description} objects.
      Goal: " + workflow.inputs.goal
    timeout: 5m
    output:
      type: array
      items:
        type: object
        required: true

  fetchTask:
    type: http
    url: workflow.inputs.taskQueueUrl + "/next"
    method: GET
    headers:
      Accept: application/json

  coder:
    type: exec
    run: >
      claude -p
      "Implement the following task in the current repository.
      Task: " + fetchTask.output.description
    timeout: 15m
    onError: retry
    retry:
      max: 2
      backoff: 10s

  reviewer:
    type: exec
    run: >
      claude -p
      "Review the changes for the following task. Return a JSON
      object with 'approved' (boolean) and 'feedback' (string).
      Task: " + fetchTask.output.description
    timeout: 10m
    output:
      approved:
        type: boolean
        required: true
      feedback:
        type: string

flow:
  - planner

  - loop:
      max: workflow.inputs.maxRounds
      steps:
        - fetchTask
        - coder:
            input:
              task: fetchTask.output.description
        - reviewer:
            input:
              task: fetchTask.output.description

output:
  approved: reviewer.output.approved
  feedback: reviewer.output.feedback
  rounds: loop.iteration

Compare this to the same scenario in Argo Workflows (~40 lines of template recursion), GitHub Actions (~50+ lines with unrolled iterations), or Temporal (~35 lines of Go code that requires compilation and a server).

Alternatives considered

Before landing on a custom YAML format, I evaluated two other approaches:

Extending Argo Workflows. Argo's YAML is expressive, but its power came from 6+ years of incremental feature additions. A conditional loop in Argo requires template recursion, manual iteration counters, and string-interpolated type casting, 13+ lines for what should be 6. The complexity is the feature, not a bug, and that's the problem.

Mermaid as executable spec. Mermaid sequence diagrams already have loop, par, and alt constructs. The DX for reading and writing is excellent, and diagrams render natively in GitHub. However, extending Mermaid for real workflow concerns (retry policies, timeouts, error handling, typed variables) requires hacking Note blocks for config and $var for expressions, creating a custom parser as proprietary as a new YAML format, just disguised as something familiar.

Custom minimal YAML (chosen). A new format, intentionally constrained, inspired by Mermaid's visual clarity but with the extensibility and tooling ecosystem of YAML. The tradeoff: a new DSL to learn, but one designed to be readable in 5 seconds and writable in 5 minutes.

The spec at a glance

The full spec is at github.com/duckflux/spec, with complete documentation at duckflux.openvibes.tech. Here's a walkthrough of the key features.

Participants

Participants are the atomic unit of work. Each has a type that determines its behavior:

Type	Description
`exec`	Shell command
`http`	HTTP request
`mcp`	MCP server tool call
`workflow`	Sub-workflow (composition)
`emit`	Fire an event to the event hub

Participants can be defined in three ways: in a reusable participants block, as named inline steps (with as), or as anonymous inline steps (without a name at all):

# Reusable (in participants block)
participants:
  build:
    type: exec
    run: npm run build

flow:
  # Reference a reusable participant
  - build

  # Named inline (one-off, but addressable by name)
  - as: notify
    type: http
    url: https://hooks.slack.com/services/...
    method: POST

  # Anonymous inline (output accessible only via the I/O chain)
  - type: exec
    run: echo "done"

Implicit I/O chain

One of the most impactful features added since v0.2: the output of each step is automatically passed as input to the next step, forming a chain analogous to Unix pipes.

flow:
  - type: exec
    run: curl -s https://api.example.com/data
  - type: exec
    run: jq '.items[] | .name'
  - type: exec
    run: wc -l

Each step receives the previous step's output on stdin. No explicit input mapping needed for linear pipelines. When a participant also has an explicit input mapping, the runtime merges the chained value with the explicit mapping.
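The chain behaves like a left fold over the steps. In this hypothetical sketch, each exec step is a function from stdin to stdout, and the three stubs imitate curl, jq, and wc -l with made-up data. How the runtime merges a chained value with an explicit input mapping is not modeled here.

```typescript
// Hypothetical model of the implicit I/O chain: each step's stdout is the
// next step's stdin, like `a | b | c` in a shell.
type ChainStep = (stdin: string) => string;

function runChain(steps: ChainStep[], initial = ""): string {
  return steps.reduce<string>((input, step) => step(input), initial);
}

// Stand-ins for the three exec steps (curl, jq, wc -l); the data is made up.
const fetchData: ChainStep = () => JSON.stringify({ items: [{ name: "a" }, { name: "b" }] });
const extractNames: ChainStep = (json) =>
  (JSON.parse(json) as { items: { name: string }[] }).items.map((item) => item.name).join("\n");
const countLines: ChainStep = (text) => String(text.split("\n").length);

console.log(runChain([fetchData, extractNames, countLines])); // 2
```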

Control flow

Loops -- repeat until a CEL condition is true or N iterations have run:

- loop:
    until: reviewer.output.approved == true
    max: 3
    steps:
      - coder
      - reviewer
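The until/max pair can be modeled in a few lines. This is a hypothetical sketch, not the runtime's code, and it assumes two details the excerpt does not pin down: until is checked after each full pass over steps, and loop.iteration is 0-based.

```typescript
// Hypothetical model of `loop: { until, max, steps }` (not @duckflux/core).
// Assumptions: `until` is checked after each full pass, and loop.iteration is 0-based.
interface LoopContext { iteration: number }
type LoopStep = (loop: LoopContext) => void;

function runLoop(steps: LoopStep[], until: () => boolean, max: number): { passes: number; satisfied: boolean } {
  for (let iteration = 0; iteration < max; iteration++) {
    for (const step of steps) step({ iteration });
    if (until()) return { passes: iteration + 1, satisfied: true };
  }
  // The cap was hit without the condition becoming true; how the real
  // runtime reports this case is not described in this section.
  return { passes: max, satisfied: false };
}

// Usage mirroring the coder/reviewer loop: the reviewer approves on the second pass.
let approved = false;
const coder: LoopStep = () => { /* write or revise code */ };
const reviewer: LoopStep = ({ iteration }) => { approved = iteration >= 1; };

console.log(runLoop([coder, reviewer], () => approved, 3)); // { passes: 2, satisfied: true }
```

The counter lives in code, not in the model's memory, which is exactly the property the loop construct is meant to guarantee.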

Parallel -- run steps concurrently:

- parallel:
    - as: lint
      type: exec
      run: npm run lint
    - as: test
      type: exec
      run: npm test

Conditionals -- branch based on CEL expressions:

- if:
    condition: tests.status == "success"
    then:
      - deploy
    else:
      - rollback

Guards -- skip a single step conditionally:

- deploy:
    when: reviewer.output.approved == true

Wait -- pause for an event, a timeout, or a polling condition:

# Wait for an external event
- wait:
    event: "approval.received"
    match: event.requestId == submitForApproval.output.id
    timeout: 24h

# Sleep
- wait:
    timeout: 30s

# Poll until a condition is true
- wait:
    until: now >= timestamp("2024-04-01T09:00:00Z")
    poll: 1m
    timeout: 48h
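The polling form of wait reduces to a loop with a deadline. The sketch below is hypothetical (not the duckflux implementation); the clock and sleep functions are injectable parameters introduced here only so the logic can be exercised without real waiting.

```typescript
// Hypothetical sketch of `wait: { until, poll, timeout }` (not @duckflux/core):
// re-check a condition every pollMs until it holds or timeoutMs elapses.
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitUntil(
  condition: () => boolean,
  pollMs: number,
  timeoutMs: number,
  now: () => number = Date.now, // injectable clock for testing
  sleep: Sleep = realSleep,
): Promise<"satisfied" | "timeout"> {
  const deadline = now() + timeoutMs;
  while (true) {
    if (condition()) return "satisfied";
    const remaining = deadline - now();
    if (remaining <= 0) return "timeout";
    await sleep(Math.min(pollMs, remaining)); // never sleep past the deadline
  }
}

// Usage: poll once a minute for up to 48 hours, as in the YAML above.
waitUntil(() => Date.now() >= Date.parse("2024-04-01T09:00:00Z"), 60_000, 48 * 3_600_000)
  .then((result) => console.log(result)); // satisfied (that date is already in the past)
```

Checking the condition before the first sleep means an already-true condition completes immediately instead of costing one poll interval.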

Set -- write values into a shared execution context without producing output:

- set:
    token: workflow.inputs.api_token
    region: env.AWS_REGION

- as: fetchData
  type: http
  url: "'https://api.example.com/data'"
  headers:
    Authorization: "'Bearer ' + execution.context.token"

set is transparent to the I/O chain: the chain passes through unchanged.
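A tiny model of that transparency: set mutates the shared context and returns whatever came in. This is a hypothetical sketch with already-evaluated values (the token and region are made-up placeholders), not the runtime's code.

```typescript
// Hypothetical model of `set` (not the @duckflux/core API): it writes into
// execution.context and hands the incoming chain value through unchanged.
interface Execution {
  context: Record<string, unknown>;
}

function applySet<T>(execution: Execution, values: Record<string, unknown>, chained: T): T {
  Object.assign(execution.context, values); // shared read/write scratchpad
  return chained;                           // transparent to the I/O chain
}

// Usage mirroring the YAML above, with CEL expressions already evaluated.
const execution: Execution = { context: {} };
const passedThrough = applySet(execution, { token: "t0k3n", region: "eu-west-1" }, "previous step output");
const authHeader = "Bearer " + String(execution.context.token);

console.log(passedThrough); // previous step output
console.log(authHeader);    // Bearer t0k3n
```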

Exec input passing semantics

How input reaches an exec subprocess depends on its type:

Map input -> environment variables. When the resolved input is an object, each key-value pair is injected as an environment variable. The run command references them via shell interpolation (${KEY}).
String input -> stdin. When the resolved input is a string, it's passed via stdin, enabling Unix pipe-style chaining.

# Map input: keys become environment variables
- as: deploy
  type: exec
  run: ./deploy.sh --branch="${BRANCH}" --env="${TARGET_ENV}"
  input:
    BRANCH: workflow.inputs.branch
    TARGET_ENV: execution.context.environment

# String input: passed via stdin
flow:
  - type: exec
    run: echo '{"name": "World"}'
  - type: exec
    run: jq -r '.name'
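The two rules can be expressed as a pure function that decides how a resolved input reaches the subprocess. This is a hypothetical sketch, not duckflux's implementation; JSON-encoding non-string map values is an assumption, since environment variables can only hold strings.

```typescript
// Hypothetical sketch of exec input passing (not duckflux's implementation):
// a map becomes environment variables, a string becomes stdin.
type ExecInput = string | Record<string, unknown> | undefined;

interface ExecInvocation {
  env: Record<string, string>; // added on top of the parent process environment
  stdin?: string;              // written to the subprocess's standard input
}

function toInvocation(input: ExecInput): ExecInvocation {
  if (input === undefined) return { env: {} };
  if (typeof input === "string") return { env: {}, stdin: input };
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    // Env vars hold strings only; JSON-encoding other values is an assumption.
    env[key] = typeof value === "string" ? value : JSON.stringify(value);
  }
  return { env };
}

// Usage mirroring the two examples above (expressions already evaluated).
const deployInvocation = toInvocation({ BRANCH: "main", TARGET_ENV: "staging" });
// ./deploy.sh reads ${BRANCH} and ${TARGET_ENV}; nothing is written to stdin
const jqInvocation = toInvocation('{"name": "World"}');
// jq -r '.name' reads the JSON string from stdin
```

In Node, these fields map directly onto `child_process.spawnSync(command, { shell: true, env: { ...process.env, ...env }, input: stdin })`.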

Expressions with Google CEL

All conditions, input mappings, and output mappings use Google CEL. CEL is non-Turing-complete, sandboxed (no I/O, no side effects), type-checked at parse time, and has a familiar C/JS/Python-like syntax:

- if:
    condition: reviewer.output.approved == false && loop.iteration < 3

The runtime ships with the full CEL standard library: has, size, matches, contains, startsWith, endsWith, timestamp, duration, filter, map, exists, all, and more.

CEL was chosen over JavaScript eval (security surface, runtime dependency), custom mini-DSLs (implementation burden), and JSONPath/JMESPath (poor logic support).

Variable namespaces

Since v0.3, input and output are participant-scoped: inside a participant, input means "my input" and output means "my output". Workflow-level I/O lives under workflow.inputs.* and workflow.output.

Key runtime variables:

Namespace	Description
`workflow.inputs.*`	Workflow input parameters
`workflow.output`	Workflow final result
`<step>.output`	A step's output (auto-parsed if JSON)
`<step>.status`	`success`, `failure`, or `skipped`
`execution.context.*`	Shared read/write scratchpad (set via `set`)
`env.*`	Environment variables (read-only)
`loop.iteration`	Current loop iteration index
`input`	Current participant's resolved input

Events

emit publishes events, wait subscribes. Events propagate both internally (within the workflow) and externally via the event hub:

- as: notifyProgress
  type: emit
  event: "task.progress"
  payload:
    taskId: workflow.inputs.taskId
    status: coder.output.status
  ack: true  # block until delivery confirmed
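The emit/wait pairing behaves like a small publish/subscribe registry. The class below is a hypothetical in-memory sketch, not @duckflux/core's hub; in particular, using the delivered-waiter count as a stand-in for ack is an assumption made for illustration.

```typescript
// Hypothetical in-memory event hub modeling `emit` and `wait` (not @duckflux/core).
type Payload = Record<string, unknown>;
type Matcher = (payload: Payload) => boolean;

interface Waiter {
  match: Matcher;
  deliver: (payload: Payload) => void;
}

class InMemoryHub {
  private waiters = new Map<string, Waiter[]>();

  // Delivers to every matching waiter and returns how many received the event,
  // a stand-in for `ack: true` (0 means nobody was listening).
  emit(event: string, payload: Payload): number {
    const stillWaiting: Waiter[] = [];
    let delivered = 0;
    for (const waiter of this.waiters.get(event) ?? []) {
      if (waiter.match(payload)) {
        waiter.deliver(payload);
        delivered++;
      } else {
        stillWaiting.push(waiter);
      }
    }
    this.waiters.set(event, stillWaiting);
    return delivered;
  }

  // Resolves with the first matching payload, or rejects after timeoutMs.
  wait(event: string, match: Matcher, timeoutMs: number): Promise<Payload> {
    return new Promise<Payload>((resolve, reject) => {
      const waiter: Waiter = {
        match,
        deliver: (payload) => {
          clearTimeout(timer);
          resolve(payload);
        },
      };
      const timer = setTimeout(() => {
        const remaining = (this.waiters.get(event) ?? []).filter((w) => w !== waiter);
        this.waiters.set(event, remaining);
        reject(new Error(`timed out waiting for "${event}"`));
      }, timeoutMs);
      this.waiters.set(event, [...(this.waiters.get(event) ?? []), waiter]);
    });
  }
}

// Usage: a wait with a `match` condition, like the approval example earlier.
const hub = new InMemoryHub();
const approval = hub.wait("approval.received", (e) => e.requestId === "req-42", 24 * 3_600_000);
hub.emit("approval.received", { requestId: "req-7" });  // no match: returns 0
hub.emit("approval.received", { requestId: "req-42" }); // match: returns 1 and resolves `approval`
```

Swapping this class for a NATS or Redis client is what moves delivery across process boundaries; the emit/wait contract stays the same.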

Error handling

Error handling is configurable per participant, per flow step invocation, or globally via defaults. There are four strategies: fail, skip, retry (with backoff), and redirect to a fallback participant:

# Global defaults
defaults:
  onError: retry
  retry:
    max: 2
    backoff: 5s

participants:
  coder:
    type: exec
    run: ./code.sh
    onError: retry       # retry with exponential backoff
    retry:
      max: 3
      backoff: 2s
      factor: 2          # exponential: 2s, 4s, 8s

  deploy:
    type: exec
    run: ./deploy.sh
    onError: notify      # redirect to a fallback participant

Error strategy resolution chain: flow override > participant > defaults > fail.
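Both the resolution chain and the backoff schedule are simple enough to state as code. This is a hypothetical sketch, not the runtime's implementation. It assumes max counts retries after the first failure and that a missing factor means constant delay (factor 1); the spec may define either differently.

```typescript
// Hypothetical sketch of onError resolution and retry backoff (not @duckflux/core).
// A strategy is "fail", "skip", "retry", or the name of a fallback participant.
type Strategy = string;

interface RetryPolicy {
  max: number;       // assumed: number of retries after the first failure
  backoffMs: number; // delay before the first retry
  factor?: number;   // assumed: 1 (constant delay) when omitted
}

interface ErrorConfig {
  onError?: Strategy;
  retry?: RetryPolicy;
}

// flow override > participant > defaults > fail
function resolveStrategy(flow?: ErrorConfig, participant?: ErrorConfig, defaults?: ErrorConfig): Strategy {
  return flow?.onError ?? participant?.onError ?? defaults?.onError ?? "fail";
}

function backoffDelays({ max, backoffMs, factor = 1 }: RetryPolicy): number[] {
  return Array.from({ length: max }, (_, attempt) => backoffMs * factor ** attempt);
}

// Usage with the config above (durations converted to milliseconds).
const defaultsConfig: ErrorConfig = { onError: "retry", retry: { max: 2, backoffMs: 5000 } };
const coderConfig: ErrorConfig = { onError: "retry", retry: { max: 3, backoffMs: 2000, factor: 2 } };
const deployConfig: ErrorConfig = { onError: "notify" };

console.log(resolveStrategy(undefined, coderConfig, defaultsConfig));  // retry
console.log(resolveStrategy(undefined, deployConfig, defaultsConfig)); // notify
console.log(backoffDelays(coderConfig.retry!)); // delays: 2000, 4000, 8000 ms
```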

Inputs and outputs

Everything is a string by default, like stdin/stdout. Schemas are opt-in via JSON Schema (written in YAML):

inputs:
  repoUrl:
    type: string
    format: uri
    required: true
  branch:
    type: string
    default: "main"

output:
  approved: reviewer.output.approved
  score: reviewer.output.score

Input mapping supports flow-level overrides that merge with the participant's base input (instead of replacing it), so you never have to repeat shared configuration on every call:

participants:
  fetch_page:
    type: exec
    input:
      NOTION_TOKEN: execution.context.token   # base input, always present
    run: curl -sS "https://api.notion.com/v1/pages/${PAGE_ID}" -H "Authorization: Bearer ${NOTION_TOKEN}"

flow:
  - fetch_page:
      input:
        PAGE_ID: workflow.inputs.story_id    # merged with base input

JSON Schema for editor support

A JSON Schema ships with the spec, giving you autocomplete and validation in VS Code for free. With the Red Hat YAML extension, map it to workflow files via the yaml.schemas setting:

{
  "yaml.schemas": {
    "./duckflux.schema.json": "*.duck.yaml"
  }
}

Workflow files use the .duck.yaml convention (e.g., deploy.duck.yaml, review-loop.duck.yaml).

The TypeScript runtime

The original plan was a Go runner, chosen for its native CEL implementation (cel-go) and single-binary distribution. After prototyping, I switched to TypeScript: Go's plugin model can't support extensibility via npm packages, which is the core extensibility primitive for duckflux plugins. The runtime targets Bun and ships as both a CLI tool and an embeddable library.

Packages

Package	Description
`duckflux`	CLI tool (`quack run`, `quack lint`, `quack validate`)
`@duckflux/core`	Engine, parser, CEL evaluator, event hub (in-memory)
`@duckflux/hub-nats`	Optional NATS JetStream event hub backend
`@duckflux/hub-redis`	Optional Redis Streams event hub backend

Installation

# Universal installer (auto-detects apt, brew, bun, npm; falls back to standalone binary)
curl -fsSL https://duckflux.github.io/apt-repo/install.sh | bash

# Or via Homebrew
brew install duckflux/tap/quack

# Or via npm/bun
npm install -g duckflux   # or: bun add -g duckflux

# Or run without installing
npx duckflux run workflow.yaml

Standalone binaries (no Node.js or Bun required) are also available for macOS, Linux, and Windows on the GitHub Releases page.

CLI usage

# Run a workflow
quack run deploy.duck.yaml --input branch=main --input env=staging

# Run from stdin
echo '{"branch": "main"}' | quack run deploy.duck.yaml

# Validate (schema + semantics)
quack lint deploy.duck.yaml

# Validate with inputs
quack validate deploy.duck.yaml --input branch=main

# Start the web UI server for visual workflow observation
quack server --trace-dir ./traces

# Version
quack version

Library usage

Drop @duckflux/core into any TypeScript project and run workflows in-process:

import { executeWorkflow } from "@duckflux/core/engine";
import { parseWorkflowFile } from "@duckflux/core/parser";

const workflow = await parseWorkflowFile("./pipeline.yaml");
const result = await executeWorkflow(workflow, { env: "production" });

console.log(result.output);  // structured output
console.log(result.steps);   // per-step results, timings, errors

No subprocess, no serialization overhead, full TypeScript types.

Event hub backends

Async workflows that emit and wait on events work out of the box with the built-in in-memory hub. Scale up to NATS or Redis when you need cross-process delivery:

Backend	Package	Cross-process	Use case
In-memory	built-in	No	Development, testing, single-process
NATS JetStream	`@duckflux/hub-nats`	Yes	Distributed, multi-process
Redis Streams	`@duckflux/hub-redis`	Yes	Distributed with persistence

quack run workflow.yaml --event-backend nats --nats-url nats://localhost:4222
quack run workflow.yaml --event-backend redis --redis-addr localhost:6379

Execution tracing

Every run can produce a structured trace, written incrementally as each step completes. Choose the format that fits your workflow:

# Trace to JSON (default)
quack run workflow.yaml --trace-dir ./traces

# Trace to SQLite (queryable with any SQL client)
quack run workflow.yaml --trace-dir ./traces --trace-format sqlite

Each trace captures every step (participants and control-flow constructs alike) with timing, inputs, outputs, errors, and retry counts.

Spec v0.7 feature coverage

The runtime implements the complete duckflux v0.7 spec:

Participant types: exec, http, emit, workflow (+ mcp stub)
Control flow: loop, parallel, if/else, when guards, set, wait
I/O chaining: step output flows automatically as input to the next step
Expressions: full CEL standard library (has, size, matches, timestamp, duration, and more)
Error strategies: fail, skip, retry (exponential backoff), redirect to fallback participant
Input semantics: map input -> env vars, string input -> stdin
Input merge: flow override merges with participant base input instead of replacing it
Timeouts: per-step, per-participant, or global via defaults
Output schema validation: validate step and workflow output against JSON Schema definitions
Circular sub-workflow detection: prevents infinite recursion in nested workflows

What's next

Tooling and ecosystem

The documentation site at duckflux.openvibes.tech covers everything from getting started to the full library API. A browser-based visual editor for building workflows is planned.

On the roadmap

Features deliberately deferred from v0.7, to be prioritized based on real-world demand:

DAG mode -- explicit step dependencies (depends: [stepA, stepB]) for complex graphs
Durability / resume -- workflow survives a runtime crash and resumes from where it stopped
Matrix / fan-out -- combinatorial execution (e.g., tests across 3 Node versions x 2 OS)
Persistent mode -- workflow running as a daemon, reacting to events continuously
Caching between runs -- reuse outputs from idempotent steps across executions

The thesis, revisited

The journey from Protoagent to Lobster to duckflux converged on one insight: LLMs should do what they're good at (writing code, analyzing code, making decisions), and code should do what code is good at (sequencing, counting, routing, retrying).

duckflux is the code side of that equation. A deterministic orchestration layer where the flow is explicit, the execution is predictable, and the spec is readable by both humans and machines.

Links:

duckflux docs -- Full documentation site
duckflux spec -- DSL specification (v0.7)
duckflux on npm -- TypeScript runtime
Article 1 -- Building a deterministic pipeline with Lobster
Article 2 -- Multi-agents x multi-projects x multi-providers x multi-channels

Multi-agents on multi-projects with multi-providers via multi-channels

Gustavo Gondim — Sat, 28 Feb 2026 20:58:37 +0000

TL;DR

This is a “state of multi-agentic-driven development” article: a personal consolidation of what I learned, thought, and envisioned over the last month.

During the research for this article, I also compiled a dataset of agent configuration formats and features across providers, and I tracked the emergence of common patterns and standards in the ecosystem.

In the end, I arrived at a multi-agent development repository pattern called Monoswarm, built to work with many agent formats, many providers, and many projects at the same time. It is a monorepo with a shared .ai submodule for agent definitions, with each project's provider-specific config files symlinked to it. The orchestration logic is still a missing piece; my plan is to implement it as an OpenClaw plugin that routes events between agents and workflows without relying on LLM inference.

If my opinions offend or contradict anyone, remember that everyone is still learning about this industrial revolution every day, and there is no certainty about the outcome.

Obviously, Claude helped me write this article, but I had to rewrite almost all of it myself because it didn't fully capture my perspective.

TL;DR
Why multi-agents
Why multi-projects
Why multi-providers
Why multi-channels
Exploring deterministic agent orchestration
OpenClaw pitfalls
Lack of standards + nightly revolutions
Comparison of provider capabilities
Old solutions that work
The Monoswarm pattern
Still a missing piece: an OpenClaw orchestration plugin
Non-addressed theme: Where is multi-instance?

Why multi-agents

The default mental model for AI coding assistants is a single agent answering questions in a chat window. This works for isolated tasks, like explaining a function, generating a snippet, or fixing a bug. But as soon as you try to build a sustained development workflow, a single agent becomes a bottleneck for two distinct reasons:

Role separation. Different tasks demand different system prompts, personas, and domain knowledge. Also, each agent benefits from its own identity - a name, a behavioral profile, constraints on how it should respond, and explicit boundaries on what it should not do.
Skill/tool set. AI coding agents operate through tool calls: file operations, shell commands, API requests, browser actions. A code review agent needs read-only access to a repository and the ability to post review comments. A deployment agent needs access to CI/CD pipelines and infrastructure tooling. A testing agent needs to run test suites and parse their output. Granting all tools to a single agent increases the attack surface and the likelihood of unintended side effects. It also forces the LLM to navigate a larger tool catalog on every invocation, which dilutes attention and wastes context window tokens on irrelevant tool definitions.

Why multi-projects

Parallel clones. Having a well-defined team of agents - a programmer, a reviewer, a tester, etc. - solves the role separation problem. But a single team can only work on one project at a time. If you manage multiple repositories, you need the ability to spawn clones of the same agent configuration across different projects simultaneously. A "reviewer" agent for project A and a "reviewer" agent for project B should share the same behavioral template but operate in completely independent sessions, with no shared context or state between them.

This is a parallel workforce problem. The agent definitions (system prompts, tool permissions, model selection) act as blueprints. Each project instantiates its own copy of the team, running in its own workspace with its own git branch, session history, and checkpointing. In platforms like OpenClaw, this maps to isolated session keys per project, although those keys are tightly coupled to the channel/routing layer. In Ralph Orchestrator, each loop operates on its own workspace directory. The multiplier is straightforward:

  N agent roles × M projects = N×M concurrent sessions

Project-level instructions. Beyond cloning, each project also carries its own custom instructions. Project-level directives, like coding standards, architectural guidelines, and definition of done, need to be injected into every agent that works on that project, layered on top of the agent's own role-specific instructions. The orchestration system must support this two-dimensional configuration:

  agent identity (role) × project context (goals and constraints)
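Both dimensions fit in one small function: every role blueprint is instantiated once per project, and the project's directives are layered on top of the role's instructions. This is a hypothetical sketch; the session-key format, workspace layout, and prompt layout are illustrative choices, not OpenClaw conventions.

```typescript
// Hypothetical sketch: N role blueprints × M projects -> N×M isolated sessions.
interface Role { name: string; instructions: string }
interface Project { name: string; directives: string }
interface SessionSpec { key: string; workspace: string; systemPrompt: string }

function instantiateTeams(roles: Role[], projects: Project[]): SessionSpec[] {
  return projects.flatMap((project) =>
    roles.map((role) => ({
      key: `${project.name}:${role.name}`,                  // hypothetical key, unique per pair
      workspace: `workspaces/${project.name}/${role.name}`, // hypothetical per-session workspace
      // Project directives are layered on top of the role's own instructions.
      systemPrompt: `${role.instructions}\n\n## Project: ${project.name}\n${project.directives}`,
    })),
  );
}

const roles: Role[] = [
  { name: "coder", instructions: "You write code." },
  { name: "reviewer", instructions: "You review code. Never edit files." },
  { name: "tester", instructions: "You run and interpret test suites." },
];
const projects: Project[] = [
  { name: "project-a", directives: "Use strict TypeScript." },
  { name: "project-b", directives: "Follow PEP 8." },
];

const sessions = instantiateTeams(roles, projects);
console.log(sessions.length); // 6 = 3 roles × 2 projects
```

Because nothing is shared between entries except the blueprint text, two reviewers in different projects never see each other's context.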

Why multi-providers

Model diversity and volatility. The AI model landscape shifts weekly. A model that leads benchmarks today may be surpassed tomorrow by a release from a different provider. Locking your entire agent infrastructure to a single provider means you cannot capitalize on these shifts without re-architecting.
Cost optimization. Provider diversity also has a direct cost dimension. Different models have vastly different pricing per token, and not every task requires the most expensive model. An agent that plans or reviews work benefits from a high-reasoning model like Claude Opus or GPT-5.2-Codex. A worker agent executing straightforward file edits or running shell commands can use a faster, cheaper model (Haiku, GPT-5-mini, or a local open-weight model using Ollama) without meaningful quality loss.
Feature specialization. Finally, providers differ in features beyond raw model quality. Some offer native tool use with parallel execution, others have larger context windows, others support image input or structured output with JSON schema validation. Some have better streaming performance, others have more generous rate limits. A multi-provider setup lets you match each agent to the provider whose feature set best fits its role, rather than accepting the lowest common denominator of a single vendor.

ℹ️ Projects like OpenRouter and OpenCode attempt to abstract this fragmentation by providing a unified API layer across multiple providers. They solve the interface problem: you get a single endpoint that speaks the same protocol regardless of the underlying model. But they are wrappers, not consolidators. You still manage separate API keys, separate billing accounts, separate rate limits, and separate subscription tiers across providers. The operational complexity of multi-provider doesn't disappear, it just moves one layer down.

Why multi-channels

AI coding agents today are mostly confined to IDE integrations and terminal CLIs. You open VS Code or run Claude Code in a shell, and interact with the agent in that context. This works when you're sitting at your workstation, but it breaks as soon as you step away from the keyboard.

In daily work, development happens across a spectrum of contexts, each in a different communication channel (Slack, email, messaging apps, etc.). If your agents only listen to one channel on your machine, you miss the opportunity to capture these moments as actionable inputs. OpenClaw's architecture was built around this idea from the start, treating channels as interchangeable transport layers. The same agent, with the same identity and memory, can receive tasks from a Telegram message, a Discord command, a REST API call, or a CLI invocation.

This matters for two practical reasons:

First, task acquisition. Not every task starts at your desk. You might spot a bug while reviewing a PR on your phone, or get a production alert on Slack while commuting. Multi-channel support turns these moments into actionable inputs. The task enters the pipeline without waiting for you to open a terminal.
Second, background and remote work. Agents don't need to run on your local machine. An OpenClaw Gateway running on a home server, a VPS, or any always-on host can execute agent sessions independently of your workstation. You close your laptop, and the agents keep working. This decouples agent execution from your personal computing environment entirely. GitHub's Copilot coding agent already demonstrated this model: you assign an issue to @copilot and it works autonomously in a GitHub Actions runner. The difference with a multi-channel setup is that you retain interactive access to these remote agents through messaging platforms, turning what would be a fire-and-forget job into a supervised but location-independent workflow.

Exploring deterministic agent orchestration

Once you accept the multi-agent premise, the next question is: who decides what runs when? There are two schools:

let the LLM orchestrate (non-deterministic), or
let code orchestrate (deterministic).

The distinction matters because LLMs are unreliable routers. They forget steps (especially after context compaction), miscount iterations, and silently skip transitions. Delegating routing to an LLM means relying on inference rather than on guarantees.

Work needs repeatable, auditable processes: determinism makes outcomes predictable and debuggable. Organizations already enforce this by layering procedures and state machines on top of inherently non-deterministic human behavior. AI agents require the same treatment. Leaving sequencing and routing decisions to LLM inference introduces fragile, non-repeatable behavior, just like relying on human memory alone, and that's not a good look for "AI improving human workflows".

For reliable, predictable, and inspectable multi-agent workflows, orchestration must be deterministic and implemented in code or a typed runtime, not delegated to the model.
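What "implemented in code" means can be shown with a fixed transition table. This is a hypothetical code/review/test pipeline, not any tool's actual engine: the agent callback stands in for the LLM and only reports an outcome, while the next stage and the step budget are decided and counted in code.

```typescript
// Hypothetical code -> review -> test pipeline: the transition table,
// not the LLM, decides what runs next.
type Stage = "code" | "review" | "test" | "done" | "failed";
type Outcome = "ok" | "rejected";
type ActiveStage = Exclude<Stage, "done" | "failed">;

const TRANSITIONS: Record<ActiveStage, Record<Outcome, Stage>> = {
  code: { ok: "review", rejected: "failed" },
  review: { ok: "test", rejected: "code" }, // a rejected review loops back to the coder
  test: { ok: "done", rejected: "code" },
};

function nextStage(stage: Stage, outcome: Outcome): Stage {
  if (stage === "done" || stage === "failed") return stage; // terminal stages never move
  return TRANSITIONS[stage][outcome];
}

// `agent` stands in for the LLM call: it only reports an outcome for its stage.
// The step budget is counted here, so iterations cannot be "miscounted".
function runPipeline(agent: (stage: ActiveStage) => Outcome, maxSteps: number): Stage[] {
  const visited: Stage[] = [];
  let stage: Stage = "code";
  for (let step = 0; step < maxSteps && stage !== "done" && stage !== "failed"; step++) {
    visited.push(stage);
    stage = nextStage(stage, agent(stage));
  }
  visited.push(stage);
  return visited;
}

// Usage: the first review is rejected, everything else passes.
let reviews = 0;
const fakeAgent = (stage: ActiveStage): Outcome => (stage === "review" && reviews++ === 0 ? "rejected" : "ok");
console.log(runPipeline(fakeAgent, 10).join(" -> ")); // code -> review -> code -> review -> test -> done
```

Given the same outcomes, the run is always the same, which is what makes it repeatable and auditable.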

In that world, we have some options for deterministic orchestration:

Lobster is OpenClaw's built-in workflow engine. It takes the deterministic path: YAML-defined pipelines where steps run sequentially, data flows as JSON between them, and approval gates halt execution until explicitly confirmed. The LLM never decides what happens next, Lobster does. Each step can invoke any OpenClaw tool, including agent-send for inter-agent messaging and llm-task for structured LLM calls with schema validation. The result is a system where LLMs do what they're good at (generating and analyzing code) while a typed runtime handles the plumbing (sequencing, looping, conditional branching). However, Lobster was originally designed for single-agent pipelines. It lacked native loop support for sub-workflows, a gap that I unsuccessfully tried to fill with a Lobster pull request.
Ralph Orchestrator takes a different approach. It implements the "Ralph Wiggum technique" (autonomous agent loops with hard context resets between iterations), but augments it with a "hat-based orchestration framework": agents emit structured events (e.g., [event:code_complete], [event:review_rejected]) that trigger transitions to other listening agents. The routing logic is still non-deterministic, since it relies on the LLM to emit the right event at the right time, but at least the decision of "what happens next" is externalized from the agent's system prompt and implemented in code. This is a step in the right direction, but it still leaves a lot of room for error (what if the agent forgets to emit an event? what if it emits the wrong one? what if it emits multiple events?).
OpenProse is a markdown-first orchestration language. It lets you define agents, spawn parallel sessions, and merge results, all in .prose files with a declarative syntax. It is the most expressive option for multi-agent workflows and this expressiveness comes at a cost: OpenProse programs are interpreted by the LLM itself, which means flow control is ultimately non-deterministic. The LLM reads the .prose spec and simulates execution, which works until it doesn't. For workflows where predictability matters more than flexibility, OpenProse is better suited as a planning and preparation layer (define agents, gather context) that hands off to a deterministic engine like Lobster for the actual execution.

OpenClaw pitfalls

Despite being the most complete open-source agent platform available (multi-agent, multi-channel, multi-provider, with a plugin ecosystem and 150K+ GitHub stars), OpenClaw carries significant baggage that makes adoption non-trivial.

Too much context. OpenClaw's architecture revolves around concepts like "souls" (persistent agent identity files), memory compaction, personality evolution, and self-improvement loops. For developers who just want a coding agent pipeline, this is cognitive overhead. Take a look at what OpenClaw injects into the system prompt of every agent session. Context is a precious resource, and OpenClaw assumes a rigid, expensive structure at a time when we still know very little about how to optimize context in development workflows. Some recent papers highlight this uncertainty:
Opinionated structure. OpenClaw imposes a specific organizational model: agents with isolated workspaces, skills as installable packages, tools with allowlists and permission scoping, a Gateway daemon as the central runtime. This structure makes sense for OpenClaw's core use case (a personal AI assistant connected to messaging platforms), but it becomes friction when you try to use it as pure orchestration infrastructure. You can't easily bring your own agent definition format, your own project layout, or your own tool integration pattern. Everything must conform to OpenClaw's conventions in a time when there are no established conventions in the first place.

But it is the only viable option.

IMHO, this is the uncomfortable reality. As of early 2026, no other open-source project combines multi-agent support, multi-channel transport, a workflow engine (Lobster, rough as it is), and a plugin architecture with an active community.

Alternatives solve subsets of the problem:

Ralph Orchestrator seems good at agent orchestration, but its event/hat-based model is still fundamentally non-deterministic, as it relies on each agent to emit the right event at the right time. It also lacks a bridge between the execution layer and the communication layer: you run workflows from the command line, so there is no way to trigger them from an external channel.
Gas Town Hall is a hardcoded solution, too lyrical for deterministic orchestration, and it lacks multi-channel capability.
Agor is beautiful and one of my favorite agentic-development projects, but it is more a graphical interface for managing multiple agents than an orchestration engine. It lacks a deterministic workflow layer - the only way to coordinate agents is through LLM inference.

None of them offer the full stack. If you need the complete picture - agents, channels, orchestration, tools, memory - OpenClaw is the only game in town, pitfalls included.

Lack of standards + nightly revolutions

The AI coding agent space has no equivalent of HTTP, SQL, or even REST in terms of common protocols/patterns. Each provider ships its own agent protocol, its own tool calling format, its own configuration schema, and its own orchestration primitives. Claude Code uses CLAUDE.md and markdown-based project instructions. GitHub Copilot uses .github/copilot-instructions.md and AGENTS.md; Cursor uses .cursor/rules; Windsurf, Cline, Augment, and others each have their own conventions. There is no shared specification for how an agent should discover project context, what format its instructions should follow, or how it should report results.

Some open initiatives are trying to close this gap:

AGENTS.md is gaining traction as a de facto standard for project-level agent instructions - a single file that any compliant agent can read to understand project conventions, regardless of provider.
The Model Context Protocol (MCP) standardizes how agents connect to external tools and data sources through a server-client architecture, giving agents a portable way to access databases, APIs, and file systems without provider-specific integrations.
Agent Skills proposes a shared format for agent capabilities that can be installed and discovered across platforms.

But these are early-stage efforts, and adoption is fragmented. MCP has the most momentum, backed by Anthropic and adopted by multiple editors and platforms. AGENTS.md is simple enough to gain organic adoption but lacks a formal spec. Agent Skills is still finding its audience.

Meanwhile, the ground shifts constantly. A new model release, a new agent framework, a new orchestration pattern, sometimes multiple in the same week. Any architecture you build today must account for the fact that the ecosystem's conventions will look different in three months.

⚠️ Betting on a single provider's format is a guaranteed migration headache. Betting on emerging standards is a calculated risk, but at least the migration path is shared with the rest of the community.

Comparison of provider capabilities

I have been tracking the agent configuration formats and features (agents, instructions, skills, prompts, tools, etc.) of every major provider in a Notion workspace.

While this dataset is still incomplete and evolving rapidly, it already shows a rudimentary common alignment emerging across providers.

AI Agents Dataset

Old solutions that work

While the ecosystem chases new standards and frameworks, the most reliable tools for managing multi-agent, multi-project configurations are decades-old Unix and Git primitives. This is how I'm planning to survive the next few months of rapid change:

Git submodules for agent workforce. Your agent definitions (system prompts, skills, tool configurations, behavioral profiles) are just files. They belong in a repository. When multiple projects need the same agent team, a git submodule lets you share a single source of truth for agent configurations across all of them. Update the submodule, pull in every project, and every agent team is in sync. No package registry, no plugin marketplace, no sync daemon. Just Git.
Symlinks to deduplicate and universalize provider files. Different AI coding tools expect their configuration in different paths and formats: .cursor/rules, AGENTS.md, .github/copilot-instructions.md, .clinerules, and so on. The content is often largely the same - project conventions, coding standards, architectural guidelines - but each provider demands its own file. Symlinks let you maintain a single canonical source and point every provider-specific path to it. One file to edit, N providers served. Want to know where I found this solution? In the OpenClaw monorepo itself.
Dynamic file includes via references. Some agent instruction formats support file references or includes, loading content from other files at runtime. This enables composable instructions: a base set of project conventions shared across all agents, with role-specific overrides layered on top. Instead of duplicating instructions across agent configs, you reference a shared file and keep the delta minimal.
- Claude supports it with @file references.
- GitHub Copilot supports it with markdown links.
- OpenClaw configuration files can be structured to reference multiple files, so this could be useful for "importing" agent definitions from a shared location.
Monorepos for related projects and shared documentation. When your projects share agents, libraries, or infrastructure, a monorepo eliminates the coordination overhead of keeping multiple repositories in sync. Agent configurations, shared skills, project-specific overrides, and orchestration workflows all live in one tree. Cross-project references are just relative paths. Combined with the previous points, this creates a powerful synergy: a shared .ai submodule for the workforce, symlinks for provider configs, and project-level instructions all coexisting in a single monorepo structure.

The Monoswarm pattern

Combining the primitives from the previous section into a coherent structure yields what I call the Monoswarm pattern: a monorepo layout designed to host and manage a swarm of AI coding agents across multiple projects.

The core structure:

monoswarm/
├── .ai/
│   ├── common/                 # git submodule - shared AI definitions
│   └── ...                     # project-level - local AI overrides
├── .claude/
│   └── CLAUDE.md               → ../.ai/common/instructions/always-on.md (symlink)
├── .github/
│   ├── copilot-instructions.md → ../.ai/common/instructions/always-on.md (symlink)
│   └── custom.instruction.md   → ../.ai/instructions/project-specific.md (symlink)
├── packages/                   # project source code, split into separate repositories
├── docs/                       # project-level documentation
└── AGENTS.md                   → .ai/common/instructions/always-on.md (symlink)

The .ai/common directory is a git submodule: a standalone repository containing every agent definition, skill, tool, prompt, and other resource. It is the single source of truth for the workforce. Every project in the monorepo mounts it, and every developer (or CI runner) that clones the monorepo gets the same agent team. Updating agent behavior across all projects is a submodule bump.
The .ai/ directory also contains project-specific overrides: instructions or definitions that only apply to a subset of projects. This is where you put project-level and agent-specific context that needs to be injected into agents working on that project, without affecting the global workforce.
Symlinks bridge the gap between the shared .ai configs and each provider's expected file paths. Where each provider expects its own configuration file, you point it to the shared source or the project-specific override as needed. This way, you maintain a single canonical set of instructions and definitions, but every provider gets what it needs without duplication.
The docs/ directory holds cross-cutting documentation that humans and agents can reference: architecture decision records, API contracts, shared conventions. This is context that doesn't belong to any single project but is relevant to agents working across the monorepo.
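Creating the provider symlinks is exactly the kind of chore a setup tool would automate. Here is a minimal Node sketch, assuming the link names from the tree above; the LINKS map and the linkProviderConfigs function are hypothetical names, not part of any existing tool. It writes relative link targets so the links still resolve wherever the monorepo is cloned, and it refuses to overwrite a real file. On Windows, creating symlinks may require Developer Mode or elevated rights.

```typescript
import { lstatSync, mkdirSync, readlinkSync, symlinkSync, unlinkSync } from "node:fs";
import { dirname, join, relative } from "node:path";

// Hypothetical map: provider-expected path -> shared source, both relative to the monorepo root.
const LINKS: Record<string, string> = {
  "AGENTS.md": ".ai/common/instructions/always-on.md",
  ".claude/CLAUDE.md": ".ai/common/instructions/always-on.md",
  ".github/copilot-instructions.md": ".ai/common/instructions/always-on.md",
  ".github/custom.instruction.md": ".ai/instructions/project-specific.md",
};

function linkProviderConfigs(root: string, links: Record<string, string> = LINKS): string[] {
  const report: string[] = [];
  for (const [linkPath, target] of Object.entries(links)) {
    const absLink = join(root, linkPath);
    mkdirSync(dirname(absLink), { recursive: true });
    // Relative target, so the link still resolves after the repo is cloned elsewhere.
    const relTarget = relative(dirname(absLink), join(root, target));
    const existing = lstatSync(absLink, { throwIfNoEntry: false });
    if (existing && !existing.isSymbolicLink()) {
      throw new Error(`${linkPath} exists and is not a symlink; refusing to overwrite it`);
    }
    if (existing) unlinkSync(absLink); // re-point an old symlink
    symlinkSync(relTarget, absLink);
    report.push(`${linkPath} -> ${readlinkSync(absLink)}`);
  }
  return report;
}

// Usage, from the monorepo root:
// console.log(linkProviderConfigs(process.cwd()).join("\n"));
```

Running it twice is safe: existing symlinks are re-pointed, while a hand-written provider file stops the run instead of being silently replaced.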

💡 The next step? Building a CLI tool to automate the setup of this structure, manage submodule updates, and help build symlinks for existing provider configs.

Still a missing piece: an OpenClaw orchestration plugin

The Monoswarm pattern solves the configuration and file structure problem. OpenClaw provides the agent runtime, channels, and tool ecosystem. Lobster handles deterministic workflow execution. But there is still a gap between them: there is no orchestration layer that ties agent events to workflow transitions across projects.

Using the power of internal hooks. OpenClaw's plugin architecture exposes lifecycle hooks: TypeScript handlers that fire on events like message_sent, tool_result_persist, session_start, and others. These hooks are the natural extension point. An orchestration plugin can intercept structured events emitted by agents (e.g., [event:code_complete], [event:review_rejected]) and route them to the appropriate next step without the LLM making that decision. The agent writes code and emits a completion event; the plugin catches it and triggers the review agent. The review agent rejects; the plugin routes back to the programmer with the feedback. The LLM never touches the routing logic.
Building an event hub. The plugin acts as a lightweight event bus within the OpenClaw Gateway. Agents publish events, the hub matches them against registered workflow rules, and dispatches the corresponding actions (spawning sessions, sending messages to other agents, triggering Lobster pipelines), just like the hat-based orchestration framework proposed by Ralph Orchestrator. Event schemas are defined per project, so code_complete in project A can trigger a different workflow than in project B. The hub maintains a registry of active pipelines and their current state, enabling pause, resume, and inspection.
Integrating DAGs for complex workflows. Simple linear pipelines (code → review → test) are a starting point, but real development workflows branch. A review might pass on the first attempt or require multiple iterations. A test failure might route back to the programmer or escalate to a human. These are directed acyclic graphs, not sequences. The orchestration plugin needs to support conditional transitions, fan-out (parallel agents working on different aspects), fan-in (merging results before proceeding), and iteration caps. All defined declaratively, all executed deterministically.
Inserting human-in-the-loop gateways. Not every transition should be automatic. Deploying to staging, merging to main, approving a security-sensitive change - these require human judgment. The plugin should support approval gates at any point in the DAG, exposed through OpenClaw's channel system. This is Lobster's approval mechanism elevated to the orchestration level, operating across agents and projects rather than within a single pipeline.
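Because this plugin does not exist yet, the following is only a sketch of its core under stated assumptions: a per-project rule table maps an event to the next actions (several actions means fan-out), a per-pipeline counter enforces iteration caps, and hitting a cap or reaching a sensitive step yields an approval action instead of another agent call. EventHub, Rule, and the action shapes are hypothetical, not OpenClaw APIs; fan-in and the actual calls into OpenClaw (sessions, Lobster) are left out.

```typescript
// Hypothetical core of the proposed orchestration plugin; none of these names are OpenClaw APIs.
type Action =
  | { kind: "send"; agentId: string }    // message another agent
  | { kind: "approval"; reason: string } // pause until a human approves via a channel
  | { kind: "done" };

interface Rule {
  on: string;             // event emitted by an agent, e.g. "review_rejected"
  next: Action[];         // more than one action = fan-out
  maxVisits?: number;     // iteration cap for this transition, per pipeline run
  onCapExceeded?: Action; // what to do once the cap is hit (defaults to a human gate)
}

type ProjectRules = Record<string, Rule[]>; // project name -> its rules

class EventHub {
  private readonly rules: ProjectRules;
  private readonly visits = new Map<string, number>(); // "<pipelineId>:<event>" -> count

  constructor(rules: ProjectRules) {
    this.rules = rules;
  }

  // Deterministically map an event to the next actions; no LLM involved.
  dispatch(project: string, pipelineId: string, event: string): Action[] {
    const rule = (this.rules[project] ?? []).find((r) => r.on === event);
    if (!rule) return [];
    const key = `${pipelineId}:${event}`;
    const count = (this.visits.get(key) ?? 0) + 1;
    this.visits.set(key, count);
    if (rule.maxVisits !== undefined && count > rule.maxVisits) {
      return [rule.onCapExceeded ?? { kind: "approval", reason: `${event} exceeded ${rule.maxVisits} iterations` }];
    }
    return rule.next;
  }

  // Inspection: how often each event has fired in one pipeline run.
  inspect(pipelineId: string): Record<string, number> {
    const counts: Record<string, number> = {};
    this.visits.forEach((count, key) => {
      if (key.startsWith(`${pipelineId}:`)) counts[key.slice(pipelineId.length + 1)] = count;
    });
    return counts;
  }
}

// The same event can route differently per project.
const hub = new EventHub({
  "project-a": [
    { on: "code_complete", next: [{ kind: "send", agentId: "reviewer" }] },
    { on: "review_rejected", next: [{ kind: "send", agentId: "programmer" }], maxVisits: 3 },
    { on: "review_approved", next: [{ kind: "send", agentId: "tester" }] },
    { on: "tests_passed", next: [{ kind: "approval", reason: "merge to main" }] },
  ],
  "project-b": [
    { on: "code_complete", next: [{ kind: "send", agentId: "reviewer" }, { kind: "send", agentId: "tester" }] },
  ],
});
```

The design choice is that the rules are plain data: they can be loaded from a per-project file, diffed in review, and unit-tested without ever calling a model.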

💡 The next step? Building a prototype of this plugin, as I proposed in my last article.

Unaddressed theme: where is multi-instance?

This article covers multi-agents, multi-projects, multi-providers, and multi-channels. There is a fifth dimension that deserves mention but sits outside the scope of this discussion: multi-instance.

Most developers operate in at least two contexts: personal and work. Each context has its own repositories, its own API keys, and its own cost budget. These are fundamentally different trust boundaries with different data isolation needs.

OpenClaw already supports this through multiple Gateway instances running on the same host with isolated profiles. A personal Gateway and a work Gateway can coexist on the same machine - or run on separate hosts entirely - with zero shared state between them.

Although it is not the focus of this article, the proposed submodule and symlink patterns could also be used for extending/reusing agent definitions across instances. A personal Gateway could mount the same .ai/common submodule as the work Gateway, if the agent definitions are generic enough and not too sensitive to be shared.

How I Built a Deterministic Multi-Agent Dev Pipeline Inside OpenClaw (and Contributed a Missing Piece to Lobster)

Gustavo Gondim — Mon, 23 Feb 2026 01:40:51 +0000

TL;DR: I needed a code → review → test pipeline with autonomous AI agents, where the orchestration is deterministic (no LLM deciding the flow). After two months exploring Copilot agent sessions, building my own wrapper (Protoagent), evaluating Ralph Orchestrator, and diving deep into OpenClaw's internals, I found that Lobster (OpenClaw's workflow engine) was the right foundation — except it lacked loops. So I contributed sub-workflow steps with loop support to Lobster, enabling fully deterministic multi-agent pipelines where LLMs do creative work and YAML workflows handle the plumbing. GitHub Copilot coding agent wrote 100% of the implementation.

The Backstory: Two Months of Chasing Autonomous Dev Agents
The Problem
Attempt 1: Ralph Orchestrator
Attempt 2: OpenClaw Sub-Agents
Attempt 3: The Event Bus Architecture (Overengineered)
The Breakthrough: Reading the Docs More Carefully
Attempt 4: Skill-Driven Self-Orchestration
Attempt 5: Plugin Hooks as an Event Bus
The Solution: Lobster + Sub-Lobsters
The Architecture
What I Learned
Current Status
How This Was Built

The Backstory: Two Months of Chasing Autonomous Dev Agents

This didn't start last weekend. It started two months ago when GitHub shipped the Copilot coding agent — the ability to assign a GitHub issue to @copilot and have it work autonomously in a GitHub Actions environment, pushing commits to a draft PR. The Agent Sessions view in VS Code gave you a mission control for all your agents, local or cloud.

That planted the seed: if a cloud agent can work on one issue autonomously, what if you could chain multiple specialized agents into a pipeline? Programmer → reviewer → tester, all running in the background, all pushing to PRs.

Building Protoagent

The first thing I built was Protoagent: a multi-channel AI agent wrapper in TypeScript/Bun that bridges the Claude SDK and GitHub Copilot CLI to Telegram and a REST API. The idea was to control AI agents from my phone, using my own subscriptions, with no vendor lock-in. It supported multi-provider switching, voice messages via Whisper, session management, crash recovery, and a REST API for Siri/Apple Watch integration.

Protoagent solved the "talk to an agent from anywhere" problem, but not the orchestration problem. It was still one agent, one session, one task at a time. I needed the pipeline.

Discovering Ralph and OpenClaw

Around the same time, I found Ralph Orchestrator — an elegant pattern for autonomous agent loops with hard context resets. And then OpenClaw — which turned out to be a much more complete version of what I was trying to build with Protoagent: multi-channel, multi-agent, with a full tool ecosystem, skills marketplace, and a Gateway architecture.

OpenClaw made Protoagent redundant. But none of these tools solved the specific problem I was after.

The Problem

I wanted autonomous AI agents working as a dev team: a programmer, a reviewer, and a tester, running in parallel across multiple projects. The pipeline: code → review (max 3 iterations) → test → done. No human in the loop unless something breaks.

The requirements were clear:

Deterministic orchestration — a state machine controls flow, not an LLM deciding what to do next
Parallel execution — 4 projects × 3 roles = up to 12 concurrent agent sessions
Event-driven coordination — agents finish work and the next step triggers automatically
Full agent capabilities — each agent gets its own tools, memory, identity, and workspace

I spent a full day exploring options. This is the journey.

Attempt 1: Ralph Orchestrator

Ralph Orchestrator implements the "Ralph Wiggum technique" — an elegant pattern where you trade throughput for correctness by doing hard context resets between iterations. The agent has no memory except a session file (goal, plan, status, log), and each iteration starts fresh with only that file as context.

Ralph is solid, and it does support multiple parallel loops with Telegram routing (reply-to, @loop-id prefix). But for my use case it fell short:

Event detection is opaque. Ralph expects agents to emit events (like human.interact for blocking questions), but it's unclear how to define custom events — say, code_complete or review_rejected — that would trigger transitions between different loops. The orchestration between agents (programmer finishes → reviewer starts) would require inventing the event emission and routing mechanism myself.
Limited channel connectivity. Ralph has basic Telegram integration for human-in-the-loop, but it's not a multi-platform messaging gateway. I needed agents reachable from Telegram, WhatsApp, Discord, and potentially webhooks from CI systems.
No tool ecosystem. Each agent in my pipeline needs different tools — the programmer needs code execution and write access, the reviewer needs read-only access, the tester needs test runners. Ralph doesn't have a plugin/skill/MCP management layer; you'd hardcode tool access per loop.
Agents aren't fully customizable. No isolated workspaces, no per-agent identity or personality, no per-agent model selection (e.g., Opus for the programmer, Sonnet for the reviewer to save costs).

Ralph solved the "how to make one agent iterate reliably with hard context resets" problem beautifully. The session file pattern (goal, plan, status, log) is elegant. But I needed inter-agent coordination with event-driven transitions, not better intra-agent loops.

Attempt 2: OpenClaw Sub-Agents

OpenClaw is the open-source AI agent platform (150K+ GitHub stars) that connects to messaging platforms and runs locally with full tool access. It already had multi-agent support, so the obvious question was: can I use OpenClaw's built-in sessions_spawn to create my pipeline?

Short answer: no. Here's why.

sessions_spawn creates child agents within a parent session. The parent is an LLM that decides when to spawn children. This means:

Non-deterministic flow control. The LLM decides when the reviewer runs, when to retry, when to give up. That's exactly what I wanted to avoid.
Auto-generated session IDs. Sub-agent sessions get keys like agent:<agentId>:subagent:<uuid>. I can't address them by project name.
Spawn depth limits. maxSpawnDepth defaults to 1, max 2. An orchestrator pattern needs depth 2, and sub-agents at depth 2 can't spawn further children.
Concurrency ceiling. maxConcurrent: 8 globally. With 4 projects × 3 roles, I'd hit the limit immediately.

The sub-agent model is designed for "main agent delegates subtask to helper" scenarios, not for peer-to-peer agent coordination with deterministic state machines.

Attempt 3: The Event Bus Architecture (Overengineered)

At this point I started sketching a custom architecture:

[Telegram] → [OpenClaw Gateway] ← WebSocket ← [External Orchestrator]
                    │                                    │
              [Agent Workspaces]                   State Machine
              - programmer/                        Redis Streams
              - reviewer/                          Worker Pool
              - tester/

The idea: use OpenClaw purely as I/O (messaging + agent execution), and build an external event bus with Redis Streams or NATS for routing, a state machine engine per project, and a worker spawner with pool control.

It would work. It would also be a massive amount of infrastructure for what should be a simple pipeline. I was reinventing half of what OpenClaw already does.

The Breakthrough: Reading the Docs More Carefully

Three OpenClaw features changed everything when I actually found them:

1. `agentToAgent` — Native Peer Messaging

Buried in the multi-agent docs:

{
  "tools": {
    "agentToAgent": {
      "enabled": true,
      "allow": ["programmer", "reviewer", "tester"]
    }
  }
}

When enabled, agents can send messages directly to other agents. Not sub-agents, not spawned children — peer agents with their own workspaces and identities.

2. `sessions_send` — Addressable Sessions

sessions_send(sessionKey, message, timeoutSeconds?)

An agent can send a message to any session key. Fire-and-forget with timeoutSeconds: 0, or synchronous (wait for the response). Combined with OpenClaw's session key convention (agent:<agentId>:<key>), this means:

agent:programmer:project-a
agent:reviewer:project-a
agent:tester:project-b

The session key is the address. Agent + project as coordinates.
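Because the session key is just a string, building and parsing addresses is plain string handling. A small sketch of helpers for the agent:<agentId>:<key> convention; the function names and the rule that an agentId may not contain a colon are assumptions made here, not OpenClaw rules. Everything after the second colon is treated as the key, so keys like subagent:<uuid> survive a round trip.

```typescript
// Hypothetical helpers for the agent:<agentId>:<key> session key convention.
interface SessionAddress {
  agentId: string; // e.g. "reviewer"
  key: string;     // e.g. "project-a" (may itself contain colons)
}

function toSessionKey({ agentId, key }: SessionAddress): string {
  if (agentId.includes(":")) throw new Error(`agentId must not contain ":" (got ${agentId})`);
  return `agent:${agentId}:${key}`;
}

function parseSessionKey(sessionKey: string): SessionAddress | null {
  // First segment after "agent:" is the agent; the rest is the key.
  const match = /^agent:([^:]+):(.+)$/.exec(sessionKey);
  return match ? { agentId: match[1], key: match[2] } : null;
}

// Usage: one session per (agent, project) pair.
// toSessionKey({ agentId: "tester", key: "project-b" }) -> "agent:tester:project-b"
```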

3. Webhooks with Session Routing

curl -X POST http://127.0.0.1:18789/hooks/agent \
  -H 'Authorization: Bearer SECRET' \
  -H 'Content-Type: application/json' \
  -d '{
    "message": "Implement JWT auth",
    "agentId": "programmer",
    "sessionKey": "hook:project-a:programmer",
    "deliver": false
  }'

External triggers that route to specific agents and sessions. The deliver: false flag keeps everything internal — no Telegram notification until you explicitly want one.
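The same trigger can be issued from TypeScript, for example from a CI job. A minimal sketch, assuming Node 18+ (global fetch) and that the endpoint accepts the JSON body shown in the curl call; buildAgentHookRequest, triggerAgent, and the OPENCLAW_HOOK_TOKEN environment variable are hypothetical names. Request building is split out so it can be checked without a running gateway.

```typescript
// Sketch: trigger an agent through the /hooks/agent webhook (Node 18+, global fetch).
// Field names mirror the curl example; OPENCLAW_HOOK_TOKEN is a hypothetical env var.
interface AgentHook {
  message: string;
  agentId: string;
  sessionKey: string;
  deliver?: boolean;
}

function buildAgentHookRequest(hook: AgentHook, token: string, baseUrl = "http://127.0.0.1:18789") {
  return {
    url: `${baseUrl}/hooks/agent`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      // Keep delivery off unless the caller explicitly asks for a channel notification.
      body: JSON.stringify({ ...hook, deliver: hook.deliver ?? false }),
    },
  };
}

async function triggerAgent(hook: AgentHook, token = process.env.OPENCLAW_HOOK_TOKEN ?? ""): Promise<void> {
  const { url, init } = buildAgentHookRequest(hook, token);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`webhook returned ${res.status}: ${await res.text()}`);
}

// Usage (requires a running gateway):
// await triggerAgent({ message: "Implement JWT auth", agentId: "programmer", sessionKey: "hook:project-a:programmer" });
```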

Attempt 4: Skill-Driven Self-Orchestration

With these primitives, I could have each agent carry a "pipeline skill" that tells it to use sessions_send to pass the baton:

# Pipeline Skill
When you finish coding, call sessions_send to notify the reviewer.
When you finish reviewing, call sessions_send to notify the tester or programmer.
Read the session history to know which iteration you're on.

This works, but the state machine lives inside the LLM's head. It's reading the skill, interpreting rules, and deciding what to do. If the LLM misinterprets the iteration count or forgets to call sessions_send, the pipeline breaks silently.

I wanted deterministic orchestration. The LLM does creative work (writing code, reviewing code, running tests). A machine does the routing.

Attempt 5: Plugin Hooks as an Event Bus

OpenClaw supports custom hooks — TypeScript handlers that fire on events like message_sent, tool_result_persist, etc. My idea:

Each agent emits a structured event at the end of its response: [event:code_complete] {"project": "project-a"}
A plugin hook intercepts the output, parses the event
The hook looks up a subscriptions.json to find the next agent
It calls POST /hooks/agent to trigger the next step

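import { readFileSync } from "node:fs";

// HookHandler comes from OpenClaw's hook typings (import omitted here).
// subscriptions.json maps an event name to the agents that should run next, e.g.
// { "code_complete": [{ "agentId": "reviewer", "role": "reviewer" }] }
const subscriptions: Record<string, { agentId: string; role: string }[]> =
  JSON.parse(readFileSync("subscriptions.json", "utf8"));

const handler: HookHandler = async (event) => {
  // Look for "[event:<type>] {json}" in the agent's last message
  const match = event.context.lastMessage.match(/\[event:(\w+)\]\s*(\{.*\})/s);
  if (!match) return;

  const [, eventType, payload] = match;
  const data = JSON.parse(payload); // e.g. { "project": "project-a" }
  const targets = subscriptions[eventType] ?? [];

  for (const target of targets) {
    await fetch("http://127.0.0.1:18789/hooks/agent", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENCLAW_HOOK_TOKEN}`, // hypothetical env var
      },
      body: JSON.stringify({
        message: data.message ?? `${eventType} in ${data.project}`,
        agentId: target.agentId,
        sessionKey: `hook:${data.project}:${target.role}`,
        deliver: false
      })
    });
  }
};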

This was closer — deterministic routing, testable without LLMs, extensible via JSON config. But it required writing a custom plugin, maintaining subscription mappings, and handling iteration counting in the hook.

Then I found the real solution.

The Solution: Lobster + Sub-Lobsters

What is Lobster?

Lobster is OpenClaw's built-in workflow engine. It's a typed, local-first pipeline runtime with:

Deterministic execution — steps run sequentially, data flows as JSON between them
Approval gates — side effects pause until explicitly approved
Resume tokens — paused workflows can be continued later without re-running
One call instead of many — OpenClaw runs a single Lobster tool call and gets a structured result

The analogy: Lobster is to OpenClaw what GitHub Actions is to GitHub — a declarative pipeline spec that runs within the platform.

A Lobster workflow file looks like this:

name: email-triage
steps:
  - id: collect
    command: inbox list --json
  - id: categorize
    command: inbox categorize --json
    stdin: $collect.stdout
  - id: apply
    command: inbox apply --json
    stdin: $categorize.stdout
    approval: required

Lobster can call any OpenClaw tool via openclaw.invoke, including agent-send (to message other agents) and llm-task (for structured LLM calls with JSON schema validation).
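To make the properties listed above concrete, here is a toy TypeScript model: steps run in order, each step receives the previous step's output (standing in for the stdin: references), and a step marked approval: "required" pauses the run and returns a resume token, so continuing does not re-run earlier steps. This is an illustration under those assumptions, not Lobster's implementation; the token format and the in-memory store are made up.

```typescript
// Toy model of the Lobster properties listed above; not Lobster's implementation or file format.
interface Step {
  id: string;
  run: (stdin: string) => string; // stands in for `command`, fed by the previous step's output
  approval?: "required";
}

type RunResult =
  | { status: "ok"; outputs: Record<string, string> }
  | { status: "needs_approval"; resumeToken: string };

// Paused runs keyed by resume token (in memory here; a real engine would persist them).
const paused = new Map<string, { index: number; outputs: Record<string, string> }>();
let tokenCounter = 0;

function runPipeline(steps: Step[], resumeToken?: string): RunResult {
  let index = 0;
  let outputs: Record<string, string> = {};
  let approved = false;
  if (resumeToken !== undefined) {
    const saved = paused.get(resumeToken);
    if (!saved) throw new Error(`unknown resume token: ${resumeToken}`);
    paused.delete(resumeToken); // tokens are single-use
    index = saved.index;
    outputs = saved.outputs;
    approved = true; // resuming means a human approved the gated step
  }
  for (; index < steps.length; index++) {
    const step = steps[index];
    if (step.approval === "required" && !approved) {
      const token = `resume-${++tokenCounter}`;
      paused.set(token, { index, outputs });
      return { status: "needs_approval", resumeToken: token };
    }
    approved = false; // an approval only covers the step it was given for
    const stdin = index > 0 ? outputs[steps[index - 1].id] : "";
    outputs[step.id] = step.run(stdin);
  }
  return { status: "ok", outputs };
}

// Usage: a stand-in for the email-triage workflow.
let collectRuns = 0;
const triage: Step[] = [
  { id: "collect", run: () => { collectRuns++; return '["a","b"]'; } },
  { id: "categorize", run: (s) => JSON.stringify(JSON.parse(s).map((x: string) => x.toUpperCase())) },
  { id: "apply", run: (s) => `applied ${s}`, approval: "required" },
];
```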

The Missing Piece: Loops

My pipeline needs to loop the code→review cycle up to 3 times. Lobster's step model was linear — no native loop construct.

So I built it.

Sub-Lobsters: Nested Workflows with Loops

I opened PR #20 on the Lobster repo, introducing sub-lobster steps — the ability to embed a .lobster file as a step, with optional loop support.

New fields on WorkflowStep:

Field	Description
`lobster`	Path to a `.lobster` file to run as a sub-workflow
`args`	Key/value map passed to the sub-workflow
`loop.maxIterations`	Maximum number of iterations
`loop.condition`	Shell command evaluated after each iteration. Exit 0 = continue, non-zero = stop

The loop condition receives LOBSTER_LOOP_STDOUT, LOBSTER_LOOP_JSON, and LOBSTER_LOOP_ITERATION as environment variables, so you can inspect the sub-workflow's output to decide whether to continue.
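The loop semantics described above fit in a few lines. This sketch is a hypothetical reimplementation for illustration, not the code from the PR: it assumes iterations are counted from 1 and that the condition runs through sh -c with the three variables in its environment. The usage example uses grep instead of jq only so it has no extra dependency; the jq condition from the pipeline below follows the same exit-code contract.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical re-implementation of the loop semantics described above (not the PR's code).
interface IterationResult {
  stdout: string; // what the sub-workflow printed
  json: unknown;  // its parsed JSON output
}

function runLoop(
  runSubWorkflow: (iteration: number) => IterationResult,
  maxIterations: number,
  condition: string,
): { iterations: number; last?: IterationResult } {
  let last: IterationResult | undefined;
  let iteration = 0;
  while (iteration < maxIterations) {
    iteration += 1; // assumption: iterations are counted from 1
    last = runSubWorkflow(iteration);
    // Evaluate the condition in a shell, exposing this iteration's output via env vars.
    const check = spawnSync("sh", ["-c", condition], {
      env: {
        ...process.env,
        LOBSTER_LOOP_STDOUT: last.stdout,
        LOBSTER_LOOP_JSON: JSON.stringify(last.json),
        LOBSTER_LOOP_ITERATION: String(iteration),
      },
    });
    if (check.status !== 0) break; // non-zero exit (or a signal) stops the loop
  }
  return { iterations: iteration, last };
}

// Usage: the "reviewer" approves on the second iteration, so the loop stops there.
const result = runLoop(
  (i) => ({ stdout: `iteration ${i}`, json: { approved: i >= 2 } }),
  3,
  `! echo "$LOBSTER_LOOP_JSON" | grep -q '"approved":true'`,
);
console.log(result.iterations); // prints 2
```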

The Final Pipeline

Main workflow (dev-pipeline.lobster):

name: dev-pipeline
args:
  project: { default: "project-a" }
  task: { default: "implement feature" }
  chat_id: {}

steps:
  - id: code-review-loop
    lobster: ./code-review.lobster
    args:
      project: ${project}
      task: ${task}
    loop:
      maxIterations: 3
      condition: '! echo "$LOBSTER_LOOP_JSON" | jq -e ".approved" > /dev/null'

  - id: test
    command: >
      openclaw.invoke --tool agent-send --args-json '{
        "agentId": "tester",
        "message": "Test the approved code: $code-review-loop.stdout",
        "sessionKey": "pipeline:${project}:tester"
      }'
    condition: $code-review-loop.json.approved == true

  - id: notify
    command: >
      openclaw.invoke --tool message --action send --args-json '{
        "provider": "telegram",
        "to": "${chat_id}",
        "text": "✅ ${project}: pipeline complete"
      }'
    condition: $test.exitCode == 0

Sub-workflow (code-review.lobster):

name: code-review
args:
  project: {}
  task: {}

steps:
  - id: code
    command: >
      openclaw.invoke --tool agent-send --args-json '{
        "agentId": "programmer",
        "message": "${task}. Iteration $LOBSTER_LOOP_ITERATION.",
        "sessionKey": "pipeline:${project}:programmer"
      }'

  - id: review
    command: >
      openclaw.invoke --tool agent-send --args-json '{
        "agentId": "reviewer",
        "message": "Review this: $code.stdout",
        "sessionKey": "pipeline:${project}:reviewer"
      }'
    stdin: $code.stdout

  - id: parse
    command: >
      openclaw.invoke --tool llm-task --action json --args-json '{
        "prompt": "Did the review approve? Return approved (bool) and feedback (string).",
        "input": $review.json,
        "schema": {
          "type": "object",
          "properties": {
            "approved": {"type": "boolean"},
            "feedback": {"type": "string"}
          },
          "required": ["approved", "feedback"]
        }
      }'
    stdin: $review.stdout

Here's what happens when someone sends "project-a: implement JWT" on Telegram:

1. Lobster runs code-review.lobster as a sub-workflow
2. The programmer agent writes code (full OpenClaw agent with tools, memory, identity)
3. The reviewer agent reviews it (different agent, different workspace, potentially different model)
4. llm-task parses the review into structured JSON: {approved: false, feedback: "..."}
5. The loop condition checks $LOBSTER_LOOP_JSON.approved; if false and iteration < 3, go to step 2
6. When approved (or max iterations reached), control returns to the parent workflow
7. The tester agent runs tests
8. A Telegram notification is sent

All deterministic. All inside OpenClaw. Zero external infrastructure.

The Architecture

Telegram
    │
    ▼
OpenClaw Gateway (:18789)
    │
    ├── Agents (isolated workspaces, tools, identity, models)
    │   ├── programmer/
    │   ├── reviewer/
    │   └── tester/
    │
    ├── Lobster (workflow engine)
    │   ├── dev-pipeline.lobster    (main: loop → test → notify)
    │   └── code-review.lobster     (sub: code → review → parse)
    │
    ├── llm-task plugin (structured JSON from LLM, schema-validated)
    │
    └── Webhooks (/hooks/agent)
        └── Trigger pipelines per project with isolated session keys

Each agent is a full OpenClaw agent:

Own workspace with AGENTS.md, SOUL.md
Own tools (programmer gets exec, write; reviewer gets read only; tester gets exec + test runners)
Own model (Opus for programmer, Sonnet for reviewer to save cost)
Own memory and session history

The LLMs do what LLMs are good at: writing code, analyzing code, running tests. Lobster does what code is good at: sequencing, counting, routing, retrying.

What I Learned

1. Don't orchestrate with LLMs. Every time I tried to put flow control in a prompt ("when you're done, send to the reviewer"), I introduced a failure mode. LLMs are unreliable routers. Use them for creative work, use code for plumbing.

2. Read the docs twice. I almost built an entire external event bus before discovering that OpenClaw already had agentToAgent, sessions_send, and webhooks with session routing. The primitives were there — I just hadn't found them yet.

3. Contribute the missing piece instead of working around it. Lobster didn't have loops. Instead of building a wrapper script or a plugin hook to simulate loops, I added loop support to Lobster itself. The sub-lobster PR is 129 lines of implementation + 186 lines of tests. It took less time than any of the workarounds would have.

4. Session keys are your data model. The pattern pipeline:<project>:<role> gives you project isolation, role separation, and addressability in one string. No database needed — the session key is the address.

5. Typed pipelines beat prompt engineering for coordination. A YAML file with condition, loop, and stdin piping is infinitely more reliable than telling an LLM "if the review is negative, go back to step 2, but only up to 3 times."

Current Status

PR #20 is open on the Lobster repo — sub-workflow steps with optional loop support
The architecture works end-to-end with OpenClaw's existing multi-agent, webhooks, and Lobster tooling
Next step: production testing with real projects

If you're building multi-agent systems, consider whether your orchestration layer needs to be an LLM at all. Sometimes the best agent architecture is one where the agents don't know they're being orchestrated.

How This Was Built

This article describes work that spanned about two months and involved several different tools and approaches.

Claude helped me think through the architecture options — bouncing ideas, evaluating trade-offs between approaches, and structuring the decision tree. It was a thinking partner for the design phase.

The exploration of OpenClaw's internals was largely manual. Claude wasn't able to fully parse OpenClaw's documentation and source code to surface the key primitives I needed (agentToAgent, sessions_send, Lobster workflows, plugin hooks). I found those by reading the docs myself, tracing through the codebase, and connecting dots that weren't obvious from search results alone. If you're building on a fast-moving open-source project, there's no substitute for reading the source.

GitHub Copilot coding agent wrote 100% of the Lobster fork code. I assigned the task, described what I wanted (sub-workflow steps with loop support), and Copilot worked autonomously in its cloud environment. My only involvement was code review on the PR. The irony isn't lost on me: an autonomous coding agent built the loop primitive that enables autonomous coding agent pipelines.

AI Development Maturity Model

Gustavo Gondim — Fri, 07 Nov 2025 00:08:38 +0000

As AI-assisted development matures, developers evolve from manual coding to strategic orchestration.

The AI Development Maturity Model (AIDMM) defines five levels of evolution, from purely human to fully autonomous AI-driven codebases.

Why It Matters

Benchmark your AI adoption across projects.
Prioritize investment in automation and oversight.
Define new metrics like AI contribution ratio and review autonomy.
Foster trust through auditable maturity levels.

The Five Levels

Level	Name	Typical Use Case
⚙️ 0	Human-Only Development	Legacy systems, compliance-heavy code
💬 1	AI-Inspired Development	Brainstorming, study, proofs of concept
🤝 2	AI-Collaborative Development	Real-life projects, personal or enterprise
🤖 3	AI-Delegated Development	PR bots, repo agents, async automation
⚡️ 4	Fully Autonomous AI Development	Projects cloned from templates, application replicas, CRUD-related work

Level 0 — Human-Only Development

No AI involvement. Every commit, test, and refactor is done manually.

Traits

100% human-written code.
No chatbots, completions, or AI suggestions.
Legacy or controlled environments.

Analogy: Coding on a typewriter: precise, deliberate, but limited in scale.

Level 1 — AI-Inspired Development

Developers use AI conversationally, as an idea partner, not a code editor.

Traits

Human writes all code.
AI influences thinking and structure.
Prompts replace StackOverflow searches.

Examples

ChatGPT, Gemini or Claude for brainstorming, planning, debugging, or refactoring logic.
GitHub Copilot code completions for snippets and syntax hints.

Analogy: A silent mentor who helps you think, not type.

Level 2 — AI-Collaborative Development

The AI works inside the IDE, actively contributing to the code being written.

Traits

Shared authorship between human and AI.
Developer still curates and accepts all changes.
Focus on flow and rapid iteration.

Examples

GitHub Copilot inside Visual Studio Code, suggesting multi-line logic in real time.
GitHub Copilot (agent mode), OpenAI Codex, or Cursor executing local edits, filling in functions, completing tests, and running commands in the terminal.

Analogy: Pair-programming with a machine that anticipates your next thought.

Level 3 — AI-Delegated Development

The developer delegates entire coding tasks to autonomous agents. AI operates as a background contributor: commits code, opens PRs, and self-tests.

Traits

Human reviews and merges.
AI acts as a proactive teammate.
True “agent mode” where AI works asynchronously.

Examples

GitHub Copilot coding agent
Devin

Analogy: A junior developer you supervise — except it works 24/7 in the cloud.

Level 4 — Fully Autonomous AI Development

AI independently builds, tests, and deploys software aligned with strategic goals. Human input is reduced to high-level constraints and evaluation metrics.

Traits

100% AI-written and maintained code.
Continuous feedback loops.
Humans oversee outcomes, not syntax.

The Future of Development

AI won’t just assist; it will participate, delegate, and eventually own the development loop.

Developers will evolve from coders → curators → orchestrators of autonomous systems.

The true artistry of future development lies not in typing code, but in teaching systems how to build and reason.

Use it in your GitHub repo!

You may use AIDMM badges in your GitHub repository to declare your AIDMM level and encourage developers to use it.

[![AIDMM0](https://img.shields.io/badge/AIDMM-0🧑‍💻-lightgrey?style=for-the-badge)](https://dev.to/ggondim/ai-development-maturity-model-4i47)
[![AIDMM1](https://img.shields.io/badge/AIDMM-1💬-blue?style=for-the-badge)](https://dev.to/ggondim/ai-development-maturity-model-4i47)
[![AIDMM2](https://img.shields.io/badge/AIDMM-2🤝-brightgreen?style=for-the-badge)](https://dev.to/ggondim/ai-development-maturity-model-4i47)
[![AIDMM3](https://img.shields.io/badge/AIDMM-3🤖-orange?style=for-the-badge)](https://dev.to/ggondim/ai-development-maturity-model-4i47)
[![AIDMM4](https://img.shields.io/badge/AIDMM-4🧠-purple?style=for-the-badge)](https://dev.to/ggondim/ai-development-maturity-model-4i47)

(Low)Code Maturity Model

Gustavo Gondim — Fri, 08 Aug 2025 22:22:33 +0000

TL;DR

The (Low)Code Maturity Model (LCMM) is a framework to classify how a technology team balances governance, flexibility, and delivery speed across three levels:

Codeful – Full control over code, strict governance, and strong consistency.
Low-deploy – A hybrid of code and low-code tools, enabling faster delivery while retaining some flexibility.
Low-code – Tool-driven development with minimal coding, ideal for quick wins and non-developer contributions.

Why it matters: Choosing the right level for each initiative helps avoid the extremes of slow, over-governed delivery or uncontrolled, unmaintainable quick hacks.

When to use:

Codeful → For core systems, high-risk domains, long-term maintainability, improving reusability, and refactoring mature features.
Low-deploy → For experimental features, internal tools, or agentic workflows.
Low-code → For MVPs, integrations, internal tools, and empowering non-developers to deliver value quickly.

Why a Maturity Model for “(Low)Code”?

Technology teams constantly face the trade-off between speed, control, and extensibility. Without a clear framework, teams may:

Over-engineer simple solutions, delaying value delivery.
Push quick, tool-based solutions into production without proper governance.
Fail to align the choice of tooling with the risk profile and lifespan of the project.

A maturity model provides a shared language to discuss and align on these trade-offs, ensuring the right approach is used for each specific initiative.

The Spectrum at a Glance

The (Low)Code Maturity Model defines three distinct levels:

Level	Governance	Speed	Flexibility	Required experience
Codeful	🟢 High	🔴 Low	🟢 High	🔴 High
Low-deploy	🟡 Medium	🟢 High	🟢 High	🔴 High
Low-code	🔴 Low	🔵 Very High	🔴 Low	🟢 Low

Key observation: This is not a ladder you must climb — teams can and should operate at multiple levels simultaneously, choosing the right approach per initiative rather than standardizing on one level for everything.

Level 1 — Codeful

Codeful represents the traditional, full-code development approach. All logic, infrastructure, and deployment pipelines are defined and maintained in code, with strong engineering discipline.

📋 Core Practices

Maintain type consistency across services and functions.
Share utility code across projects to avoid duplication.
Enforce governance for publication: branch strategies, pull requests, CI/CD checks.

⭐ Strengths

Maximum flexibility to implement any logic or integration.
Strong maintainability when standards are enforced.
Easier compliance with security and regulatory requirements.

⚠️ Risks

Slower time-to-market compared to low-code approaches.
Higher barrier to entry for non-developers.
Risk of over-engineering simple use cases.

🎯 Use cases

Focus on product maturity: For features that are tested enough and have a mature history of user adoption and feedback.
Focus on schema preservation: Core-business domains where correctness and compliance are critical.
Focus on reusability: Components that will be reused across multiple products.

Level 2 — Low-deploy

Low-deploy is a middle ground between full-code and pure low-code. It leverages platforms that offer visual building blocks but still allow embedding and running custom code, often with access to external libraries (e.g., npm).

📋 Core Practices

Use tools such as Windmill or Plasmic.
Combine visual editing with code injection for flexibility.
Allow developers to bypass some governance for rapid iteration.
Integrate with existing services via API, SDKs or external libraries.

⭐ Strengths

Faster delivery than Codeful, without fully sacrificing flexibility.
Enables smaller teams to ship more features quickly.
Lower upfront cost for prototypes compared to full-code builds.

⚠️ Risks

Partial governance bypass can introduce quality and security risks.
Risk of creating unreviewed production logic.
Platform lock-in if proprietary features are overused.

🎯 Use cases

Focus on operational optimization: Internal tools where speed is more critical than governance.
Focus on AI evaluation: Agentic workflows and AI-related features, where you sometimes need to work with AI models through code.
Focus on public reusability: Situations where developers need a productivity boost without being restricted by a low-code tool.

Level 3 — Low-code

Low-code is a fully tool-driven approach where most of the application is built through visual interfaces and prebuilt components, with minimal to no direct coding.

📋 Core Practices

Use tools such as n8n, Retool, or Bubble.
Empower anyone familiar with the tool to deliver value.
Limited ability to add custom code or external libraries.
Extend via community components, templates, or connectors.

⭐ Strengths

Extremely fast delivery for prototypes and integrations.
Empowers non-developers (“citizen developers”) to contribute.
Minimal initial setup — no need for complex infrastructure.

⚠️ Risks

Vendor lock-in and dependency on platform availability.
Harder to enforce coding standards or architectural consistency.
Limited extensibility for complex or highly custom logic.

🎯 Use cases

Focus on early feedback: Customer-facing features with a short feedback loop, where speed matters more than long-term maintainability.
Focus on automation: Integrations between SaaS tools.

Decision Framework

Selecting the right maturity level can be guided by four practical dimensions:

Speed
- Question: Is there uncertainty about the feature’s business value (never tested before)? Is there an opportunity cost if delivery is delayed?
- Interpretation: The higher the need for speed, the closer to Low-code.
Need
- Question: Is the functionality primarily an integration between known systems/products, or is it proprietary code unique to the business?
- Interpretation: The more proprietary the functionality, the closer to Codeful.
Team Experience / Knowledge
- Question: Are the people delivering it senior engineers or low-tech-skill contributors?
- Interpretation: Lower skill levels push towards Low-code.
Flexibility
- Question: Does the code need to reuse internal libraries, external libraries, or neither?
- Interpretation: Internal libraries → Codeful; external libraries → Low-deploy; neither → Low-code.

Scorecard template

You can use the following scorecard to evaluate your initiatives.

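Assign a score from 1 to 3 for each dimension based on the questions above, where 1 is the left end of the range in parentheses (e.g., low need for speed) and 3 is the right end (e.g., high need for speed).
Sum the scores to get a total.
The total indicates the most appropriate maturity level for the initiative.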

Dimension	1	2	3
Speed (Low→High)	1
Need (Core→Automation)		2
Team Experience (High→Low)		2
Flexibility (Internal→None)			3
Scores (sum)	1	4	3

Total score: 8 (1 + 2 + 2 + 3)

The total score can be interpreted as follows:

4-6: Lean towards Codeful.
7-9: Lean towards Low-deploy.
10-12: Lean towards Low-code.
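The scoring rules are simple enough to encode directly, which helps when the same scorecard is applied across many initiatives. A minimal sketch in TypeScript, assuming each dimension is scored 1 to 3 as described above; the Scorecard type and the totalScore and recommendLevel functions are hypothetical names, not part of any published tool:

```typescript
// Hypothetical scorecard model: 1 = left end of each range, 3 = right end.
type Score = 1 | 2 | 3;
type MaturityLevel = "Codeful" | "Low-deploy" | "Low-code";

interface Scorecard {
  speed: Score;          // Low -> High need for speed
  need: Score;           // Core -> Automation
  teamExperience: Score; // High -> Low
  flexibility: Score;    // Internal -> None
}

// Sums the four dimension scores (possible range: 4 to 12).
function totalScore(card: Scorecard): number {
  return card.speed + card.need + card.teamExperience + card.flexibility;
}

// Maps the total to the ranges: 4-6 Codeful, 7-9 Low-deploy, 10-12 Low-code.
function recommendLevel(card: Scorecard): MaturityLevel {
  const total = totalScore(card);
  if (total <= 6) return "Codeful";
  if (total <= 9) return "Low-deploy";
  return "Low-code";
}

// The example scorecard from the table above.
const example: Scorecard = { speed: 1, need: 2, teamExperience: 2, flexibility: 3 };
console.log(totalScore(example), recommendLevel(example)); // 8 Low-deploy
```

Keeping the thresholds in a single function makes them easy to recalibrate if a team decides the ranges should be drawn differently.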

Migration Paths & Hybrids

Moving Between Levels

Low-code → Low-deploy: When quick MVPs need more flexibility via external libraries.
Low-deploy → Codeful: When compliance requirements, usability needs, or complexity increase.
Codeful → Low-deploy/Low-code: For peripheral or experimental features needing faster delivery.

Hybrid Architectures

Core system in Codeful; AI in Low-deploy; automation in Low-code.
Example: An e-commerce platform with a Codeful backend, Low-deploy CMS, and Low-code marketing automations.

Anti-patterns

Shadow IT: Untracked Low-code automations in production.
Glue code sprawl: Over-reliance on Low-deploy without migration to Codeful.
Permanent MVPs: Low-code solutions running as core systems for years.

Recommendation

"Get blood from a stone": stick with the original maturity level until you really need to change. Try to work around obstacles before moving completely to a different maturity level.
Maintain an inventory of solutions and their maturity level for governance oversight.
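An inventory does not need special tooling to start, and it pairs naturally with periodic reassessment. A minimal sketch, assuming a plain list of records and a fixed review interval; the InventoryEntry fields, the dueForReassessment helper, the sample entries, and the 90-day interval are illustrative assumptions, not part of the model:

```typescript
type MaturityLevel = "Codeful" | "Low-deploy" | "Low-code";

// Hypothetical inventory record for one solution.
interface InventoryEntry {
  name: string;
  level: MaturityLevel;
  owner: string;
  lastAssessed: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the entries whose last maturity assessment is older than maxAgeDays.
function dueForReassessment(entries: InventoryEntry[], now: Date, maxAgeDays: number): InventoryEntry[] {
  return entries.filter((entry) => now.getTime() - entry.lastAssessed.getTime() > maxAgeDays * DAY_MS);
}

const inventory: InventoryEntry[] = [
  { name: "checkout-api", level: "Codeful", owner: "payments", lastAssessed: new Date("2026-04-01") },
  { name: "lead-sync", level: "Low-code", owner: "marketing", lastAssessed: new Date("2025-12-15") },
];

const stale = dueForReassessment(inventory, new Date("2026-04-28"), 90);
console.log(stale.map((entry) => `${entry.name} (${entry.level})`).join(", ")); // lead-sync (Low-code)
```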

Challenges & Counterpoints

When Not to Use Low-code

Mission-critical systems where platform downtime is unacceptable.
Highly regulated environments with strict audit/compliance requirements.
Scenarios needing complex, highly optimized algorithms.

Sustainability of Community Components

Community-driven connectors/templates may lack maintenance.
Risk of sudden deprecation or API changes.
Mitigation: Fork and self-maintain critical components.

Long-term Cost vs. Short-term Speed

Low-code may reduce initial development time but increase long-term maintenance and licensing costs.
Codeful has higher upfront cost but better control over lifecycle.

Governance Drift

Teams may start in Low-code for MVPs but never migrate to Codeful when complexity grows.
Requires periodic maturity reassessment.

Vendor Lock-in

The deeper the platform integration, the harder and costlier the migration.
Mitigation: Abstract business logic into reusable, platform-independent services.
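To make that mitigation concrete: keep the rule itself in a plain function and let each platform reach it through a thin adapter, so only the adapter changes when the platform does. A minimal sketch, where the discount rule, the Order type, and the webhook payload shape are hypothetical examples:

```typescript
// Hypothetical domain type; nothing here depends on a platform SDK.
interface Order {
  subtotalCents: number;
  customerTier: "standard" | "gold";
}

// Business rule: plain, platform-independent, easy to unit test and to move.
function discountCents(order: Order): number {
  const rate = order.customerTier === "gold" ? 0.1 : 0;
  return Math.round(order.subtotalCents * rate);
}

// Thin adapter for a webhook-style JSON body, e.g. one posted by a low-code tool.
// Only this function needs to change if the platform (and its payload) changes.
function handleWebhook(body: { subtotal: number; tier: string }): { discount: number } {
  const tier = body.tier === "gold" ? "gold" : "standard";
  return { discount: discountCents({ subtotalCents: body.subtotal, customerTier: tier }) };
}

console.log(handleWebhook({ subtotal: 2500, tier: "gold" })); // { discount: 250 }
```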

Cultural Resistance

Developers may resist Low-code adoption, seeing it as a threat to craftsmanship.
Non-technical teams may resist Codeful due to perceived bureaucracy.
Requires clear communication of trade-offs.

Counterpoint

Low-code is not “less engineering” — it’s a different form of engineering. Success depends on applying the right level of discipline, regardless of the platform.

Bun (1.21) still can’t replace Node (but here’s how I use them together)

Gustavo Gondim — Mon, 03 Feb 2025 23:57:09 +0000

At first, when I saw Bun’s benchmarks, I was amazed. HTTP frameworks such as Elysia ranking at the top of performance lists were very appealing. Installing npm packages much faster? I would easily trade pnpm for it.

When I read the article “Bun hype. How we learned nothing from Yarn” here on Dev.to, I was a bit outraged. But today, I tend to agree with it.

As I mentioned in this tweet, I became somewhat frustrated with how many libraries unrelated to the runtime, such as clients for Amazon S3 and SQLite, are built into it.

In my experience, Bun lacks investment in developer experience. Apart from the issues I had with Node and TypeScript eight years ago, none of the problems below ever happened to me when using Node in VS Code, yet with Bun they still happen as of version 1.21.

(A little spoiler: I ended up using Bun and Node together. Check out the last section to see how I divided the work between them.)

Debugging

bun --inspect did not respect breakpoints unless something kept the process alive, such as a port listener or a pending function. --inspect-brk and --inspect-wait didn’t work either.

The launch.json configuration documented for the official Bun VS Code extension did work, especially after pointing the Bun runtime path to node_modules/.bin/bun. However, this caused serious conflicts with Volta.sh.

The debugger behaved inconsistently between launch.json and “Run file”, particularly when switching between Remote/Tunnel and local development.

The Web Debugger also had issues: when stepping over await lines, it did not wait for the promise to settle and would get completely lost.

Even with sourcemaps, transpiled JavaScript code did not correctly map to the TS files in the same monorepo.

Open ports were also left hanging, so I had to create a task in tasks.json to use as a postDebugTask in launch.json to kill processes with open ports.

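{
  "version": "2.0.0",
  "tasks": [
    {
      // Frees port 3000 after a debug session; referenced from launch.json
      // via "postDebugTask": "kill port".
      "label": "kill port",
      "type": "shell",
      // -r skips kill when nothing holds the port, so the task does not fail.
      "command": "lsof -ti:3000 | xargs -r kill -9",
      "problemMatcher": []
    }
  ]
}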

Test runner

Bun's test runner seems excellent, but it is still not integrated with VS Code's Test Explorer like Jest and other runners.

Since it doesn’t appear in the Test Explorer, it’s difficult to visualize and run only the tests that failed in the last run within VS Code.

It does have a test-name filter in the CLI (--test-name-pattern, or -t), but you would have to collect the failed tests from the last run and pass them in yourself, or achieve the same by hand with .skip().

Besides that, debugging with the test runner doesn’t seem to be an option, especially given the already poor debugging experience.

Additionally, the test runner has some bizarre assertion errors. I have two packages in a monorepo that reference a third package in their package.json. Even though they are on the same version, the test runner treats this package as if it were different versions. A simple class instance assertion (instanceof) behaves as if the same class had different constructors, causing the assertion to fail.

I believe this happens due to the way it handles monorepos. When running with the test runner, it likely has two different module resolutions—one looking at node_modules and another resolving modules within the runtime in memory.

I ended up solving this in a dirty way. Instead of using instanceof, I compared obj.constructor.toString() === ClassName.toString().
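The constructor.toString() comparison works, but it relies on both copies having identical source text, so it breaks if one copy is built differently. A common alternative for this duplicate-module problem, swapped in here rather than taken from the setup above, is to brand instances with a symbol from the global registry: Symbol.for() returns the same symbol for the same key in every copy of a module. A minimal sketch, where loadMoneyModule, Money, isMoney, and the registry key are hypothetical, and the factory only simulates a package resolved twice:

```typescript
// Hypothetical registry key; any stable, package-specific string works.
const BRAND_KEY = "example-shared-package/Money";

// Simulates "the same package loaded twice": each call creates a new,
// distinct class, just like two separately resolved copies of a module.
function loadMoneyModule() {
  // Symbol.for() looks the key up in the global symbol registry, so every
  // copy of the module gets the very same symbol.
  const brand = Symbol.for(BRAND_KEY);

  class Money {
    constructor(readonly cents: number) {
      // Non-enumerable marker shared by all copies of the class.
      Object.defineProperty(this, brand, { value: true });
    }
  }

  // Brand check used instead of `instanceof`.
  function isMoney(value: unknown): boolean {
    return typeof value === "object" && value !== null && brand in value;
  }

  return { Money, isMoney };
}

const copyA = loadMoneyModule();
const copyB = loadMoneyModule();
const price = new copyA.Money(1999);

console.log(price instanceof copyB.Money); // false: different constructors
console.log(copyB.isMoney(price)); // true: same registry symbol
```

Because the check no longer depends on the prototype chain, it should keep working across packages linked within the same monorepo, as long as every copy uses the same registry key.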

Monorepos & multi-root workspaces

I know that monorepos are not very common, except for developers who publish multiple packages. But Bun falls short when it comes to monorepos.

Many times, the build and install processes got confused and caused conflicts with the repository’s packages—something I have never experienced with npm.

On top of the debugging issues mentioned earlier, sourcemaps are effectively ignored across packages in a monorepo.

Multi-root workspaces were also a problem since VS Code does not respect the launch.json from a single root unless you create a .vscode folder for the entire workspace. Although this is an issue with VS Code rather than Bun, it still results in a poor experience because Bun users are forced to rely on launch.json.

Build

(Only relevant if you actually need to transpile a Bun project to JavaScript, for example to publish it as an npm package.)

Although Bun's build process is very fast, it does not type-check, so it accepts code that tsc would reject. It also does not properly respect tsconfig.json, relying much more on flags passed via the CLI or as options to Bun.build().

By default, Bun inlines every dependency from node_modules into the bundle, and the external and packages options are quite confusing at first, sometimes even conflicting with each other.

Final definitions

In the end, I formalized some rules on where and when to use Bun in the projects I work on.

Libs: Use tsc to transpile and publish (this keeps dependencies out of the output and maintains compatibility with CJS).
Debug: Use Node with sourcemaps, as this allows proper debugging of TypeScript in monorepos (and even in linked packages inside node_modules) in VS Code.
Servers/Applications: Use Bun because it is faster and more efficient at runtime.
Package manager & scripts: Use Bun (although using pnpm is still a good option due to symlinks and reduced disk space usage for development).

TypeScript Advanced Types: My Daily Essentials Now on NPM

Gustavo Gondim — Thu, 14 Nov 2024 20:54:46 +0000

The ts-advanced-types library is a collection of utility types designed to simplify and enhance TypeScript development. I started it four years ago to address common challenges, such as excluding specific properties from objects, enforcing mutually exclusive type options, and defining tree-like data models, so that code becomes more expressive and maintainable.

Built to streamline everyday TypeScript tasks, this library consolidates frequently used advanced types into a reusable package. By sharing these tools with the community, my goal is to help developers write cleaner, more reliable code and save time across projects.

Installation and setup

Setting up the ts-advanced-types library is straightforward. Since the project is not transpiled, it directly exposes all types and utilities through its index.ts file. This ensures a lightweight and hassle-free integration into your TypeScript project.

Steps to install:

1. Add the library to your project via npm or yarn:

npm install ts-advanced-types
# or
yarn add ts-advanced-types

2. Import the required types or utilities directly from the package:

import type { TypeXOR, Without } from 'ts-advanced-types';

No additional configuration is needed—you’re ready to start using the library. This simplicity allows you to focus entirely on leveraging the provided tools to enhance your TypeScript workflows.

What's inside

The library includes a wide range of utility types and helpers that solve common TypeScript challenges. From enforcing stricter object constraints to creating mutually exclusive type options, each utility is designed to address a specific use case while improving code clarity and maintainability. These types are ready to use and integrate seamlessly into existing projects, saving you time and effort.

Utility types

Without<T, U> Removes all properties from T that are assignable to U
TypeXOR<T, U> XOR of two types
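For readers new to the idea, a type-level XOR is usually built by forbidding the keys of one type on the other. A minimal sketch of that common pattern; it is not copied from ts-advanced-types, and ForbidKeys, ExclusiveOr, ByEmail, ByPhone, and describeLookup are hypothetical names:

```typescript
// Marks every key of T that does not exist in U as an optional `never`.
type ForbidKeys<T, U> = { [K in Exclude<keyof T, keyof U>]?: never };

// A value must match A or B, but may not carry the exclusive keys of both.
type ExclusiveOr<A, B> = (ForbidKeys<A, B> & B) | (ForbidKeys<B, A> & A);

interface ByEmail { email: string }
interface ByPhone { phone: string }
type ContactLookup = ExclusiveOr<ByEmail, ByPhone>;

const byEmail: ContactLookup = { email: "ana@example.com" };
const byPhone: ContactLookup = { phone: "+55 11 90000-0000" };

// @ts-expect-error: supplying both keys is rejected at compile time
const both: ContactLookup = { email: "ana@example.com", phone: "+55 11 90000-0000" };

function describeLookup(lookup: ContactLookup): string {
  return lookup.email !== undefined ? `email ${lookup.email}` : `phone ${lookup.phone}`;
}

console.log(describeLookup(byEmail)); // email ana@example.com
console.log(describeLookup(byPhone)); // phone +55 11 90000-0000
```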

Basic types

Falsy JavaScript falsy types
PrimitiveValidIndexSignature JavaScript primitive types accepted as index signatures
Primitive JavaScript primitive non-falsy types
Complex JavaScript non-falsy types
FalsyOrLiteral JavaScript primitive types, including falsy values
Document<T = Complex> An object made of string keys and non-falsy values. To add new types to values, use the T type parameter.
JsonOrString A JSON, as a string or as a parsed object or array
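A type like JsonOrString is most useful together with a small normalizer, so callers can accept either form. A minimal sketch with hypothetical definitions; JsonObject, JsonLike, and toParsedJson are illustrative names, not the library's exports:

```typescript
// Hypothetical stand-ins; the library's own JSON types may be defined differently.
type JsonObject = { [key: string]: unknown };
type JsonLike = string | JsonObject | unknown[];

// Parses strings; passes already-parsed objects and arrays through unchanged.
// JSON.parse throws a SyntaxError on malformed input, so callers may want a try/catch.
function toParsedJson(input: JsonLike): JsonObject | unknown[] {
  if (typeof input !== "string") return input;
  const parsed: unknown = JSON.parse(input);
  if (typeof parsed !== "object" || parsed === null) {
    // Strings like "42" are valid JSON but not an object or array.
    throw new TypeError("Expected a JSON object or array");
  }
  return parsed as JsonObject | unknown[];
}

const fromString = toParsedJson('{"retries":3}');
const fromObject = toParsedJson({ retries: 3 });
console.log(JSON.stringify(fromString) === JSON.stringify(fromObject)); // true
```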

Advanced types/classes

TreeItem<T> A generic tree
EmptyConstructorOf<T> A type that implements a constructor without arguments
ClonableType<T> A type that is clonable: it can be instantiated with a partial object
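Generic trees like TreeItem<T> pair naturally with a recursive walk. A minimal sketch with a hypothetical TreeNode<T> shape (a value plus its children) and a hypothetical flatten helper; the library's TreeItem<T> may name or structure its fields differently:

```typescript
// Hypothetical node shape; the library's TreeItem<T> may differ.
interface TreeNode<T> {
  value: T;
  children: TreeNode<T>[];
}

// Depth-first, pre-order walk: a node's value comes before its children's values.
function flatten<T>(node: TreeNode<T>): T[] {
  return [node.value, ...node.children.flatMap((child) => flatten(child))];
}

const menu: TreeNode<string> = {
  value: "root",
  children: [
    { value: "a", children: [{ value: "a1", children: [] }] },
    { value: "b", children: [] },
  ],
};

console.log(flatten(menu).join(", ")); // root, a, a1, b
```

Recursion is fine for typical menu or category trees; very deep trees would need an explicit stack to avoid exhausting the call stack.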

Utility functions

isFalsyOrSpaces(value) Checks if a value is falsy or a string with only spaces, ignoring the number 0
withoutProps(obj, ...props) Clones an object, optionally removing a list of properties
equals(a, b) Checks if two objects are equal using their equals method or strict equality
getMethods(obj) Lists all methods of an object and its prototype chain
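To illustrate what these helpers do, here are minimal sketches of three of them written only from the one-line descriptions above; they are not the library's source, and the Animal and Dog classes are hypothetical sample data:

```typescript
// True for falsy values and blank strings, but false for the number 0.
// Assumption: trim() also treats tabs and newlines as blank, not only spaces.
function isFalsyOrSpaces(value: unknown): boolean {
  if (value === 0) return false;
  if (typeof value === "string") return value.trim() === "";
  return !value;
}

// Uses a.equals(b) when `a` has such a method; otherwise falls back to ===.
function equals(a: unknown, b: unknown): boolean {
  if (typeof a === "object" && a !== null) {
    const candidate = a as { equals?: unknown };
    if (typeof candidate.equals === "function") {
      return Boolean(candidate.equals(b));
    }
  }
  return a === b;
}

// Collects method names from the object and its prototype chain,
// stopping before Object.prototype and skipping constructors and getters.
function getMethods(obj: object): string[] {
  const names = new Set<string>();
  let current: object | null = obj;
  while (current !== null && current !== Object.prototype) {
    for (const name of Object.getOwnPropertyNames(current)) {
      const descriptor = Object.getOwnPropertyDescriptor(current, name);
      if (name !== "constructor" && typeof descriptor?.value === "function") {
        names.add(name);
      }
    }
    current = Object.getPrototypeOf(current);
  }
  return Array.from(names);
}

class Animal {
  speak(): string {
    return "...";
  }
}

class Dog extends Animal {
  bark(): string {
    return "woof";
  }
}

console.log(isFalsyOrSpaces("   "), isFalsyOrSpaces(0)); // true false
console.log(getMethods(new Dog()).join(", ")); // bark, speak
```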

Contribute and share your feedback

The library is an evolving project, and contributions from the developer community are always welcome. If you have ideas for new utility types, improvements, or discover any issues, feel free to open a pull request or report an issue on the GitHub repository.

Your feedback is invaluable in shaping the library to better serve TypeScript developers. If you’ve used ts-advanced-types in your projects, I’d love to hear about your experience! Leave a comment on this post or share your thoughts directly on GitHub. Together, we can make TypeScript development even more efficient and enjoyable.

Field Learnings with OpenClaw and WhatsApp

High-Level Architecture

Config: JSON, Not YAML

Inheritance Bug: Must Set on Both Channel + Account

groupAllowFrom Is a Sender List, Not a Group List

WhatsApp in DinD

Baileys Pairing

append Skip Bug After Restart

groupAllowFrom Bug (issue #54613)

Sessions and Conversation Context

MCP Transport: Legacy SSE, Not Streamable HTTP

Gateway Mode Is Required

Useful Debug Commands

Strategy for Debugging Silent Symptoms

Recommendations for Other Deployments

Claude broke up with OpenClaw… Or did it?

The official bad news

Anthropic's own contradiction

The current landscape

Next episodes

Migrating from Claude Sub-agents to duckflux

How Claude Sub-agents work

The non-determinism problem

What is duckflux?

The determinism spectrum

Concepts side by side

Migration patterns

Chained sub-agents

Parallel research

Review loop with quality gates

Event-driven coordination

When to keep sub-agents

When to switch to duckflux

What you gain

What you lose

A hybrid approach

Getting started

Final thoughts

Migrating from Ralph Orchestrator to duckflux

What is Ralph Orchestrator?

Where the event model creates friction

What is duckflux?

Two models of coordination

Concepts side by side

Migrating the code-assist pipeline

Ralph Orchestrator

duckflux

Migrating coordination patterns

Pipeline (linear handoff via events)

Adversarial review (cyclic event routing)

Coordinator-specialist with event signaling

Cyclic rotation with I/O chaining

Backpressure: prompt-injected vs. real gates

When to use events vs. explicit flow

What you gain

What you lose

Getting started

Final thoughts

Migrating from Ralph Loops to duckflux

What are Ralph Loops?

Where Ralph starts to hurt

What is duckflux?

Side-by-side comparison

The Ralph way

The duckflux way

Migration cookbook

Simple loop until completion

Phased loops (multi-step)

Parallel worktrees

Conditional continuation

What you gain

Getting started

Final thoughts

duckflux: A Declarative Workflow DSL Born from the Multi-Agent Orchestration Gap

Table of Contents

Previously, on this series

The gap that remained

What is duckflux

Alternatives considered

1. `agentToAgent` — Native Peer Messaging

2. `sessions_send` — Addressable Sessions