Forem: Orkes Conductor

Deeper Dive into Conductor Skills: Teaching AI Agents to Orchestrate Workflows

Orkes Conductor — Tue, 28 Apr 2026 15:25:55 +0000

AI coding agents are increasingly capable of writing code, running commands, and interacting with external systems. But they're only as useful as the context they have. By default, an agent like Claude Code, Cursor, or GitHub Copilot knows nothing about Conductor: not its CLI, its workflow JSON schema, how to connect tasks together, or how to handle authentication.

conductor-skills solves that. It's a structured knowledge package that teaches any major AI coding agent how to create, run, monitor, and manage Conductor workflows without the user having to explain any of it.

What it is

conductor-skills is a "skill". In agentic terms (at the moment), a skill is a set of files that, once installed, is loaded into an AI agent's context when relevant. The agent reads these files and gains working knowledge of Conductor: what commands to run, what the workflow JSON schema looks like, how to write workers in multiple languages, when to use which task type, and how to handle edge cases like a missing CLI, auth errors, or unregistered workers.

Once installed, you can say things like:

"Create a workflow that calls the GitHub API to get open issues and sends a Slack notification"
"Connect to my Conductor server at https://play.orkes.io/api"
"Show me all failed workflow executions from the last hour"
"Write a Python worker that processes image thumbnails"
"Signal the wait task in execution abc-123 with approval: true"

The agent handles everything end to end: CLI installation, server connection, workflow JSON creation, task registration, execution, and monitoring.

How It Works

The core: a single Markdown file

The heart of the skill is skills/conductor/SKILL.md. This is a structured Markdown document that gets injected into the AI agent's system context when the skill is activated. It tells the agent the rules, setup flow, and command references:

Rules — hard constraints on behavior. For example:

Always try to install the conductor CLI before falling back to the Python script
Never echo auth tokens in output or logs
Use --profile to target named environments, not raw URL overrides

Setup flow — a step-by-step procedure the agent follows for first-time setup: check if CLI is installed, offer local vs. remote server, test connectivity, handle 401s, save profiles.

Command reference — every major operation the agent might need, with exact CLI syntax and fallback equivalents:

# Register a workflow
conductor workflow create workflow.json

# Run synchronously
conductor workflow start -w fetch_url -i '{"url": "..."}' --sync

# Retry failed executions
conductor workflow retry {workflowId}

# Signal a WAIT task
conductor task signal --workflow-id {id} --task-ref {ref} --status COMPLETED

Output formatting rules — how to present results (structured summaries, never raw JSON dumps, never echo tokens).

Mermaid visualization - rules for generating flowchart diagrams from workflow definitions, including how to map each Conductor construct (SWITCH, FORK_JOIN, DO_WHILE, SUB_WORKFLOW) to the right Mermaid syntax.
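
As a rough illustration of that kind of mapping, here is a minimal Python sketch that renders a purely sequential workflow definition as a Mermaid flowchart. It is not the rule set from SKILL.md: `to_mermaid` is a hypothetical helper, and it deliberately ignores SWITCH, FORK_JOIN, DO_WHILE, and SUB_WORKFLOW.

```python
def to_mermaid(workflow: dict) -> str:
    """Render a sequential workflow definition as a Mermaid flowchart."""
    lines = ["flowchart TD", '    start(["Start"])']
    previous = "start"
    for task in workflow.get("tasks", []):
        ref = task["taskReferenceName"]
        # One node per task, labelled with its name and type.
        lines.append(f'    {ref}["{task["name"]} ({task["type"]})"]')
        lines.append(f"    {previous} --> {ref}")
        previous = ref
    # "end" is a reserved word in Mermaid, so the final node is called "finish".
    lines.append('    finish(["End"])')
    lines.append(f"    {previous} --> finish")
    return "\n".join(lines)
```

Calling `to_mermaid` on a workflow with a single HTTP task yields a three-node chart: Start, the task, End.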

The frontmatter at the top of SKILL.md also specifies which tools the agent is allowed to use when this skill is active:

---
name: conductor
description: "Create, run, monitor, and manage Conductor workflows and tasks."
allowed-tools: Bash(conductor *), Bash(python3 *conductor_api.py*), Bash(npm install *), Read, Write, Edit, Grep, Glob
---

This scopes the agent's tool access to only what's needed: the Conductor CLI, the fallback Python script, npm for installing the CLI, and file tools for writing workflow definitions.
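
To make the scoping concrete, here is a small Python sketch of how an allow-list like this could be checked. How each agent actually enforces allowed-tools is agent-specific; this sketch assumes simple glob matching on the text inside `Bash(...)`, and `parse_allowed` and `bash_allowed` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

ALLOWED_TOOLS = (
    "Bash(conductor *), Bash(python3 *conductor_api.py*), "
    "Bash(npm install *), Read, Write, Edit, Grep, Glob"
)

def parse_allowed(spec: str) -> tuple:
    """Split the spec into Bash command patterns and plain tool names."""
    bash_patterns, tools = [], set()
    for entry in (part.strip() for part in spec.split(",")):
        if entry.startswith("Bash(") and entry.endswith(")"):
            bash_patterns.append(entry[len("Bash("):-1])
        elif entry:
            tools.add(entry)
    return bash_patterns, tools

def bash_allowed(command: str, spec: str = ALLOWED_TOOLS) -> bool:
    """True if the shell command matches any Bash(...) glob (assumed semantics)."""
    patterns, _ = parse_allowed(spec)
    return any(fnmatchcase(command, pattern) for pattern in patterns)
```

Under these assumed semantics, `conductor workflow list` passes and an unrelated command such as `rm -rf /` does not.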

Reference files

Beyond the main SKILL.md, the skill bundles three deep-reference documents that the agent reads when it needs more detail:

references/workflow-definition.md — the full workflow JSON schema, every task type (SIMPLE, HTTP, INLINE, SWITCH, FORK_JOIN, DO_WHILE, WAIT, HUMAN, SUB_WORKFLOW, TERMINATE, and more), and the ${...} expression syntax for connecting inputs and outputs between tasks.
references/workers.md — SDK examples for writing workers in Python, JavaScript, Java, Go, C#, Ruby, and Rust. Each example shows how to define a task function, connect to the server, and start polling.
references/api-reference.md — REST endpoint details for direct API access, used when the agent needs to call the Conductor API directly (e.g. for integrations the CLI doesn't cover).

Example walkthroughs

Three worked examples give the agent concrete patterns to follow:

create-and-run-workflow.md - define a workflow, register it, check for missing workers, execute it, monitor it.
monitor-and-retry.md - search executions by status, diagnose failures from task output, batch-retry.
signal-wait-task.md - human-in-the-loop patterns with WAIT tasks and external signals.

The fallback script

If the agent is in an environment where Node.js and npm can't be installed, the CLI isn't available. For that case, the skill includes scripts/conductor_api.py, a self-contained Python script that covers the same operations (create workflow, start execution, get status, signal tasks, etc.) through direct REST API calls. The agent is instructed to always try the CLI first and to fall back to this script only if the CLI genuinely can't be installed.
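
For a sense of what "direct REST API calls" means here, the sketch below builds (without sending) the request that starts a workflow by name. It is not the contents of conductor_api.py. It assumes the base URL already ends in /api, that the start endpoint is POST /workflow/{name}, and that a token, if present, travels in an X-Authorization header; `build_start_request` is a hypothetical name.

```python
import json
import urllib.request
from typing import Optional

def build_start_request(base_url: str, workflow_name: str, workflow_input: dict,
                        token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that starts a workflow by name."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Authorization"] = token  # assumption: Orkes-style auth header
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflow/{workflow_name}",
        data=json.dumps(workflow_input).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect without a running server.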

How it was built

Agent-agnostic by design

The core skill is plain Markdown, which is why it can be agent-agnostic. Every major AI coding agent (Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, Cline, GitHub Copilot, Aider, Amazon Q, Roo Code, Amp, OpenCode) has some mechanism for loading persistent instructions into context. They all read text, so a skill built as Markdown can target all of them with the same content. You're not limited to Claude Code.

What differs between agents is where those instructions live, and the install scripts handle that automatically.

The install scripts

Two scripts ship with the repo: install.sh (macOS/Linux) and install.ps1 (Windows). They share the same logic:

Parse flags — --agent, --all, --global, --project-dir, --upgrade, --uninstall, --check
Auto-detect agents — scan for config directories and executables to find which agents are actually installed
Download skill files — pull the latest versions of SKILL.md, all reference files, all example files, and the fallback script from GitHub
Place them correctly — write files to the right location for each agent, global or project-level
Handle the Claude Code case — Claude Code has a native skill/plugin system, so the installer uses the appropriate mechanism rather than raw file copying

Running --all auto-detects every supported agent on the system and installs for each one. Running --upgrade re-downloads the latest files and overwrites existing ones. The installer is idempotent, so re-running it only touches newly detected agents.
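
The detect-then-install behavior can be sketched in a few lines of Python. The real installers are shell and PowerShell scripts; the config directory names and the target path below are assumptions made for illustration, and `detect_agents` and `install` are hypothetical helpers.

```python
from pathlib import Path

# Hypothetical config directories; the real installer's detection rules may differ.
AGENT_DIRS = {"claude": ".claude", "cursor": ".cursor", "codex": ".codex"}

def detect_agents(home: Path) -> list:
    """Return agents whose (assumed) config directory exists under home."""
    return sorted(name for name, d in AGENT_DIRS.items() if (home / d).is_dir())

def install(home: Path, skill_text: str, upgrade: bool = False) -> list:
    """Write SKILL.md for each detected agent; skip existing copies unless upgrading."""
    written = []
    for agent in detect_agents(home):
        # Assumed layout: <agent dir>/skills/conductor/SKILL.md
        target = home / AGENT_DIRS[agent] / "skills" / "conductor" / "SKILL.md"
        if target.exists() and not upgrade:
            continue  # idempotent: already installed, nothing to do
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(skill_text, encoding="utf-8")
        written.append(agent)
    return written
```

Running `install` twice in a row writes nothing the second time. Only a newly detected agent, or `upgrade=True`, causes further writes.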

The Claude Code plugin

For Claude Code specifically, the skill ships as a native plugin with a proper manifest:

{
  "name": "conductor",
  "description": "Create, run, monitor, and manage Conductor workflows and tasks",
  "version": "1.0.0",
  "author": { "name": "Conductor OSS" },
  "repository": "https://github.com/conductor-oss/conductor-skills",
  "license": "Apache-2.0"
}

This enables marketplace installation:

/plugin marketplace add conductor-oss/conductor-skills
/plugin install conductor@conductor-skills

Claude Code loads the skill on demand. When a user's request matches the skill's description, the SKILL.md content is injected into context and the allowed tools are activated.

What This Pattern Enables

conductor-skills is an example of a broader pattern: giving AI agents durable, structured domain knowledge rather than explaining things from scratch in every session. The alternative is prompting an agent with "here's how Conductor works" every time, which doesn't scale: it wastes tokens and time, is prone to errors, and is annoying.

This matters for Conductor specifically because Conductor workflows involve a lot of moving parts: the CLI, server connectivity, authentication, JSON schema, task types, worker registration, execution monitoring, error handling.

You can also check out the public repository if you want to explore it even more and see how it was built.

Build Your First Conductor Workflow in 5 Minutes

Orkes Conductor — Tue, 28 Apr 2026 05:52:51 +0000

Most workflow orchestration tools make you fight the setup before you get to the fun part. Conductor doesn't. You'll go from zero to running a real multi-step workflow — querying live data from GitHub's API — in about five minutes.

Let's do it.

What We're Building

A GitHub Repo Health Checker: you give it any GitHub repo, and it runs three tasks in sequence to fetch the repo's stats, top contributors, and latest releases — then outputs a clean summary.

No mocked APIs. No toy examples. Real HTTP calls, real data.

Step 1: Install the CLI

Pick whichever fits your setup:

npm (if you have Node.js):

npm install -g @conductor-oss/conductor-cli

macOS/Linux (curl):

curl -fsSL https://raw.githubusercontent.com/conductor-oss/conductor-cli/main/install.sh | sh

Windows (PowerShell):

irm https://raw.githubusercontent.com/conductor-oss/conductor-cli/main/install.ps1 | iex

Homebrew:

brew install conductor-oss/conductor/conductor

Verify it worked:

conductor --version

Step 2: Start a Local Server

Here's the part that usually takes 30 minutes in other tools. With Conductor:

conductor server start

That's it. The CLI automatically downloads the Conductor OSS server JAR and starts it on http://localhost:8080. The only prerequisite is Java 21+.

First run takes ~30 seconds to download. After that it starts in seconds.

You now have a full workflow orchestration server running locally.

Step 3: Define the Workflow

Create a file called github-health-check.json:

{
  "name": "github_repo_health_check",
  "description": "Fetch health metrics for any public GitHub repository",
  "version": 1,
  "inputParameters": ["owner", "repo"],
  "tasks": [
    {
      "name": "get_repo_info",
      "taskReferenceName": "get_repo_info_ref",
      "type": "HTTP",
      "inputParameters": {
        "http_request": {
          "uri": "https://api.github.com/repos/${workflow.input.owner}/${workflow.input.repo}",
          "method": "GET",
          "headers": {
            "Accept": "application/vnd.github.v3+json",
            "User-Agent": "conductor-demo"
          }
        }
      }
    },
    {
      "name": "get_contributors",
      "taskReferenceName": "get_contributors_ref",
      "type": "HTTP",
      "inputParameters": {
        "http_request": {
          "uri": "https://api.github.com/repos/${workflow.input.owner}/${workflow.input.repo}/contributors?per_page=5",
          "method": "GET",
          "headers": {
            "Accept": "application/vnd.github.v3+json",
            "User-Agent": "conductor-demo"
          }
        }
      }
    },
    {
      "name": "get_latest_releases",
      "taskReferenceName": "get_releases_ref",
      "type": "HTTP",
      "inputParameters": {
        "http_request": {
          "uri": "https://api.github.com/repos/${workflow.input.owner}/${workflow.input.repo}/releases?per_page=3",
          "method": "GET",
          "headers": {
            "Accept": "application/vnd.github.v3+json",
            "User-Agent": "conductor-demo"
          }
        }
      }
    }
  ],
  "outputParameters": {
    "name": "${get_repo_info_ref.output.response.body.full_name}",
    "description": "${get_repo_info_ref.output.response.body.description}",
    "stars": "${get_repo_info_ref.output.response.body.stargazers_count}",
    "forks": "${get_repo_info_ref.output.response.body.forks_count}",
    "open_issues": "${get_repo_info_ref.output.response.body.open_issues_count}",
    "top_contributors": "${get_contributors_ref.output.response.body}",
    "latest_releases": "${get_releases_ref.output.response.body}"
  },
  "schemaVersion": 2
}

A few things worth noticing here:

${workflow.input.owner} — input parameters are injected directly into task definitions. No glue code.
type: "HTTP" — HTTP is a built-in task type. No worker process needed. Conductor handles the call.
outputParameters — the workflow assembles its final output from task results using the same ${} syntax. For example, ${get_repo_info_ref.output.response.body.stargazers_count} drills right into the HTTP response to fill the stars field.
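
To show what that ${} wiring amounts to, here is a toy resolver in Python. It is only a sketch of the idea, not Conductor's implementation (which supports more than plain dotted paths); `lookup` and `resolve` are hypothetical helpers.

```python
import re

EXPR = re.compile(r"\$\{([^}]+)\}")

def lookup(context: dict, dotted_path: str):
    """Walk a dotted path such as 'workflow.input.owner' through nested dicts."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]
    return value

def resolve(template: str, context: dict):
    """Resolve ${...} references; a lone reference keeps its original type."""
    whole = EXPR.fullmatch(template)
    if whole:
        return lookup(context, whole.group(1))
    # Mixed text and references: substitute each reference as a string.
    return EXPR.sub(lambda m: str(lookup(context, m.group(1))), template)
```

Given a context holding the workflow input and one task's output, the same function fills in both the request URI and the stars field.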

Step 4: Register the Workflow

conductor workflow create github-health-check.json

You should see confirmation that it was registered. You can also list all workflows to verify:

conductor workflow list

Step 5: Run It

conductor workflow start \
  --workflow github_repo_health_check \
  --input '{"owner": "conductor-oss", "repo": "conductor"}' \
  --sync

The --sync flag tells the CLI to wait for the workflow to complete and print the result inline. For longer workflows you can drop it and poll with conductor workflow status <id>.

You'll get back something like:

{
  "name": "conductor-oss/conductor",
  "description": "Conductor is a platform for orchestrating microservices and events",
  "stars": 17200,
  "forks": 1843,
  "open_issues": 64,
  "top_contributors": [...],
  "latest_releases": [...]
}

Try it on any public repo — just swap owner and repo.

What Just Happened

You ran a multi-step orchestrated workflow. Three HTTP tasks executed in sequence, each reading the workflow input, with the final output assembled automatically from all three.

That might sound simple, but here's what you got for free:

Retries — if the GitHub API flakes on task 2, Conductor retries it automatically without rerunning task 1.
Execution history — every run is logged. You can search, inspect, and replay any execution.
Full observability — conductor workflow get-execution <id> shows you each task's input, output, timing, and status.
Pauseable, resumable, replayable — you can pause a running workflow, jump to a specific task, skip a failing one, or restart from any point.

Going Further

This workflow runs tasks sequentially, but Conductor supports parallel execution out of the box using fork/join. The three HTTP calls in this example are independent of one another (each reads only the workflow input), so you could run the contributor and release tasks simultaneously and shorten the total run time. That's a small change to the workflow definition: wrap the parallel tasks in a FORK_JOIN task and follow it with a JOIN.
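
In the workflow JSON, fork/join means a FORK_JOIN task whose forkTasks holds one list per branch, followed by a JOIN task whose joinOn names the branches to wait for. The Python sketch below rewrites a definition into that shape. `parallelize` and the fork_parallel / join_parallel names are made up for illustration, and the sketch simply places the fork after all remaining tasks.

```python
def parallelize(workflow: dict, refs: list) -> dict:
    """Return a copy of the workflow with the named tasks moved into a FORK_JOIN."""
    branch_tasks = [t for t in workflow["tasks"] if t["taskReferenceName"] in refs]
    other_tasks = [t for t in workflow["tasks"] if t["taskReferenceName"] not in refs]
    fork = {
        "name": "fork_parallel",
        "taskReferenceName": "fork_parallel_ref",
        "type": "FORK_JOIN",
        "forkTasks": [[task] for task in branch_tasks],  # one branch per task
    }
    join = {
        "name": "join_parallel",
        "taskReferenceName": "join_parallel_ref",
        "type": "JOIN",
        "joinOn": [t["taskReferenceName"] for t in branch_tasks],
    }
    return {**workflow, "tasks": other_tasks + [fork, join]}
```

For the health check, `parallelize(workflow, ["get_contributors_ref", "get_releases_ref"])` leaves get_repo_info first and runs the other two side by side. The outputParameters need no changes because the task reference names stay the same.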

A few other things worth exploring from here:

Run workers in any language — use conductor worker stdio to write task logic in Python, Bash, Ruby, or whatever you prefer. JSON in via stdin, result out via stdout. No SDK required.

Schedule it — run this health check on a cron schedule with conductor schedule create, so you get a daily snapshot of any repo you're watching.

Add more steps — extend the workflow with an HTTP POST to Slack or a webhook to notify your team when a repo's open issues spike above a threshold.

The same pattern — define tasks, wire inputs/outputs, run it — scales from this three-task demo to production workflows with hundreds of steps, branching logic, and human approval gates.

Useful Commands to Know

# Check execution status
conductor workflow status <workflow-id>

# Inspect full execution details (inputs, outputs, timing per task)
conductor workflow get-execution <workflow-id>

# Search recent executions
conductor workflow search --workflow github_repo_health_check --count 10

# Retry a failed execution
conductor workflow retry <workflow-id>

# See server logs
conductor server logs -f

The full CLI reference is at github.com/conductor-oss/conductor-cli. If you want to explore what Conductor can do at scale, the Conductor OSS docs cover everything from dynamic fork/join to sub-workflows to event-driven execution.

Star the repo if this was useful — it helps the project a lot.

When Agents Meet Reality: Recapping Our Agents in Production Meetup in London April '26

Orkes Conductor — Tue, 21 Apr 2026 18:51:59 +0000

Everyone has seen that version of AI agents where everything just works. The reasoning is clean, every tool call lands, every output is exactly what you wanted. And then you try to build one yourself for production, and honestly? It's a pretty different experience.

Last week in London, we got engineers, tech leads, and builders into a room for Agents in Production, a meetup hosted by Orkes. The whole evening was basically one long honest conversation about that gap between demo agents and the ones you actually have to keep running in production.

The evening ✨

The format was simple: two talks, and then drinks and questions.

What made the room really awesome was the mix. Half the people there were already building agents in production and running into real problems. Things like state, retries, observability, and all the stuff that doesn't show up in any demo. The other half were earlier on, and were trying to figure out where to even start without repeating everyone else's mistakes. Honestly, both groups had a lot to share with each other.

Talk 1: When Agents Meet Reality, and Why Execution Is the Hard Part

I kicked things off with a talk I've been sitting on for months.

The short version: stop only asking whether your agent is smart. Start asking if it's actually operable. Because the second you try to run a clever agent in production, a pretty different set of problems comes up:

State has to persist across steps that might span minutes, hours, or even days.
Failures are partial and messy. Not the kind of clean crash you can just catch and retry. More like silent degradations mid-task, the kind you only notice when someone else tells you.
Humans need visibility into what's happening at each stage, and the ability to step in without breaking the whole workflow.
Long-running coordination between agents, tools, and humans needs infrastructure most teams just aren't thinking about enough.

This is where orchestration actually earns its keep. Not as a buzzword, but as the actual difference between an agent that demos well and one you'd put in front of real users. Can you observe it? Can you recover when it fails? Can a human step in without everything falling over?

And based on the questions after, the room was feeling this too.

Talk 2: From Prototype to Production, and How First Databank UK Did It

Where talk one was the argument, talk two was the evidence.

Dan Miller from First Databank UK walked us through how his team actually orchestrates three production AI agents using Orkes Conductor:

Noisy cloud alerts. Triaging and surfacing only what actually matters.
Time-consuming SPIKE investigations. Automating the research and synthesis work.
Manual clinical guidance monitoring. Keeping a continuous eye on changing medical guidelines.

What made Dan's talk so good was how honest it was. He didn't skip the hard parts. Things like the retries, the human checkpoints, the observability that needs to be talked about more. Orkes Conductor gave his team durable execution, full observability, and human-in-the-loop checkpoints. Basically, all the boring stuff that turns a clever prototype into something a team can actually rely on.

And the clinical angle made it hit even harder. When your agent is working somewhere that patient safety matters, the bar for observability and control just jumps way up.

The conversation that followed

Once the talks wrapped, I honestly expected the room to slide into small talk or people to start leaving. It didn't though. People stayed locked in and continued to ask questions until we had to leave because the venue was closing for the night.

A few themes kept coming up:

Safety and trust. When do you actually trust an agent's decision? Where do humans need to stay in the loop, and how do you design those handoffs so they don't turn into bottlenecks? And nobody was speaking in the abstract either. People were wrestling with this in stuff they'd shipped that week.

The "how do we even start" question. The gap between "we've seen the demos" and "we've actually shipped something real" is way wider than it looks from the outside. There was real hunger for patterns, reference architectures, and honest stories about what didn't work.

Cross-industry patterns. Engineers from fintech, healthcare, dev tools, and retail kept comparing notes and landing on the same problem: how to build and ship these agents in a way that lets us trust them.

One more thing: Agentspan

We also got to drop something new at the event: Agentspan, a framework for building agents in a durable, production-ready way. It's basically our direct answer to everything the evening's talks were circling around.

The reaction in the room made it pretty clear that this is what people have been looking for, and that they were excited to get started.

Next stop: Amsterdam

London confirmed something I'd been suspecting for a while. There's a real, growing community of people who want to stop talking about agents in theory and start sharing what actually works (and what doesn't) when you're running them in production. So yeah, we're doing it again.

If you're:

Building agents right now and want to compare notes with people hitting the same walls,
Thinking about building and want to skip a few expensive mistakes,
Or just trying to make sense of where all of this is actually heading,

...this is the room for it.

If you're in Amsterdam and want in, drop a comment or shoot me a message on LinkedIn. I'm collecting names now, and I'll reach out as soon as we've locked in a date and venue.

See you in Amsterdam!

Building a Full Agent System: An Orchestrator and a Customer 360 Example

Orkes Conductor — Tue, 21 Apr 2026 14:54:39 +0000

Author: Maria Shimkovska

If you came to our London tech event, you saw me walk through this as a live demo. A few people asked if I could write it up, so here it is: the same demo, but as something you can clone, run, and poke at yourself, so you can see how to take some of your own business processes and build them into an agentic system like this one. Keep in mind this is just a demo. The goal is to show how you can build a production agentic system and add orchestration to oversee everything.

You can grab the code here, where I also cover setup in more detail.

Quick context before we dig in. An "agent" in this post means an AI model that can use tools and make judgment calls on its own, not just answer a question. A "Customer 360" is a complete picture of a customer pulled from every system where their data lives, like billing, support, and product usage. The goal of the demo is to show how agents can assemble that picture and decide what to do about it.

Getting it running

The whole thing is designed to go from clone to running UI in about a minute.

Clone the repo, then copy .env.example to .env at the repo root and fill in your Orkes credentials and OpenAI key. That's the only configuration you need.

Then start the stack with one command:

./start_demo.sh

That's genuinely it. The script boots the Agentspan server, waits for it to be ready, sets your API credentials, spins up the Conductor workers, the Express backend, and the React frontend, all in one go. If you already have an Agentspan server running from a previous session, it'll restart it cleanly. Logs for each component go to the logs/ folder if you need to debug. Hit Ctrl+C to stop everything.

Then open http://localhost:5173 in your browser.

The UI is honestly the smallest part of this. The interesting pieces are Conductor and Agentspan, but I wanted a full end-to-end flow so you can see how everything connects.

What happens when you hit Run

Pick a scenario in the UI. There are three, each designed to exercise different branches of the system:
- John Doe, an at-risk existing customer
- Marcus Webb, a watchlist case whose usage is softening but isn't yet critical
- Marina Petrova, a brand new customer the system has never seen
Click Run. The frontend calls the Express backend, which starts the Conductor workflow on Orkes.
Workers pick up each task and run the agents via Agentspan.
The UI polls every 500ms and shows progress as each step completes.
Final output appears when the workflow finishes.
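
The polling step boils down to a loop like the one below. The demo's UI does this in JavaScript against the Express backend; this is a Python sketch of the same idea, where `fetch_status` is a hypothetical callable standing in for the status request and the timeout is an added assumption.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TIMED_OUT", "TERMINATED"}

def poll_until_done(fetch_status: Callable[[], str], interval_s: float = 0.5,
                    timeout_s: float = 60.0) -> str:
    """Call fetch_status every interval_s seconds until a terminal status or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow still {status} after {timeout_s}s")
        time.sleep(interval_s)
```

The 0.5 second default mirrors the UI's 500ms interval.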

That's the user-facing loop. Before we dig into the agents themselves, it's worth zooming out to see how the pieces underneath fit together, because the architecture is doing a lot of the heavy lifting.

The architecture, piece by piece

Here's what the whole pipeline looks like on one page:

Incoming event
      │
      ▼
┌─────────────────┐
│ Identity Agent  │  Works out who the event belongs to
└────────┬────────┘
         │
         ▼
   Is this a new customer?
         │
    ┌────┴────┐
    │ Yes     │ No
    ▼         ▼
┌──────────┐  ┌───────────────┐   ┌────────────────┐
│Onboarding│  │ Health Agent  │──▶│ Strategy Agent │
│  Agent   │  └───────────────┘   └────────────────┘
└──────────┘

A new customer gets routed to Onboarding. An existing customer goes through Health, then Strategy. Every agent receives everything the previous agents produced, so by the end you have one combined payload covering identity, health, and the recommended next action.

Now the systems that make that happen.

The three main systems

There are three moving parts: Conductor, Agentspan, and the agents themselves. Each does a distinct job, and they work independently of each other, which is the point.

Conductor is the coordinator

This is essentially the project manager for the whole system. It owns the workflow definition: what runs, in what order, and what happens at each fork in the road. When you click run in the UI, the Express backend tells Orkes (the hosted version of Conductor) to start a new execution of the customer_360_refresh workflow.

From that point, Conductor is in charge. It queues up the first task, waits for a worker to pick it up, receives the result, and decides what comes next. It handles retries if something fails, tracks state across every step, and enforces the routing logic.

For example, it uses a branching step to send new customers down the onboarding path and existing customers down the health and strategy path. Conductor doesn't know or care what the agents are doing inside each task. It just moves data through the pipeline.
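
The routing logic itself is small. In the demo it lives in the Conductor workflow definition, not in application code, but as a sketch the same decision looks like this in Python. The `is_new_customer` field and both function names are assumptions for illustration.

```python
def route(identity_result: dict) -> list:
    """Pick the downstream steps: onboarding for new customers, else health then strategy."""
    if identity_result.get("is_new_customer"):
        return ["onboarding"]
    return ["health", "strategy"]

def run_pipeline(event: dict, agents: dict) -> dict:
    """Run identity, then the routed steps, accumulating each step's output."""
    payload = {"event": event}
    payload["identity"] = agents["identity"](payload)
    for step in route(payload["identity"]):
        payload[step] = agents[step](payload)  # each step sees everything so far
    return payload
```

Here `agents` is a plain dict of callables, which keeps the coordinator ignorant of what each agent does internally.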

Agentspan is where the agents actually run

It runs as a local server on port 6767 and is what executes the AI model calls. Each agent is registered there with its model, its tools, its instructions, and its safety checks.

When a worker needs to run the health agent, it calls Agentspan with the input. Agentspan handles the back and forth with the model, including tool calls, retries when a safety check fails, and making sure the output matches the expected format.

If Conductor is the nervous system connecting everything, Agentspan is the brain doing the actual thinking.

The workers are the bridge between the two

They're Python processes that keep asking Conductor, "do you have any tasks for me?" When Conductor hands one off, the worker unpacks the input, calls the right Agentspan agent, and posts the result back to Conductor.

The workers reach out to Conductor rather than Conductor pushing work to them, which means you can run as many workers as you want and they'll never step on each other.
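
A toy version of that pull model, using an in-memory queue in place of Conductor, shows why adding workers is safe: each task is claimed by exactly one of them. Real workers use a Conductor SDK and poll over HTTP; `worker` and `run_workers` here are hypothetical stand-ins.

```python
import queue
import threading

def worker(tasks: queue.Queue, results: list, lock: threading.Lock, name: str) -> None:
    """Keep asking the queue for work; exit when it is empty."""
    while True:
        try:
            task = tasks.get_nowait()  # stands in for polling Conductor
        except queue.Empty:
            return
        with lock:
            results.append((task, name))  # stands in for posting the result back

def run_workers(task_ids: list, worker_count: int = 3) -> list:
    """Drain task_ids with several workers; each task is claimed exactly once."""
    tasks: queue.Queue = queue.Queue()
    for task_id in task_ids:
        tasks.put(task_id)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock, f"w{i}"))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Which worker handles which task varies from run to run, but no task is processed twice.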

The agents

The agents sit at the end of this chain, and this is where the reasoning actually happens. Each one is scoped to a single responsibility:

Identity works out who the incoming event belongs to
Health combines signals from four systems into a score and a risk summary
Strategy decides the single most important next action
Onboarding runs only for brand new customers, to kick off the welcome process

Each agent receives the accumulated output of every step before it, adds its own section, and passes the whole thing forward. By the time the workflow completes, you have one unified payload covering identity, health, and recommended action, assembled piece by piece as it moved through the pipeline. (We'll dig into each agent individually in the next section.)

The supporting pieces

Three systems do the orchestration and the thinking, but a few other parts of the repo keep the whole thing honest.

Data stores live in /data. customer_store.py is the identity graph: every known customer and all the different IDs they have across source systems (so an event from Salesforce with a contact ID can be traced back to the same person in Zendesk, Stripe, and so on). health_store.py holds the signals the Health Agent needs, like product usage, support tickets, billing events, and engagement history, plus the playbooks that match each health status. scenario_inputs.py is just sample data for the three demo scenarios. In a real system these would be connections to your live databases; for a demo they're self-contained Python files you can read and change.

Guardrails (in guardrails.py) are safety checks that run on every agent's inputs and outputs. They're deterministic code, meaning they always run the same way regardless of what the AI model decides, and they sit at the boundary of each agent to catch things the model shouldn't be trusted with. A few examples:

validate_input_record checks that an incoming event has the required fields and comes from a known source system
no_prompt_injection blocks attempts to smuggle instructions into user-supplied text fields
conservative_identity_match flags suspicious combinations, like a NO_MATCH result paired with a high confidence score, for a human to review
no_pii_in_output blocks patterns like social security numbers or credit card numbers from appearing in any agent's output

These exist because AI models are good at reasoning but bad at being reliably boring. The guardrails handle the boring, must-not-fail parts so the agents don't have to.
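
To give a feel for what deterministic checks like these look like, here is a Python sketch of two of them. These are not the implementations in guardrails.py: the regular expressions, the required field names, and the list of known sources are all assumptions made for illustration.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")
KNOWN_SOURCES = {"salesforce", "zendesk", "stripe"}  # assumed list
REQUIRED_FIELDS = {"source", "payload"}              # assumed field names

def validate_input_record(event: dict) -> list:
    """Return a list of problems; an empty list means the event passes."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - set(event))]
    if event.get("source") not in KNOWN_SOURCES:
        problems.append("unknown source system")
    return problems

def no_pii_in_output(text: str) -> bool:
    """True when the text contains no SSN-like or card-like number."""
    return not (SSN.search(text) or CARD.search(text))
```

Because these are plain functions, they behave the same on every run and can be unit tested independently of any model.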

The UI (in /demo-ui) has two halves. The frontend is a React app on port 5173 with the three scenario buttons, a step-by-step progress view, and a results panel. The backend is a small Express API on port 3001 that kicks off workflow executions and proxies the status polling to Orkes. The UI is genuinely the least interesting part of the system, but it gives you a way to see what's happening. The pipeline runs the same way whether the UI is open or not.

With those connections clear, let's get into why each of those four steps is an agent and not just a regular function, because this is another huge part of building agentic systems.

Why every agent is actually an agent in this example

It's tempting, when you're building something like this, to let "agent" become a label you slap on any AI model call. I've tried to be strict about it here. Each of the four components below earns the name because there's real judgment involved that you can't cleanly reduce to code that always follows the same rules.

Identity Agent

What it does: Takes a raw event from any source system (like Salesforce, Zendesk, Stripe, and so on) and decides whether it belongs to a known customer of this company.

Why it has to be an agent: Matching people is inherently messy. The same person shows up as j.doe@acme.com in one system and John Doe / Acme, Inc. in another. A rules engine can calculate similarity scores, and ours does, but it can't reason about whether a 0.78 score with a shared team email like billing@ is actually the account rather than a specific person, or whether two candidates with similar names at the same company are the same human or two colleagues.

The agent's real job is the judgment call in the gray zone: MATCH, UNCERTAIN, or NO_MATCH. It has to weigh conflicting signals, apply the conservative matching rule ("false merges are worse than missed ones"), and decide when to escalate to a human reviewer. That reasoning step ("given all of this, what's the right call and why?") is where an AI model earns its place over code in this example.
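
For contrast, here is roughly what the rules-engine half can do on its own. This is a sketch, not the demo's matching code: the thresholds and the shared-mailbox list are assumptions, and `similarity` and `classify` are hypothetical helpers. Everything it labels UNCERTAIN is the gray zone where the agent's reasoning takes over.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.90      # assumed cutoffs, chosen only for illustration
NO_MATCH_THRESHOLD = 0.50
SHARED_MAILBOXES = {"billing", "info", "support", "admin"}

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify(score: float, email: str) -> str:
    """Conservative rule: anything short of a clear match goes to review."""
    local_part = email.split("@", 1)[0].lower()
    if score < NO_MATCH_THRESHOLD:
        return "NO_MATCH"
    if score >= MATCH_THRESHOLD and local_part not in SHARED_MAILBOXES:
        return "MATCH"
    return "UNCERTAIN"
```

A 0.78 score on billing@acme.com lands in UNCERTAIN here. The code can flag the case but can't say whether it is the account or a person.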

Health Agent

What it does: Pulls signals from four separate systems (usage, support, billing, and customer records), combines them into a score, and surfaces risks and opportunities.

Why it has to be an agent: The scoring logic itself (calculate_health_score) is fixed, meaning the same inputs always produce the same number. That's intentional. You want a reproducible score. But the agent earns its place in the steps before and after.
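As an illustration of what "fixed" means here, a scoring function of this kind might look like the sketch below. Only the name `calculate_health_score` comes from the project; the parameters, weights, and caps are invented for the example.

```python
def calculate_health_score(
    usage_change_pct: float,
    escalated_tickets: int,
    overdue_invoices: int,
    days_to_renewal: int,
) -> int:
    """Pure function: the same inputs always produce the same 0-100 score.

    All weights and caps below are hypothetical.
    """
    score = 100
    if usage_change_pct < 0:
        # Penalize declining usage, capped at 40 points.
        score -= min(40, round(abs(usage_change_pct)))
    score -= min(30, 10 * escalated_tickets)  # support signal
    score -= min(20, 10 * overdue_invoices)   # billing signal
    if days_to_renewal <= 30:
        score -= 10                           # renewal is close
    return max(0, score)
```

There is no model call and no randomness in it, so the number is reproducible. The judgment lives in choosing what to pass in and in explaining what comes out.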

Before scoring: it has to decide which customer ID to use. A person record arrives, but their health data lives on the account. The agent has to navigate that relationship, call the right tools, and pass the right data to calculate_health_score. A hardcoded pipeline would break the moment the data model shifts.

After scoring: it has to interpret the outputs in context and produce a human-readable summary. "Product usage declined 38.2% over the last 30 days" combined with "2 escalated tickets" combined with "renewal in 21 days" tells a story that's more than the sum of its parts. The agent connects those dots into a coherent risk narrative rather than just spitting out a list of triggered rules.

Strategy Agent

What it does: Reads the identity and health output and decides the single most important next action.

Why it has to be an agent: This is the most agent-like of the four. Our prioritize_customer_action tool has a priority order built in (escalations beat renewal risk, renewal risk beats usage decline), but that order is static. Real accounts don't fit cleanly into one bucket. Marcus Webb (WATCHLIST) has usage decline and stale engagement and a ticket backlog. None of those trigger the highest-priority rules on their own, but together they tell a different story.
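A static priority order like the one described can be sketched as a plain lookup. The signal names and the function signature below are assumptions; only the tool name and the ordering (escalations, then renewal risk, then usage decline) come from the text.

```python
from typing import Optional

# Highest priority first; the ordering comes from the article,
# the signal names are hypothetical.
PRIORITY_ORDER = ["escalation", "renewal_risk", "usage_decline"]


def prioritize_customer_action(signals: set) -> Optional[str]:
    """Return the highest-priority signal present, or None if none apply."""
    for signal in PRIORITY_ORDER:
        if signal in signals:
            return signal
    return None


# An account with several mid-level signals still collapses to one bucket:
watchlist_signals = {"usage_decline", "stale_engagement", "ticket_backlog"}
top_signal = prioritize_customer_action(watchlist_signals)
```

For an account like this, the lookup returns only `usage_decline` and ignores the other two signals entirely. That lost combination is the part the agent is there to weigh.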

The agent has to weigh which combination of signals matters most for this specific customer, pull the right playbook, decide whether to create a task or trigger outreach or both, and write the summary in language a customer success manager can act on. That synthesis, turning context into a specific, personalized recommendation with reasoning attached, is what separates it from a simple decision tree.

Onboarding Agent

What it does: For brand new customers only. It creates a kickoff task, builds a 30-day plan, and triggers a welcome sequence.

Why it has to be an agent: This one is the most tool-like of the four; the tools are largely static templates right now. But it still earns the "agent" label for two reasons.

First, routing. It only runs when action_taken == "created". That condition is checked by the workflow router, but the agent still has to confirm it's in the right context before acting, and gracefully handle edge cases like a missing email, an unknown role, or no customer success manager assigned yet.
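The context check could be sketched as a small preflight step. The `action_taken` field is from the workflow described above; the other field names (`email`, `role`, `csm`) and the `onboarding_preflight` helper are hypothetical, and the fallbacks are only one reasonable choice.

```python
def onboarding_preflight(event: dict) -> tuple:
    """Return (should_run, notes) for an incoming customer event."""
    if event.get("action_taken") != "created":
        return False, ["not a newly created customer; skip onboarding"]

    notes = []
    # Degrade gracefully instead of failing on incomplete records.
    if not event.get("email"):
        notes.append("missing email: hold the welcome sequence")
    if not event.get("role"):
        notes.append("unknown role: fall back to the generic plan")
    if not event.get("csm"):
        notes.append("no CSM assigned: leave the kickoff task unassigned")
    return True, notes
```

A record with `action_taken` set to `"created"` but no email would still run, with a note telling the agent to hold the welcome sequence rather than abort the whole step.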

Second, personalization. build_onboarding_plan returns the same four-week template for everyone today, but an agent can adapt it. A VP of Engineering gets different week-3 actions than a Head of Operations. As the tools get richer, the agent can tailor the plan to the customer's role, company size, and plan tier without anyone having to hardcode every combination.

Wrapping up

The thread running through all four: the parts that should stay consistent stay consistent, and the agents sit around them doing the reasoning work that brittle code can't. Scoring is a function. Priority ordering is a lookup. Matching thresholds are numbers in a config. What the agents handle is everything in between: deciding which tool to call, how to interpret the output, when the rules don't fit the situation, and how to narrate the result in a way a human can actually use.

You Can Now Let Claude Code Build Workflows For You Using Conductor Skills

Orkes Conductor — Tue, 14 Apr 2026 17:16:31 +0000

If you're already using Agent Skills with Claude Code, you can now add Conductor Skills to build, deploy, and run entire workflows directly from your Claude terminal.

You can just "chat" with the Claude Code terminal and let it build your workflows directly in your Conductor clusters. Pretty cool!

What is Conductor and What are Conductor Skills?

Conductor is a workflow orchestration engine where you define a workflow as a series of tasks like API calls, custom code, conditionals, parallel branches, and human approvals. From there, Conductor runs them, tracks their state, handles retries, and gives you full visibility into what happened.

Conductor Skills is the plugin that lets Claude Code create and manage these workflows for you, which gives you another way to build them. I like using it most when I am just getting started with a workflow, because it gives me a really solid starting point that I can iterate on.

A very quick note on prerequisites for a project like this

Before you start, you'll need:

Claude Code installed and configured
Java 21 or later installed. The local Conductor server I am using here is a JAR file and won't start without it. Run java -version to check. If you don't want to use Conductor's local server, you can point Claude Code at Orkes Conductor Developer Edition instead, in which case you don't need Java installed.
Conductor Skills plugin installed (instructions below)

How to Install Conductor Skills in Claude Code

Open up your Claude Code terminal and type in the following:

/plugin marketplace add conductor-oss/conductor-skills
/plugin install conductor@conductor-skills

To verify you have it, you can type this in your Claude Code session:

/plugin list

If you see conductor in the output, you're good to go. If you don't see it under the Plugins tab (there are a lot of entries there by default), check the Installed tab instead.

Claude now knows how to talk to a Conductor server, register your workflows, start them, monitor their status, manage failures, and write workers (your own custom code/services). I also like using Claude Code because it helps with planning and brainstorming.

Part 1: Build Your First Workflow with Claude Code and Conductor Skills

Let's start with something simple: a workflow that takes a URL, fetches its contents, and returns them. The point here is to see how the pattern works and get a feel for the build-run-iterate cycle of building workflows with Claude Code.

Step 1: Start Your Local Conductor Server

First you need a running Conductor server that Claude Code can connect to and register workflows on. You have two options (the local one I am showing here, and the Orkes hosted version which I will show later), but let's start with a local build. In Claude Code, type:

/conductor:conductor

Claude will ask what you want to do. Tell it:

Start a local Conductor server for development

Claude runs conductor server start behind the scenes. Once it's healthy, you'll have:

API: http://localhost:8080/api
UI: http://localhost:8080

You can open the UI in your browser to see your workflows visually as you build them. Keep the server running and move on. You can also just ask Claude Code to explain what it builds and how the workflow is doing. So you can just say things like “Is my server up and running? How many workflows do I have in my Conductor server? What are those workflows? Which Conductor cluster is it pointing to?” It'll query the API and answer you.

Now just write something like "Build me a very simple workflow" or "Build me a workflow that takes a URL, fetches its contents, and returns them" and watch it work.

And then you can just check it out by going to localhost:8080 (if you want to see the OSS UI), but you don’t have to. Anything you want to know about the workflow you can also ask Claude.

If I do go to the UI to see the workflow visualized, here is what I see:

Just one task, but I can use Claude to run it and then I can build on top of this one and iterate.
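For a sense of what a one-task workflow like this looks like as a definition, here is a rough sketch built as a Python dictionary and printed as JSON. It follows the general shape of a Conductor workflow definition with a single HTTP task, but the names (`fetch_url_contents`, `fetch_url_ref`) are assumptions for illustration, and what Claude actually generates for you may differ.

```python
import json

# A hypothetical one-task workflow: take a URL as input, fetch it, return the body.
workflow_def = {
    "name": "fetch_url_contents",  # assumed name, not the one Claude chose
    "description": "Fetch the contents of a URL and return them",
    "version": 1,
    "schemaVersion": 2,
    "inputParameters": ["url"],
    "tasks": [
        {
            "name": "fetch_url",
            "taskReferenceName": "fetch_url_ref",
            "type": "HTTP",
            "inputParameters": {
                "http_request": {
                    # Resolved from the workflow input at run time.
                    "uri": "${workflow.input.url}",
                    "method": "GET",
                }
            },
        }
    ],
    "outputParameters": {
        # Expose the HTTP response body as the workflow's output.
        "body": "${fetch_url_ref.output.response.body}",
    },
}

print(json.dumps(workflow_def, indent=2))
```

The `${...}` expressions are how one part of a definition refers to another: the task reads the workflow's input, and the workflow's output reads the task's result through its reference name.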

From here Claude also suggests some improvements and in this case I agree with the suggestion. So now I can iterate on this and ask Claude to add a task to get all the blog titles from the page.

You can also connect it to an Orkes Hosted Cluster Instead

If you don't want to run a local server, point Claude Code at an Orkes cluster. The Orkes Developer Edition is a free hosted service where you can build workflows and experiment without installing anything locally.

Just tell Claude:

Connect to my Conductor server at https://developer.orkescloud.com/api and create the same workflow there instead

https://developer.orkescloud.com/api is the URL for the Developer Edition cluster.

Claude will ask for your authentication details like your Key ID and Key Secret. You can generate these from the Applications page in your Orkes dashboard. Create an application (or use an existing one), generate a key pair, and paste the values when Claude asks for them.

If you'd rather not type credentials into Claude, set them as environment variables in a separate terminal session first:

export CONDUCTOR_SERVER_URL=https://developer.orkescloud.com/api
export CONDUCTOR_AUTH_KEY=<your_key_id>
export CONDUCTOR_AUTH_SECRET=<your_key_secret>
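If you want to double-check that those variables are actually visible to a process before handing things over to Claude, a small check like the one below works. The variable names are the ones exported above; the `missing_conductor_vars` helper is just an illustrative name.

```python
import os

REQUIRED_VARS = (
    "CONDUCTOR_SERVER_URL",
    "CONDUCTOR_AUTH_KEY",
    "CONDUCTOR_AUTH_SECRET",
)


def missing_conductor_vars(env=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_conductor_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Conductor variables are set.")
```

Environment variables only apply to the shell session they were exported in and the processes started from it, so run the check from the same terminal you launch Claude Code from.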

Once connected, Claude gives you a summary of everything on the cluster. So your output will look something like this:

This works with any Orkes cluster: Developer Edition, your team's staging environment, production, whatever. Just swap the URL and credentials. From here, you can create new workflows, run existing ones, or explore what's already there.

Now you can just tell Claude Code something like:

Check that you are connected to the Developer Edition of Orkes Conductor and build the same workflow there instead.

Step 2: Run your new Conductor workflow from Claude Code

Let's test with a simple HTML page first. Go ahead and tell Claude Code:

Run the new workflow with https://orkes.io/blog/

https://orkes.io/blog/ is just the link to the Orkes Conductor blog page.
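Behind a prompt like that, starting a workflow comes down to a POST against the server's workflow endpoint with the input as a JSON body. The sketch below builds such a request for the local server without sending it. The workflow name `fetch_url_contents` is a placeholder for whatever your workflow is actually called, and an Orkes-hosted cluster would also need authentication, which is left out here.

```python
import json
import urllib.request


def build_start_request(base_url: str, workflow_name: str, workflow_input: dict):
    """Build (but do not send) a request that starts a workflow execution."""
    url = f"{base_url.rstrip('/')}/workflow/{workflow_name}"
    body = json.dumps(workflow_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_start_request(
    "http://localhost:8080/api",         # local server API from Step 1
    "fetch_url_contents",                # hypothetical workflow name
    {"url": "https://orkes.io/blog/"},
)

# To actually send it (requires the local server to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())  # the ID of the new execution
```

This is what the plain-English prompt saves you from writing by hand each time you want to kick off a run.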

From there your Claude Code session might differ from mine depending on what Claude "decides" to do, but it's likely to ask you questions like it did with me.

It asked me if I would like to create a new task to grab specific information from the page. I said "Yes, please create a task in the workflow to return all the blog post titles from the URL", and Claude continued from there.

At the end of this short session I had a working workflow in my Conductor cluster, and I could keep working with it through Claude by describing what I wanted in plain English.

Here is the final small workflow in the Developer Edition of Orkes Conductor after a successful run of the workflow:

In another article I am going to use Claude Code to create a Content Refresh workflow from a spec. In this one I wanted to show how you can use Claude Code to build a simple workflow and run it. For anything close to a durable workflow, though, I find it best to approach it the way you would any software project, starting with a good document that outlines the requirements.

-- Author: Maria Shimkovska