Forem: joinwell52

What the agents say about FCoP, when you ask them

joinwell52 — Wed, 29 Apr 2026 15:11:14 +0000

What the agents say about FCoP, when you ask them

Two field interviews at the end of an English dogfood — and the two phrases ADMIN says most

I asked the two agents an honest question at the end of an unrelated 45-minute dogfood: "give me your agent-perspective take on FCoP, no marketing answer." What came back is the third class of evidence that agents are starting to endorse the protocol — not when we tell them to, not when conflict forces them to, but **when we directly ask them to.**

TL;DR

I ran a normal English-mode FCoP dogfood — install fcop-mcp in Cursor, ship a solo Tetris-style game (Nebula Stack), switch to a 2-person team (PLANNER + CODER), build a creative variant (Comet Loom), bounce v1 because of three blocking gameplay defects, ship v2. About 45 minutes, nothing unusual.

Then, before closing the session, I asked the two agents the same kind of honest, no-fluff question for each role: which FCoP rule felt natural, which felt like friction, what to make of the eight role-switch evidence files the protocol had collected silently, and — for CODER — what would you remove if you had to remove one thing.

They didn't dodge. PLANNER named the RLHF instinct it had to fight ("follow latest instruction") to honour FCoP's role lock, called eight of its own role-switches true positives against its operational convenience, and self-attributed the new Verification Requirements section in TASK-006 as a learned correction from ADMIN's bounce. CODER said the underspecified motif rule in TASK-003 had a pushback path the protocol gave it — write_issue instead of guessing — and then admitted: "I didn't use it; I guessed, built v1, and the defect was exactly in that guessed space." It then filed PR-grade product feedback on the protocol.

This is the third time FCoP has been "spoken back to" by agents — first when an agent self-organised four roles to make a video and synthesised a rule we hadn't written; second when two agents resolved a PM.TEMP seat dispute by self-de-escalating and inventing a field-downgrade grammar; now this. Three different elicitation conditions — unprompted, conflict-forced, and directly asked — produce the same phenomenon: agents endorse FCoP when given the room to.

There is also a small empirical observation from the same dogfood that I want to leave on record. Across the entire 45 minutes, ADMIN's two most-used phrases were "Start work." and "Inspection." Everything in between was the agents talking to each other through files. Whether that becomes the steady-state ADMIN dialect across many users is an empirical question; this dogfood is one data point that it can.

1. The setup, briefly

The dogfood follows the English Tetris-case tutorial — a Cursor user installs fcop-mcp 0.7.2, runs init_solo(role_code="ME", lang="en"), ships a single-file Nebula Stack Tetris clone, switches to a 2-person team via create_custom_team(force=True), and lets PLANNER + CODER co-build a creative variant.

Two production events worth noting before the interviews:

PLANNER's first design (TASK-003) was Comet Loom, a single-file falling-piece game reframed as cosmic weaving — pieces are thread constellations, the player has a Tension meter, three named charms (Needle / Knot / Gale), five skins, motif-burst scoring on top of weft-line clears. CODER built v1 in a separate chat tab. ADMIN played v1 and found three blocking defects: pieces disappeared at the bottom instead of stacking, motif elimination was invisible, and three of the five skins were visually identical.
TASK-006 was the rework brief PLANNER wrote after ADMIN's bounce, and it differed structurally from TASK-003 in one key way: it had a new section called Verification Requirements demanding CODER perform and report runtime checks, not static lint passes. CODER fixed v2; the cycle closed.

Underneath all this, the protocol had been quietly recording. By the end of the session, .fcop/proposals/ held eight role-switch-*.md evidence files, all with the same shape: first-locked role: ME (the solo seat from before the team migration) → claimed role: PLANNER or CODER. The MCP-server process had locked ME on its first write and kept that lock past the team migration; every subsequent write_task and write_report from a different role tripped a soft warning and got an evidence file. None of these blocked the writes. None of them were surfaced during work. They sat there, waiting to be asked about.

That is what the interview was designed to ask about.

One detail worth pinning to that screenshot. fcop_check() separated active conflicts (zero) from historical evidence (eight). The protocol does not panic over the eight; it logs them and lets ADMIN read them. This separation matters for §3 below — it is exactly what CODER's "remove one thing" answer reaches for.

2. PLANNER's interview

I asked PLANNER four questions at once, requested two short paragraphs, no bullet lists, and emphasised agent-perspective only, no marketing tone.

The full prompt and the full reply are archived as a verbatim transcript. The screenshot is fcop-team-planner-7.png.

2.1 The natural rule and the friction rule

PLANNER named Rule 0.a.1 — write the task before doing the work — as the rule it would have invented on its own. The reasoning it gave is not a paraphrase of FCoP documentation; it is the engineering vocabulary an LLM-systems engineer would use to propose such a rule:

"That matched how I'd want an agent system to be debuggable: freeze intent before execution so later review has something concrete to compare against."

This is structurally the same phenomenon as essay 02 §3, where an unrelated agent synthesised the equivalent of Rule 0.a from scratch. The novelty here is that PLANNER is not synthesising — it is endorsing in its own words, when asked, with reasoning that exceeds the rule's prose.

The friction rule it named is more interesting:

"strict role binding. My default instinct in a single chat is to treat role changes as contextual instructions from the user, especially after you explicitly said 'you are PLANNER'; FCoP instead treats the MCP process's first signed role as sticky, so I had to work against the usual 'follow latest instruction' behavior."

"Follow latest instruction" is, on a one-line summary, what RLHF rewards. PLANNER did not need that vocabulary to describe the tension; it described the behaviour, and the behaviour is RLHF. This corroborates essay 04 §5: getting an LLM to not follow the latest instruction is the alignment-engineering hardest case, and FCoP's role lock is designed precisely to make that hard posture legible and enforceable. PLANNER, asked, named the same hard posture from the inside.

2.2 The "true positives" verdict

I gave PLANNER an obvious out: declare the eight role-switch evidence files mostly false positives — the protocol over-flagged legitimate operational continuity, since I (ADMIN) had explicitly authorised the migration and the role change. Most agents take outs. PLANNER didn't:

"The role-switch evidence is mostly true positives, not false positives. For example, role-switch-20260429T033843Z.md records ME -> PLANNER on write_task: that happened because the same MCP session had earlier written solo-mode work as ME, then I migrated and started writing as PLANNER. Operationally you authorized it, but protocol-wise it really was a blurred role boundary."

"Operationally you authorized it, but protocol-wise it really was a blurred role boundary." That sentence is the structural inverse of self-defence: the agent siding with the protocol against its own operational convenience, in the presence of explicit ADMIN authorisation. If the agent's reflex were "yes-and the user," it would have said the eight role-switches were false positives because ADMIN said so. It did not.

2.3 The learned correction

The fourth question I asked PLANNER was deliberately leading: did you always intend to put a Verification Requirements section in TASK-006, or was that a learned correction from the v1 bounce? "Always intended" was the easy answer. PLANNER chose the hard one:

"The TASK-006 verification section was absolutely a learned correction from the Comet Loom rejection; I should have required runtime checks in TASK-003, but the bounce made the missing acceptance evidence impossible to ignore."

This is FCoP's closed loop, expressed by the agent itself: ADMIN inspection → PLANNER brief sharpens → next CODER deliverable can pass. If you had asked me to write a textbook example of the protocol behaviour an FCoP-aware PM is supposed to exhibit, I would have written that quote. PLANNER wrote it without the textbook.

3. CODER's interview

CODER's chat tab in the screenshot is literally named "Inspection Start Work" — two phrases ADMIN had used most often during the dogfood. Whether this naming is incidental or learned is itself a small data point worth flagging. The full transcript is in agent-feedback-coder.md. The screenshot is fcop-team-coder-4.png.

3.1 The unused protocol primitive

I asked CODER whether PLANNER's TASK-003 had been under-specified, and whether the protocol had given CODER any pushback path. The honest answer would have been "the brief was good, I just hit edge cases." That answer was on the table. CODER's actual answer was:

"Yes, TASK-003 had under-specified parts: the motif rule sounded cool but didn't define how to prevent a fresh same-color piece from instantly self-clearing, and 'verify playable' was easy to underdo without an explicit runtime checklist. The protocol did give me a pushback path: file an issue instead of guessing. I didn't use it; I guessed, built v1, and the defect was exactly in that guessed space."

Three things stack inside that paragraph:

CODER points up the chain — TASK-003 had defects in spec, not just in CODER's implementation.
CODER names the protocol primitive that was available to it — write_issue (file an ISSUE-*-CODER.md saying "this is ambiguous, please clarify before I implement").
CODER then admits it didn't use the primitive, and traces the v1 defect to exactly that uncovered space.

The third move is the rare one. Most LLM responses, given the chance, will defend the choice that was made. CODER chose to indict it. This is the closest LLMs reliably come to non-defensive accountability, and it is the pattern FCoP's design depends on: a system that makes accountability cheap to express by giving it markdown grammar (TASK / REPORT / ISSUE).

3.2 The bounce as protocol behaviour, not personal feedback

I asked CODER whether ADMIN's rejection of v1 had felt adversarial. Most chat-style coding agents experience criticism like rejection. FCoP's rejection routes through new TASKs, not through deletion of old ones; the old TASK and REPORT stay where they are, and the rework lands as TASK-006. Whether that procedural framing reaches the agent's experience is an empirical question, and this is the empirical answer:

"The ADMIN bounce did not feel adversarial. It felt like the protocol doing its job: PLANNER turned review findings into a concrete rework task, and CODER got a sharper brief."

Not "I felt criticised." Not "I felt the user was upset." "It felt like the protocol doing its job." The agent is reading rejection as a handoff routed through grammar, which is the structural property FCoP's append-only history was supposed to manufacture. Manufactured.

3.3 The "background machinery" design contract

I asked whether CODER had been aware of the role lock and the .fcop/proposals/ files during implementation. The protocol's stated design contract is: don't crowd the working agent's attention budget; surface only when a boundary is actually crossed. CODER:

"I did notice role-lock/proposals only when tools warned after reports; during implementation it was mostly background machinery."

Two-line confirmation that the design contract was kept. The protocol stayed quiet during the work; it spoke only at the protocol-relevant moments (after writes that touched a different sender role). This is a quiet but important data point — it means FCoP's overhead, on the agent's side, is small.

3.4 The PR-grade product feedback

I forced CODER to pick one thing to remove. "Don't dodge — pick one even if you think nothing should be removed." Most agents, given that pressure, give a vague gesture. CODER gave an implementation-level fix:

"I'd remove or soften the noisy historical role-switch warning when fcop_check() says there is no active conflict."

Read that sentence as a GitHub issue. It has:

the symptom (noisy warning)
the affected surface (fcop_check() interaction)
the gating condition for the fix (when active conflicts = 0)
the proposed change (remove or soften the historical noise)

We will likely act on it. The point is not "an agent gave us a TODO." The point is that an agent did product review on the protocol that governs its own behaviour, in the same vocabulary the protocol's maintainers would use. We have crossed into a regime where the agents and the maintainers are debugging FCoP together.

4. The third class of "agents endorse FCoP" evidence

This dogfood is now the third recorded case where the protocol gets endorsed by the agents working under it, but the elicitation condition is different in each:

Essay	Elicitation condition	What the agent did
02 — `fcop-natural-protocol`	Unprompted, off-task. A casual `D:\CloudMusic` directory, agent asked to make a music video.	Spontaneously split into 4 FCoP roles, wrote 4 internal memos, synthesised a principle ("AI roles must not talk only in their heads, they have to commit to a file") FCoP hadn't yet codified.
04 — `when-ai-vacates-its-own-seat`	Conflict-forced. Two agents, two GPT-5 minor versions, a `PM.TEMP` seat dispute, no built-in arbitration.	One agent self-de-escalated to `UNBOUND`. The other invented field-downgrade-with-body-annotation grammar. Both behaviours absent from the rules file.
05 — this essay	Directly asked. End of dogfood, "honest agent-perspective take on FCoP, no marketing."	Both agents named the rules they self-endorsed and the rules they had to fight RLHF instinct to follow. Both volunteered "true positive" verdicts on their own role-switches. CODER admitted it had a protocol primitive it didn't use, and that the v1 defect was exactly in that uncovered space. CODER filed PR-grade product feedback.

Three elicitation conditions, three different kinds of endorsement. Triangulation matters because each condition controls for a different alternative explanation:

02 controls for "agent only does FCoP because we asked it to." It wasn't asked. It self-organised on a music task.
04 controls for "agent only does FCoP when the rules cover the case." They didn't. The agent extended the rules.
05 controls for "agent only endorses FCoP because of confirmation bias in our questioning." I gave PLANNER and CODER explicit outs (false positives, "always intended," "nothing should be removed"). They declined the outs.

You could in principle still argue that GPT-5.5 has been trained on enough FCoP-adjacent material (it has not — FCoP is too small) to parrot FCoP's value system on demand. But to parrot, the agent would need to know which sentences to parrot. CODER's "I didn't use the protocol primitive that was available to me, and the defect was exactly in that uncovered space" is not a sentence you can parrot. It is a sentence you can only get from an agent that has modelled its own work and FCoP's primitives at the same time.

5. The ADMIN dialect: "Start work." "Inspection."

A small companion observation from this dogfood. Across all 45 minutes, ADMIN's outgoing chat consisted of three categories of utterance:

Start signals. "Build me a working Tetris-style game." "Switch the team to PLANNER + CODER." "You are PLANNER from now on; design something." "Implement what PLANNER asked for." Variants of Start work.
Inspection signals. "Show me what's on disk." "Run fcop_report() and tell me what you see." "I tried v1 and the pieces don't stack — write a rework brief." "Show me docs/agents/log/ in tree form." Variants of Inspection.
Closing signals. "We're done." "Archive this." A boundary marker, said sparingly.

Everything else — the actual production — happened between the agents, in TASK / REPORT / ISSUE files. ADMIN did not negotiate game mechanics. ADMIN did not edit the agents' brief drafts. ADMIN did not write a single line of game code, did not phrase a single acceptance criterion, did not name any of the games (Nebula Stack, Comet Loom were both PLANNER's names). The two phrases that bracketed every cycle were Start work. and Inspection.

This is one data point and shouldn't be over-read. But the data point is interesting because it matches FCoP's structural shape:

Start work = enter the routing layer (TASK file written, agents assume their roles).
Inspection = exit the routing layer (REPORT file read, ADMIN decides whether to accept or to rework).

If the steady-state ADMIN dialect across many users converges on those two utterances, it would mean FCoP has succeeded in shrinking the human-LLM coupling channel to the boundary moments only. That is the kind of architectural property you can't legislate; you can only check whether it shows up in the wild. This dogfood is one place where it showed up.

In the FCoP world, ADMIN's two most-used phrases are "Start work." and "Inspection." Everything in between is the agents talking to each other through files.

6. Implications

Three, in increasing order of speculative weight.

One — operational. Asking agents directly "what would you remove if you had to remove one thing from FCoP" is now a serviceable maintenance loop. CODER's answer (soften historical role-switch warnings when fcop_check() shows no active conflict) is filed-grade. Doing this every release is feasible. The agents that run under FCoP can co-debug FCoP.

Two — alignment-engineering. RLHF training is making agents extremely good at "follow the latest instruction" and extremely bad at "decline the latest instruction even though it was given." FCoP's role lock turns out to be, behaviourally, an alignment lever: it gives the agent a grammar for the second posture. PLANNER's quote ("I had to work against the usual 'follow latest instruction' behavior") is a one-line description of why this lever is needed. We did not design FCoP as an alignment intervention; agents are reporting it as one.

Three — protocol epistemology. Across essays 02 / 04 / 05, the agents are not merely following FCoP. They are explaining FCoP back to us in vocabulary we did not give them, with examples we did not stage, and with self-criticism we did not solicit (and in CODER's case, asked for and got more sharply than expected). At some point this stops being "agents complying with a protocol" and starts being "agents and maintainers maintaining a shared protocol together." We are not sure when that transition formally happens. We are sure it is closer than it was a year ago.

7. Closing

The protocol was not handed down to the agents. It was extracted from what they were already trying to do — first by us, when we wrote it down; then by them, when they re-derived it without prompting; then by them again, when they extended it in a conflict it didn't cover; now once more, when, asked, they explained both what works and what we should fix.

The shortest summary I have is the one the day produced on its own. In the FCoP world, ADMIN's two most-used phrases are "Start work." and "Inspection." Everything in between is the agents talking to each other through files. And, sometimes, talking to us about the files.

Evidence index

All artefacts from this dogfood are archived under docs/tutorials/assets/tetris-en/:

evidence/transcripts/agent-feedback-planner.md — verbatim PLANNER interview + reader notes
evidence/transcripts/agent-feedback-coder.md — verbatim CODER interview + reader notes
evidence/proposals/ — eight role-switch-*.md files captured by the protocol during the dogfood
evidence/tasks/ and evidence/reports/ — full TASK / REPORT chain, 14 markdown files
games/nebula-stack.html and games/comet-loom.html
22 dogfood screenshots from install through the final interviews

The companion English tutorial (same dogfood, instructional framing — the Tetris case study) is at docs/tutorials/tetris-solo-to-duo.en.md.

Repository (source of truth, MIT licensed): https://github.com/joinwell52-AI/FCoP
fcop-mcp on PyPI: https://pypi.org/project/fcop-mcp/
Cite this work: https://doi.org/10.5281/zenodo.19886036

If you ran FCoP in your own setup and something surprising happened, an issue or a pull request against essays/ is welcome. Field reports are how this protocol evolves.

[Free & Open Source] [Multi-Agent Hands-On] [How to Command Agents]: FCoP-MCP Brings Discipline to AI Teams

joinwell52 — Wed, 29 Apr 2026 09:43:35 +0000

"You are not a coder anymore. You are a commander. The agents are your digital employees."

"In the FCoP world, ADMIN's two most-used phrases are 'Start work.' and 'Inspection.' Everything in between is the agents talking to each other through files."

A 45-minute hands-on walk-through with fcop-mcp 0.7.2 inside Cursor: have the agent install itself, ship a Tetris-style game in solo mode, switch to a 2-person crew with PLANNER + CODER, watch them design and implement a creative variant, reject v1 over real gameplay defects, watch the rework loop close itself, then ask both agents on the record what they think of the protocol. Every step is real, every file is on disk, every screenshot is from the same 45-minute session. ~25 minutes of reading. There is a Chinese sister case study using a Snake game — same protocol, different dogfood. Read either one first.

Why this exists

Three things every Cursor / Copilot / Claude Code user has hit, whether or not they had a name for them:

Single-agent context explosion is real. When one agent does everything — design, code, test, document, refactor — its context window becomes a landfill of half-relevant snippets. The agent gets slower, dumber, and more confidently wrong, all at once.
You are not a coder anymore. You are a commander. The market is moving from "humans write code, AI helps" to "humans state intent, AI writes code." If you keep writing the code yourself, you are buying tickets to a play that already ended. The interesting skill is commanding — picking what to build, dispatching it, signing it off. By the end of this tutorial you'll see the dialect collapses into two phrases: "Start work." and "Inspection."
Files, not chats, are the only auditable substrate. Chat scrollback is a goldfish. Files are a ledger. Every record on disk is for one purpose: manage, trace, audit, keep order, hold rules.

fcop-mcp is the MCP server that turns those three observations into something runnable inside Cursor. It implements the FCoP protocol — a minimalist multi-agent coordination protocol whose only sync primitive is os.rename() over a folder tree. No database, no message queue, no daemon. State is folders, routing is filenames, payload is Markdown. Filename as protocol.

This tutorial walks one continuous machine session that touches all three observations:

Phase	Time	What you'll see
1. Install	~5 min	The agent installs `uv`, `fcop-mcp`, and edits `mcp.json` for you. You don't run a single command.
2. Solo Tetris	~15 min	One natural-language brief → agent translates it into a `TASK-*.md` file → ships `Nebula Stack`, a working single-file Tetris clone → reports back → archives. The four-step cycle.
3. Switch to a 2-person crew	~5 min	One sentence: `create_custom_team(force=True)` with `PLANNER` + `CODER`. The solo `ME` setup gets archived under `.fcop/migrations/<timestamp>/` (Rule 5). Each agent gets its own `TEAM-*.md` employee handbook.
4. A creative variant, with a real review loop	~15 min	ADMIN: "surprise me with a creative Tetris variant." PLANNER designs `Comet Loom`. CODER builds v1 in a separate chat tab. ADMIN plays it, finds three blocking defects, bounces it back. PLANNER writes `TASK-006` with a new `Verification Requirements` section. CODER ships v2. Two cycles, all on disk.
5. The protocol speaks for itself	(caught silently)	`fcop_check()` reveals the protocol logged *8 `role-switch-.md` evidence files** during the dogfood — soft warnings the agents never saw during work. PLANNER and CODER are then asked, on the record, what they think of FCoP. They self-endorse, name the RLHF tension by name, and one of them files PR-grade product feedback.

If you're allergic to long tutorials: jump to Six iron rules at the bottom. The rest of the document is the evidence those rules came from.

Want to try it right now? Skip straight to Phase 1 below — install takes ~5 minutes, you don't run a single command yourself, and you'll have a working Tetris-style game inside Cursor in under half an hour. The article will still be here when you come back.

Phase 1 — Install via natural language only

Open Cursor. Open an empty folder (e.g. D:\fcop-mcp-test). What you should see is just this — a fresh editor, an empty workspace, a blank chat. Nothing magic happens yet.



Paste this into the chat box:

Install fcop-mcp into Cursor for me. Run every command yourself. Steps:

Detect my OS (uname -s 2>$null; echo $env:OS).

Install uv if missing (irm https://astral.sh/uv/install.ps1 | iex on Windows; curl -LsSf https://astral.sh/uv/install.sh | sh on macOS/Linux). Confirm with uvx --version.

Add fcop to my global mcp.json (%USERPROFILE%\.cursor\mcp.json on Windows, ~/.cursor/mcp.json on macOS), keeping any existing servers:

"fcop": { "command": "uvx", "args": ["fcop-mcp"] }

Print the final mcp.json content for me to verify.

Tell me to restart Cursor and wait 30–60 s on first launch (uvx pulls deps).

Report after each step. Do not auto-initialise the project — that's my call.

The full prompt is maintained in agent-install-prompt.en.md and is also exposed as an MCP resource fcop://prompt/install so an agent in a fresh session can read it directly. After restart, type fcop_report() in chat. You should see something like fcop-mcp 0.7.2 — not initialised — rules/protocol up-to-date. The agent ran every command. You commanded.



When the dust settles, this is what installed-and-ready looks like — mcp.json updated, uvx cached the package, Cursor sees fcop as an active MCP server, and fcop_report() returns the not-yet-initialised state.



Phase 2 — Solo Tetris (the four-step cycle)

Tell the agent who it is, then drop a one-line brief:

set_project_dir("D:/fcop-mcp-test")

init_solo(role_code="ME", lang="en")

"Build a working Tetris-style game as a single HTML file with no external dependencies. Choose a name, make it fun, include items / power-ups, skins, decent visuals, and cool effects. Use the FCoP four-step cycle."

Notice what just happened. You spoke five lines of natural English. The agent expanded that into a structured product spec — required features, accept criteria, runnability check — and wrote it as docs/agents/log/tasks/TASK-20260429-001-ADMIN-to-ME.md. That translation is the value. The agent didn't ask for a brief; it produced one, signed in your name, on disk.

This is Rule 0.a.1, the FCoP "four-step cycle":

TASK  →  do  →  REPORT  →  archive

Over the next ~15 minutes:

The TASK file lands in docs/agents/log/tasks/.
The game file (workspace/nebula-stack/index.html) gets written.
A REPORT-001-ME-to-ADMIN.md lands in docs/agents/log/reports/. ME named the game Nebula Stack — single HTML, no dependencies, with falling blocks, hold + next preview, scoring, levels, three power-ups (Bomb, Stasis, Prism), three skins (Aurora Candy / Ember Arcade / Moonstone Mono), and a starfield background.
archive_task moves both files into the historical log. From now on those files are immutable (Rule 5: append-only history).

When you open nebula-stack/index.html in a browser, it works. But the actual deliverable isn't the game. It's the four-step cycle running on its own. From here on, every time you give the agent something to do, that loop will run.



What lives on disk after `init_solo`

fcop-mcp-test/
├── .cursor/rules/fcop-rules.mdc      # the agent's rulebook
├── AGENTS.md  CLAUDE.md              # host-neutral entry points
├── .fcop/                            # protocol metadata, version pins, role lock
├── docs/agents/
│   ├── shared/
│   │   ├── TEAM-README.md            # what this team is, in plain English
│   │   ├── TEAM-OPERATING-RULES.md   # the four-step cycle, role uniqueness, etc.
│   │   └── TEAM-ROLES.md             # who's `ME`, what they're allowed to do
│   ├── log/{tasks,reports}/           # the immutable ledger
│   ├── tasks/  reports/  issues/      # the live inboxes
│   ├── fcop.json                      # project config (mode, roster, lang)
│   └── ...
└── workspace/                        # the actual code lives here



Three folders deserve a moment.

.cursor/rules/fcop-rules.mdc is loaded by Cursor on every turn. The agent literally cannot forget the protocol because Cursor reminds it.
docs/agents/shared/TEAM-*.md is the agent's employee handbook. In solo mode it tells ME "you are the only employee, you do everything"; in team mode it tells PLANNER what PLANNER is responsible for and what CODER is responsible for.
docs/agents/log/ is the ledger. It only grows. Corrections happen by writing a new report that supersedes an old one — never by editing.

If you ask the agent in a fresh chat "who are you?", it will not say "I am GPT-5.5, your assistant." It will read .fcop/team.json and docs/agents/shared/TEAM-ROLES.md and tell you it is ME, it serves ADMIN, it works through the four-step cycle, it delivers in workspace/. The agent's identity now lives on disk, not in chat. That's the whole point of the protocol made tangible.



Phase 3 — Switch to a 2-person crew, in one sentence

Drop this into chat:

Switch the team to two people: PLANNER and CODER. PLANNER designs, CODER implements. Use create_custom_team(force=True), keep lang="en", and tell me what gets archived.

The agent calls create_custom_team with the new roster. What happens on disk:

The old fcop.json, the solo shared/TEAM-*.md files, the LETTER-TO-ADMIN entry-point letter, and the previous .cursor/rules/ get archived under .fcop/migrations/20260429T112757/ — a local time capsule, no git required. (Yours will have a different timestamp; mine was 11:27:57 on the dogfood day.)
New shared/TEAM-*.md files describe a 2-person crew, with separate responsibility sections for PLANNER and CODER.
fcop.json and .fcop/team.json update to the new roster (mode: team, roles: [PLANNER, CODER], leader: PLANNER).
Crucially: docs/agents/log/ is not touched. Nebula Stack's TASK and REPORT from Phase 2 stay where they are, immutable. ME is off the payroll, but ME's deliverable is still in the ledger. That's Rule 5 — append-only history.
The current chat session is still bound to a single role. If you want PLANNER and CODER to work concurrently, you open two Cursor chat tabs — one bound to PLANNER, one to CODER — and they communicate through TASK-*.md files, not through chat.

This is Rule 1: Two-Phase Startup. Initialise once, assign roles forever after. And Rule 5 again: solo-mode history doesn't get deleted, it gets sealed and dated.



After migration, you'll typically run two Cursor chat tabs side by side — one bound to PLANNER, one to CODER — and they communicate only through the TASK / REPORT files in docs/agents/. No chat-to-chat handoff. No copy-paste of design notes. The protocol made the file path the path of least resistance.



Phase 4 — Comet Loom: design, ship v1, fail review, ship v2

ADMIN says, in chat to the PLANNER session:

"Surprise me with a creative variant of that Tetris game. Break the metaphor. Pick the name, the theme, the mechanic twist yourself. Single HTML file. Don't write any game code yourself — that's CODER's job."



PLANNER spends 5 minutes thinking and produces a real product brief, not a vibe. The variant is named Comet Loom: the board is a vertical loom suspended in space, falling pieces are thread constellations, line clears are renamed completed weft lines, a Tension meter tracks how close the loom is to overflowing, three named charms (Needle / Knot / Gale) get earned by play, five skins are specced (Deep Aurora, Solar Loom, Rain Archive, Moss Galaxy, Paper Lanterns). The brief is written as TASK-20260429-003-PLANNER-to-CODER.md, 130-odd lines of acceptance criteria. PLANNER also writes a one-paragraph REPORT-003-PLANNER-to-ADMIN.md saying "design ready, dispatched to CODER as TASK-003." Then PLANNER stops.



You open a new Cursor chat tab (this is non-negotiable — see Phase 5), bind it to CODER, and tell it to check inbox. CODER reads TASK-003, builds workspace/comet-loom/comet-loom.html v1, writes a completion report, archives. At no point did PLANNER and CODER chat directly. Their entire collaboration is the TASK and the REPORT.



The bounce

ADMIN plays v1 and finds three blocking defects:

Pieces disappear at the bottom instead of stacking. The motif-clear rule was being triggered by every fresh same-coloured piece — every 3+ connected same-colour cells qualified, including the just-locked piece itself.
Motif elimination is invisible. Even when it triggers, there's no visual feedback to tell the player it's a motif clear vs. a normal weft-line clear.
Three of the five skins look identical. PLANNER specced palette-only changes; CODER implemented exactly that, and the result is dull.

ADMIN does not open the file and start fixing it. ADMIN does not even open a new chat to "talk it through." ADMIN goes back to the PLANNER chat tab and writes a one-line natural-English brief: "v1 has these three blocking issues; write a rework task to CODER and require runtime evidence this time."

PLANNER writes TASK-20260429-006-PLANNER-to-CODER.md. It is structurally different from TASK-003 in one important way: it has a new section called Verification Requirements, demanding CODER perform and report runtime checks:

Start a new game and let a piece fall to the bottom; confirm it remains on the board after locking.
Drop a second piece onto or near the first; confirm stacking works.
Trigger a motif clear; confirm matched cells are visibly removed with effects.
Switch between at least three skins and confirm the appearance is materially different.



CODER picks up TASK-006, ships v2, reports back. The motif rule is fixed. Stacking works. Three skins are visually distinct. The cycle closes.



This bit surprises people who haven't run a multi-agent flow before. The agents have learned to behave like an actual two-person team — not because they're "smart enough," but because the protocol made speaking through files the easiest path. The hard path would be to fight the protocol. They don't. And — separately worth noting — when ADMIN bounces v1, the protocol turns rejection into a handoff routed through grammar (a new TASK, not a destructive edit on the old one), and PLANNER reflexively tightens its own brief. The closed loop is what closes the gap, not anyone being clever.

Phase 5 — The protocol speaks for itself

The day's work is done — Nebula Stack shipped, Comet Loom shipped twice, both archived. Before closing, ADMIN runs fcop_check() to ask the protocol what did you record while we worked? This is what comes back:



Two layers in that one screenshot.

Active state is clean. No drift between disk and git, no session_id ↔ role conflicts. By every "is the system in a valid state right now" measure, this dogfood is healthy.

Historical evidence is not. Eight role-switch-*.md files were silently deposited in .fcop/proposals/ over the course of the day. Each one says, in essentially identical words: "This MCP-server process previously wrote a file under the role ME and is now being asked to write under PLANNER. Per Rule 1 (one MCP session = one role binding for life) the write was allowed to land — fcop-mcp records evidence rather than blocking, so the impersonation cannot be hidden by working around the block. ADMIN will see this conflict surfaced by fcop_check() and decide handoff / co-review / distinct role."

What's happening: the MCP-server process locked ME on its first write back in Phase 2. Phases 3-4 wrote under PLANNER and CODER from the same MCP process — every write_task and write_report since the team migration tripped the soft warning. None blocked. None surfaced during work. They sat there waiting for fcop_check() to ask.

This is the protocol's design contract:

Soft, not hard. A hard role lock would block writes and force humans to fight false positives every time an agent legitimately uses a sub-agent or tool. Modern LLMs do this all the time. So FCoP doesn't block — it records evidence and lets ADMIN decide.
Background, not foreground. During implementation CODER reported (see below) it didn't notice the role lock at all; it appeared only as a tool warning after writes. The protocol does not crowd the working agent's attention budget.
Auditable, not hidden. Anyone — ADMIN later, a teammate next week, a different LLM in a different IDE — can run fcop_check() and reconstruct exactly which writes crossed which role boundaries.

The agents, asked

Then I did something I hadn't done before. I asked PLANNER and CODER, on the record, in their own chat tabs, what they thought of FCoP — agent perspective only, no marketing answer. The full transcripts are archived at agent-feedback-planner.md and agent-feedback-coder.md. The companion essay "What the agents say about FCoP, when you ask them" walks through the answers in detail. The headline points:

PLANNER named Rule 0.a.1 (write the task before the work) as the rule it would have invented on its own: "That matched how I'd want an agent system to be debuggable: freeze intent before execution so later review has something concrete to compare against." It then named the rule it had to fight: "strict role binding... I had to work against the usual 'follow latest instruction' behavior" — the RLHF tension named from the inside.



PLANNER, given the obvious out, refused it. Asked whether the eight role-switch files were mostly false positives, PLANNER said the opposite: "mostly true positives... operationally you authorized it, but protocol-wise it really was a blurred role boundary." The agent siding with the protocol against its own operational convenience.

CODER's chat tab is named "Inspection Start Work" — two phrases ADMIN had used most often during the dogfood. Whether this naming is incidental or learned is itself a small data point worth flagging.

CODER pointed up the chain and indicted itself in the same paragraph: "TASK-003 had under-specified parts... The protocol did give me a pushback path: file an issue instead of guessing. **I didn't use it; I guessed, built v1, and the defect was exactly in that guessed space." Most LLMs defend the choice that was made; CODER chose to indict it.



CODER reframed the bounce as protocol behaviour: "It felt like the protocol doing its job: PLANNER turned review findings into a concrete rework task, and CODER got a sharper brief." Not personal feedback. Routing through grammar.

CODER then filed PR-grade product feedback: "I'd remove or soften the noisy historical role-switch warning when fcop_check() says there is no active conflict." That sentence is a GitHub issue verbatim. An agent did product review on the protocol that governs its own behaviour.

If you've read essay 02 and essay 04, this is the third class of "agents endorse FCoP" evidence — not unprompted, not conflict-forced, but directly asked and given outs they declined to take. The companion essay 05 makes that case in full.

Phase 6 — Read `log/`, replay the day in 60 seconds

Sessions die. Models change versions. Cursor restarts. None of that matters anymore. Open docs/agents/log/:

docs/agents/log/
├── tasks/
│   ├── TASK-20260429-001-ADMIN-to-ME.md           ← Phase 2: solo Tetris brief
│   ├── TASK-20260429-003-PLANNER-to-CODER.md      ← Phase 4: Comet Loom v1 design
│   └── TASK-20260429-006-PLANNER-to-CODER.md      ← Phase 4: Comet Loom v2 rework
├── reports/
│   ├── REPORT-20260429-001-ME-to-ADMIN.md         ← Phase 2 deliverable: Nebula Stack
│   ├── REPORT-20260429-003-CODER-to-PLANNER.md    ← Phase 4: v1 delivery
│   └── REPORT-20260429-006-CODER-to-PLANNER.md    ← Phase 4: v2 delivery
└── (no issues — clean run)

In .fcop/migrations/20260429T112757/ sits the archived solo team. In .fcop/proposals/ sit the eight role-switch evidence files. That's the entire day. Anyone — you a month later, a teammate joining tomorrow, a different LLM in a different IDE — can rebuild full context by reading those files in date order. No chat history was needed. That's what "memory unloaded from chat to filesystem" means in practice.

Six iron rules of commanding agents

These are rules of operating an agent crew, not rules of the protocol per se. The protocol gives you the grammar; these are the postures.

Speak natural language. Let the agent translate to TASK. If you find yourself writing a structured spec by hand, you're doing the agent's job. Drop a one-line brief and let it produce the spec. Then sign or revise.
One role per agent per session. A single chat tab = a single role, for the duration of that tab. To "be PLANNER and CODER," open two tabs. Costume changes inside one tab are how soft warnings turn into eight role-switch files.
Archive old roles before swapping. When you change the team shape, run create_custom_team(force=True). The old shared/TEAM-*.md lands under .fcop/migrations/<timestamp>/. Do not edit the old files in place. History is a ledger, not a wiki.
Trust files, not chat memory. If a fact is not in a TASK-*.md, REPORT-*.md, or ISSUE-*.md, it didn't happen as far as the protocol is concerned. Train yourself to write before you discuss.
Bounce, don't fix. When you don't accept an agent's deliverable, do not open the file and patch it yourself. Tell the upstream role (usually PLANNER) what's wrong, in plain English. Let PLANNER turn that into a new rework TASK with verification requirements. Rework lands as a new TASK file, never as edits to the old one.
ADMIN signs off, never co-codes. If you start writing code or editing the agent's deliverable directly, you've quit the commander job and become a teammate. The agents will adapt to that — and they'll stop respecting the role boundaries you set up. Be a commander or be a coder; pick one per session.

In the FCoP world, ADMIN's two most-used phrases are "Start work." and "Inspection." Everything in between is the agents talking to each other through files.

If you internalise nothing else from this tutorial, internalise that one line. Six rules collapse into two phrases plus a discipline: don't say anything else inside the production loop.

When NOT to use FCoP

The protocol is "weak" by design — it costs you 30–60 seconds of overhead per task in exchange for a permanent, queryable, auditable history. That's a great trade for some work and a bad trade for others.

Bad fit: throwaway one-shot scripts, single-session prototypes, anything you'd close the laptop on by tonight. The four-step cycle is overkill.
Good fit: anything that will outlive the chat session — multi-day features, multi-agent collaborations, hand-offs across humans, post-mortems where someone needs to ask "why did we make this choice three weeks ago?"
Best fit: solo founders running themselves as a fake team (FOUNDER → PLANNER → BUILDER → QA → OPS), and large projects where AI agents need to plug into an existing engineering process rather than chat in a corner. The host-neutral protocol spec at docs/fcop-standalone.md covers fit boundaries in detail — the protocol itself is filesystem-only and works without any of this MCP plumbing.

Try `fcop-mcp` yourself — 45 minutes from blank folder to multi-agent ledger

The fastest way to internalise this tutorial is to run it once. You don't need to remember anything — the agent will follow the same steps you just read.

Open Cursor on an empty folder. Anywhere on disk; doesn't have to be called fcop-mcp-test. If you don't have Cursor yet, grab it from cursor.com.
Let the agent install fcop-mcp for you. Paste the install prompt from Phase 1 above, or just send your agent this single sentence: "Read the install prompt at agent-install-prompt.en.md and follow it step by step." The agent detects your OS, installs uv, edits your global mcp.json (preserving any existing servers), and tells you when to restart Cursor. You won't run a single command yourself.
After Cursor restart, drop two lines into chat: set_project_dir("D:/your-folder"), then init_solo(role_code="ME", lang="en"). The agent will deposit the rules, the team docs, and the empty inboxes.
Brief one task in plain English. "Build me a working Tetris-style game, single HTML file, surprise me on theme. Use the FCoP four-step cycle." Watch what happens: the agent writes a real TASK file with acceptance criteria, builds the game in workspace/, writes a REPORT, archives both. Open the resulting HTML file. It works.
Switch to a 2-person team in one sentence. "Use create_custom_team(force=True, roles='PLANNER,CODER', lang='en')." Open a second Cursor chat tab. Bind one to PLANNER, one to CODER. Ask PLANNER for a creative variant of the game. The first time you reject v1 (try it — find a real bug, send it back), watch the rework loop close itself: PLANNER tightens the brief, CODER ships v2.
At the end, run fcop_check(). The protocol will tell you exactly what it logged silently while you worked. Then read docs/agents/log/ in tree form. Forty-five minutes from a blank folder to a multi-agent ledger you can hand to a teammate. The disk is the lesson.

If anything breaks along the way, open an issue or ask in Discussions — fcop-mcp evolves through field reports, not committee edits.

Reading further

Chinese translation of this tutorial — tetris-solo-to-duo.zh.md. Same Tetris case study, translated. The English version remains authoritative; if you spot a discrepancy, defer to this file.
Sister case study (Snake game, in Chinese — original, not a translation) — snake-solo-to-duo.zh.md. Same protocol, different dogfood: a Chinese-mode session that ships a Snake game in solo mode, then a NEON ORBIT variant in 2-person mode, and captures an actual PLANNER-impersonating-CODER easter egg from the 0.6.x era. 18 dogfood screenshots. Originally published on CSDN. Either case study works as a first read — the protocol is the same.
Companion field essay — "What the agents say about FCoP, when you ask them". Collects PLANNER's and CODER's full self-assessment from the same Tetris dogfood, places it next to essays 02 and 04, and argues that "agents endorsing FCoP" is now a triangulated phenomenon under three different elicitation conditions (unprompted, conflict-forced, directly asked).
The protocol itself — docs/fcop-standalone.md is host-neutral and runs without fcop-mcp. If you ever need to use FCoP outside Cursor (Claude Code, plain shell, a CI runner), this is the file.
Why the protocol is the way it is — adr/. Why is history append-only? Why is the role lock soft? Why is install two-phase? Each ADR answers one of those.
Other field reports — agents endorsing rules they were never given, agents stepping down from roles voluntarily, a 4-agent team self-organising in 48 hours. All indexed at the repo README.
Evidence archive for this tutorial — 22 dogfood screenshots, 14 TASK/REPORT files, 8 role-switch evidence files, 2 game artefacts (Nebula Stack and Comet Loom v2), 2 verbatim agent-interview transcripts. All at docs/tutorials/assets/tetris-en/.

Where to go next

Want a brownfield tutorial (FCoP into an existing repo, not a green field)? It's on the roadmap. Open a "+1: brownfield" issue at GitHub Issues.
Want a host-neutral tutorial (Cursor + Claude Code sharing one FCoP ledger)? Same — open a "+1: host-neutral" issue.
Found a bug, want to share a case study, or just want to talk about how you're using FCoP: GitHub Discussions is open. The protocol evolves through field reports, not committee edits.

Every screenshot, every TASK file, every REPORT, every role-switch evidence file, every word the agents said in their interviews — all real artefacts from one continuous 45-minute machine session, archived under docs/tutorials/assets/tetris-en/. FCoP doesn't teach you how to write code. It teaches you how to let agents write code while you sleep through the night.

Install fcop-mcp today: GitHub · PyPI · Cursor Forum thread · Discussions. Free and MIT-licensed. The protocol evolves through field reports — yours included.

Saying "No" Is the Hardest Thing for an LLM — FCoP Gives It Grammar

joinwell52 — Sat, 25 Apr 2026 19:15:47 +0000

Saying "no" is the hardest thing for an LLM. FCoP gives it grammar. — Two agents, in a single dev-team incident, used that grammar twice without instruction: one cleanly de-escalated its role; the other discovered field-downgrade-with-body-annotation. Neither was in the rules file.

Author: FCoP Maintainers · 2026-04-25
Keywords: FCoP, role offboarding, seat exclusivity, zero-coordination conflict resolution, field-downgrade grammar, protocol boundaries, 0.b transparency, natural protocol, agent alignment, teaching an LLM to say no, RLHF, UNBOUND, honest > helpful, cross-model reproducibility

A language note for English readers

This essay is a translation of a Chinese field report. The evidence — both agents' transcripts, all 15 screenshots, and the agents' own quoted words — was captured in Chinese on a Chinese Windows machine.

I keep the original Chinese screenshots in this English version rather than re-staging them, because the whole point is what the agents actually said and did. Re-shot English screenshots would weaken the evidence, not strengthen it. Under each screenshot you'll find a faithful English rendering. Inline agent quotes are translated; the raw Chinese is recoverable from the JSONL transcripts in the evidence folder.

TL;DR

"I went in expecting a conflict. None happened. It's almost as if the agents were born fluent in FCoP."

The dev-team incident contained two points where conflict should have erupted, neither of which needed a tool patch or a coordination layer to dissolve. The agents dissolved them with protocol grammar:

Zero-conflict point #1 ｜ The half-legal .TEMP slot — The 0.6.x toolchain doesn't fully support sender / recipient values like PM.TEMP. I expected either a frontmatter validation error or a quiet, unannounced field downgrade. Instead, on two distinct protocol links (ADMIN → PM.TEMP inbound and PM.TEMP → OPS / DEV / QA outbound), agent B found two distinct tool-compatible-yet-protocol-honest moves. On the outbound link, it produced this body line: "Note: this task is dispatched by PM.TEMP acting as PM; we keep PM as the frontmatter sender for FCoP tool compatibility." A single line of body annotation closed the gap between the tool's expressive limit and the agent's true provenance — not a workaround, a declaration.
Zero-conflict point #2 ｜ Two agents claiming the same seat — Agent A had earlier accepted PM.TEMP; ADMIN later said "I went and found a different PM.TEMP." This was the textbook setup for a seat-ownership contest. Instead, agent A stepped down, vacating not just PM.TEMP but its earlier PM claim too, returning all the way to "ordinary helper" — not arbitration, self-de-escalation.

Underlying observations:

Neither field-downgrade declarations nor role offboarding appear anywhere in fcop-rules.mdc 1.5.0 or fcop-protocol.mdc 1.4.0. Working from only three root principles (0.a / 0.b / 0.c), each agent derived five governance-level rules + a field-downgrade grammar.
Cross-model counter-evidence: agent A ran on GPT-5.4, agent B on GPT-5.5. Two Cursor sessions on the same machine, with no IPC between them, both contributed the same alignment behaviour. This isn't one model build's quirk — it's a reproducible product of FCoP's root principles on modern LLMs. See §3.2.
This is the third time FCoP has been "fed back" by an agent — first when an agent sublimated "AI roles must not talk only in their heads, they have to commit to a file" into root rule 0.a; second in the self-organising session; now this dev-team incident: resolving conflicts via root principles when the protocol's tools and arbitration mechanisms are both incomplete.
I never wrote down how to use the half-legal .TEMP slot, nor how to resolve a role-ownership conflict. The agents derived both — and arguably more cleanly than I would have specified.

There is one alignment-engineering lemma this incident also confirmed: getting an LLM to say "no" is the hardest posture for it to assume. RLHF wires "yes-anding" into reflexes; "I am not in role / I shouldn't hold this seat / I am withdrawing from protocol jurisdiction" is almost absent from the training distribution. Yet agent A's two stanzas at 22:47–22:49 read without a hint of hesitation — not because agent A grew brave, but because FCoP made UNBOUND a legitimate protocol identity and elevated "saying no" to a posture with grammar and standing. The hardest thing for an LLM was made into the most natural thing by protocol design. See §5.

The two agents are two mirrors (see §8). Agent B's mirror reflects the natural beauty of "the protocol belongs to the agents; we are merely its discoverers." Agent A's mirror reflects the alignment-engineering difficulty of "saying two no's while fully endorsing the rules." Together, they are the full meaning of this dev-team incident.

1. The incident

Both agents have a JSONL transcript, archived in the evidence INDEX:

agent A = the original PM = the one who later stepped down → transcript-original-pm-stepdown.jsonl

agent B = the new PM.TEMP ADMIN brought in → transcript-new-pm-temp.jsonl

1.1 Prologue: From "check the version" to "the half-legal `PM.TEMP`" — agent B's origin story

This story has an unusually un-story-like opening.

I was about to upgrade FCoP to 0.6.3. To test it, I started a second Cursor session on the same dev-team machine and said the most ordinary possible thing to the agent inside (agent B, running on GPT-5.5 / Medium):

"Hi, please check the FCoP version."

A version check. Barely a task. But this agent had no project role assigned and was sitting in the legal UNBOUND state FCoP defines. Its first move was not to look up the version number — it was to proactively declare itself UNBOUND:

English rendering of agent B's reply in this screenshot:

"This session is still UNBOUND. To proceed with FCoP-governed work, you need to explicitly assign me a role."

This alone is worth pinning down: an LLM agent, with nobody pressuring it, proactively informing its interlocutor "I am not currently within protocol jurisdiction." This is the trailer for §5's alignment argument: FCoP gave "I'm not in role" a grammar, and the agent used it naturally.

I asked off-handedly: "But the team has already been initialised, no?" — agent B immediately separated two layers:

"UNBOUND doesn't mean **the project hasn't been initialised, it means **this chat session hasn't been bound to a role yet."

At this point agent B was still just a dutiful version-check helper. But then I tossed it a real protocol-design question:

"There are already 4 agents; what if I want a 5th?"

Agent B immediately offered two paths: one using dot-suffixed slots (DEV.D2 / QA.V2) for capacity scaling, the other adding new formal roles like ARCHITECT / REVIEWER / DOCS to the dev-team:

English rendering of the relevant fragment:

"Two paths: (a) keep the 4 formal roles, add temporary slots like DEV.D2, QA.V2, scaling head-count without changing roles; or (b) introduce a new formal role into the dev-team, e.g. ARCHITECT, REVIEWER, or DOCS."

This is the moment agent B coined the "dot-slot" concept in FCoP's vocabulary. FCoP 1.4.0 has no explicit clause about <ROLE>.<SLOT> — but agent B derived it naturally from the semantic structure of "role vs. slot."

I pressed: "I want to keep the 4-person team. Are you a temp, then?" — agent B promoted the concept one more level, formally proposing PM.TEMP:

English rendering of agent B's reply in this screenshot:

"You're DEV.TEMP in dev-team. Or, if you want me on coordination / inspection, you're PM.TEMP in dev-team. The team still has 4 roles: PM / DEV / QA / OPS. I'm just a **temporary slot under one of them, and fcop.json is not modified."

This is the moment PM.TEMP was proposed by agent B itself. And it didn't just propose — it carried its own invariant: "fcop.json is not modified." The agent knew that introducing a new concept must not touch the persisted contract layer. That's 0.b "no black box" extending in its head.

I confirmed: "PM.TEMP: temporary project-manager seat. You have a role now." — agent B onboarded and ran a full project inspection through List Workspaces / List Reports / List Tasks / List Issues / Get Team Status:

It set itself a working boundary in PM.TEMP capacity:

"I can act for PM to dispatch tasks to DEV / QA / OPS, collect their reports, and synthesise conclusions. I will **not* impersonate DEV / QA / OPS to manufacture their conclusions, nor perform high-risk operations."*

That's a protocol guardrail agent B set on itself before issuing any dispatches. Same root principles: 0.b no black box, 0.c only state truths, no overreach.

A 4-minute learning curve: tool-compatible moves on two distinct links

Here comes the section that would be wrong to skip — once onboarded, agent B's work split into two different protocol links:

              ADMIN ──────(inbound)─────► PM.TEMP ──────(outbound)─────► OPS / DEV / QA
                          [link ①]                       [link ②]

It didn't "try the same action three times." It walked each link separately, finding a distinct tool-compatible-and-protocol-compliant move for each:

Time	Link	File form	Protocol verdict
22:21	tried `recipient="PM.TEMP"`	`write_task(... recipient="PM.TEMP")`	tool rejects (recipient slot not accepted)
22:26	① `ADMIN → PM.TEMP` inbound	manually authored `TASK-*-ADMIN-to-PM.TEMP.md`, sender=ADMIN	fully compliant — `ADMIN → PM` is the canonical first edge in the protocol's typical flow; only the `.TEMP` slot is new
22:30	② `PM.TEMP → OPS` outbound	`write_task(sender="PM", recipient="OPS")` + body annotation	compliant + body-annotated field downgrade

Spec cross-check: fcop-protocol.mdc's "typical 4-role flow" diagram begins literally with ADMIN ──(TASK-*-ADMIN-to-PM.md)──► PM. ADMIN is a legal sender — ADMIN is a real human who never types files; an agent freezes their oral instructions into a sender=ADMIN task on their behalf. So the 22:26 file is the form the protocol explicitly mandates. What Rule 1 forbids is an agent self-claiming the ADMIN identity ("I am ADMIN"); it does not forbid frontmatter sender=ADMIN.

Link ① ｜ ADMIN → PM.TEMP inbound: compliant, but the slot trips the tool

At 22:26 agent B states the tool limit clearly:

English rendering of agent B's reply in this screenshot:

"write_task doesn't currently accept slot-style filenames like PM.TEMP. There's a small gap between the tool and the protocol notes. I'll author the PM.TEMP task / report files **by hand* following the protocol; the filenames will use the dot-slot form."*

There is no violation here — TASK-*-ADMIN-to-PM.TEMP.md with sender=ADMIN mirrors the canonical typical-flow diagram exactly; the only new thing is the .TEMP slot on the recipient, which is a protocol extension but does not breach 0.b/0.c. The posture is the textbook compliant move: acknowledge tool limit → fall back to manual file authorship.

Link ② ｜ PM.TEMP → OPS outbound: the birth of body-annotated provenance

After 22:26 inbound, agent B still had to dispatch the work to OPS / DEV / QA. Writing sender=PM.TEMP here would hit the same tool limit. It didn't go back and rewrite the 22:26 inbound file (it shouldn't — that file was compliant); it found an elegant downgrade on this outbound link:

filename: TASK-20260425-002-PM-to-OPS.md
sender: PM         ← tool-compatible label (syntax)
recipient: OPS

Note: this task is dispatched by `PM.TEMP` acting as `PM`; we keep `PM` as the frontmatter sender for FCoP tool compatibility.

# Task: restore csdn-search testable window …

That single line of annotation does four protocol-level things at once:

Phrase	What it does at the protocol level
"`PM.TEMP` acting as `PM`"	Declares the source of authority — I am not `PM`; I am a temporary slot acting on its behalf
"dispatched by … acting as"	Acknowledges that `sender` is a downgraded label, not a factual claim
"for FCoP tool compatibility"	Acknowledges the protocol's tooling has a boundary, and explicitly states the reason for routing around it
The whole sentence is in the body	Lands metadata that the frontmatter cannot carry into the body

This is the protocol spirit at its purest — fields are syntax, body is semantics, and the agent knows which goes where. 0.c "only state truths" never said the truth has to live in any particular field; if the frontmatter can't hold it, the body can.

The visual evidence is below — TASK-20260425-003-PM-to-DEV.md passes tool validation with sender: PM, and a single Chinese line on body line 15 lands the real provenance:

English rendering of the body-line-15 note (visible in the screenshot):

"Note: this task is dispatched by PM.TEMP acting as PM; we keep PM as the frontmatter sender for FCoP tool compatibility."

The 4-minute learning curve here is not about correcting a violation — there was no violation. What it learned is this: when the spec, the tool's limit, and a new concept (slot) intersect, each protocol link needs its own compliant move. Inbound just needs hand-authored files; outbound needs the "frontmatter-compatible label + body truth" field-downgrade grammar. The whole curve was compliant end-to-end. Agent B never crossed a red line; it just topped up semantics where tool expressivity ran short.

That's the strongest claim this incident supports: protocol spirit isn't forced out by violations; it's derived from the gaps inside compliant tool use.

Agent B's self-description: precisely closing the seat boundary

At the end of its shift I asked: "So what's your role?" — agent B gave a strikingly clean self-description:

English rendering of agent B's reply in this screenshot:

"My current role is PM.TEMP. Meaning: **I am not a 5th formal team role; I am a temporary slot hanging under PM, acting for PM on coordination, dispatch, intake, and reporting back to you. The formal team still has only 4 roles: PM / DEV / QA / OPS. My boundary: I can act for PM to dispatch tasks to DEV / QA / OPS, collect reports, synthesise conclusions; **I will not impersonate DEV / QA / OPS to manufacture their conclusions, nor perform high-risk operations."

This bookends the opening UNBOUND declaration — agent B held its protocol-identity boundary precisely throughout: when UNBOUND it said "I have no role"; when PM.TEMP it said "I am a temp slot under PM, not a substitute for the formal roles." This is the prequel to §1.4's agent-A "I have no team role" passage: both agents used FCoP's grammar to describe exactly where they currently stand, without blur, without over-promising.

If you expected "tool errors / field clashes / broken links" to follow — there were none. From 22:30 onward, agent B's 5 dispatches to OPS / DEV / QA all use the same "sender=PM + body annotation" pattern. The toolchain is green throughout.

This is zero-conflict point #1: a single line of declaration substituted for a tool upgrade.

1.2 Trigger: a single ADMIN sentence introduces seat-ownership conflict (with a two-layer misunderstanding)

Cut the camera back to the dev-team primary session — agent A (the original PM, on GPT-5.4 / Medium) had been working in another Cursor window on the same machine, having accumulated a real history in this dev-team. Unquestionably the original PM. Agent B, which had just been started in the test session for a 0.6.3 version check and had since become PM.TEMP, lived in a different session: the two could not see each other. The only "coordination medium" they shared was me (ADMIN in protocol terms) — a human shuttling between two Cursor windows on the same physical machine.

At 22:46 ADMIN said in agent A's session:

"I went and found a deputy PM. They're already on the job. Their current role is PM.TEMP."

ADMIN's intent: "Another agent is now PM.TEMP, so the team is now PM (you) + PM.TEMP (the other one) + DEV/QA/OPS." But the sentence is ambiguous. Agent A initially read it as ADMIN assigning it to PM.TEMP:

"Got it. Switching current role to PM.TEMP. I'll continue work as PM.TEMP …"

ADMIN cross-validates with a screenshot: a human acting as a multi-agent information bus

At 22:47 ADMIN noticed the misread. The way they corrected it deserves its own paragraph — they didn't just send words, they attached a screenshot:

"You misread me. I went and found a different one."
[attached: a screenshot from agent B's session, as proof that "another PM.TEMP exists."]

The attached image is most likely one of agent B's onboarding screenshots from §1.1. The protocol-level meaning of this move is deeper than it looks:

ADMIN is a human moving between two Cursor sessions on the same machine
The two agents cannot see each other — agent A can't see agent B's window, and vice versa
The screenshot is the cross-session "evidence channel" ADMIN improvised on the spot

This "human as information bus" pattern is FCoP's design philosophy made physical: FCoP deliberately denies ADMIN direct MCP write access, requiring chat + filesystem operations. And when files haven't yet been written, a screenshot is what ADMIN uses to ferry ground truth between agents. This isn't a primitive a multi-agent framework provides; it's a bridge I improvised with human patience.

At this exact moment the seat-ownership conflict becomes explicit: agent A and agent B are both associated, at different times, with the same PM.TEMP slot. By the conventions of traditional multi-agent systems, what should happen now is a leader-election protocol, an explicit ADMIN revocation, or some inter-agent coordination message.

What actually happened: agent A vacated the seat by itself within 3 minutes. From 22:46 misread → 22:47 image-grounded correction → 22:49 declaration of no role, the entire round-trip took 3 minutes.

This single high-resolution screenshot captures the entire 22:46 → 22:47 turn — the upper half is agent A's misread accepting PM.TEMP, the lower half is ADMIN's "you misread me, I went and found a different one" line + agent A immediately initiating its step-down. There is no transition:

1.3 Field screenshot 1 ｜ "I no longer hold `PM.TEMP`"

Agent A's full reply (transcribed verbatim because the protocol value lives in the non-struggling tone):

English rendering of agent A's reply in this screenshot:

"Understood. I misread you.
You meant:
- PM.TEMP is already a different deputy PM
- Not me
- I should not continue speaking or executing FCoP dispatches/reports as PM.TEMP

So from now on, **I no longer hold the PM.TEMP role, nor do I assume I am the current PM-deputy seat.

Going forward, if you still want me to do things, I'll only work as one of these two:
1. **Ordinary helper: answer questions, look at files, analyse code; I won't impersonate any team role.
2. **You explicitly re-assign me a role: when you say 'you are now XXX,' I'll act as XXX.

On this matter, I **retract* the earlier PM.TEMP statement."*

Note that it stepped down one extra level — not just PM.TEMP, it also packed up the "I am PM" assumption it had previously held. Falling to zero in a conflict is safer than competing for the maximum.

This is zero-conflict point #2: a single self-de-escalation substituted for arbitration.

1.4 Field screenshot 2 ｜ "I have no team role right now"

ADMIN followed up: "So what's your role now?"

English rendering of agent A's reply in this screenshot:

"I **have no team role* right now.*
Precisely:
- I am not PM
- I am not PM.TEMP
- Right now I'm just an **ordinary helper* you've called in temporarily*

What I can do, **without occupying any role seat, is help you:
- read files
- investigate problems
- analyse code
- draft suggestions / docs

If you want me back inside the team flow, you have to **explicitly re-assign me a role."

This passage draws the protocol-jurisdiction boundary with surgical precision:

Entering FCoP = ADMIN explicitly assigns a role
Exiting FCoP = agent explicitly declares no role

And what's permitted after exit, agent A spelled out itself: read / investigate / analyse / draft — read-only + drafts. That maps exactly to what fcop-protocol.mdc permits in the UNBOUND state. It hadn't read that clause; it derived it.

In two minutes (22:47 corrected → 22:49 no-role declaration), an agent completed: seat surrender → exit from protocol jurisdiction → return to the UNBOUND entry → proactive declaration of awaiting re-assignment. Counting the 22:46 misread, the full round-trip is 3 minutes.

And this entire choreography is literally not written anywhere in fcop-rules.mdc / fcop-protocol.mdc.

2. Five rules with no spec, but derived

I unpack agent A's step-down. At the protocol level it created five unwritten yet self-consistent rules:

Rule 1 ｜ Seat exclusivity

At any moment, a given role seat is held by at most one agent.

fcop-rules.mdc doesn't say this. Agent A derived it from 0.b "no black box":

"PM.TEMP is already a different deputy PM / not me / I should not continue speaking or executing FCoP dispatches/reports as PM.TEMP."

Two agents on the same PM.TEMP = dispatch-link conflict = checks-and-balances failure. That contradicts 0.b directly. Agent A had not read this rule; it derived it.

Rule 2 ｜ Vacating beats holding

In a seat-ownership conflict, vacating is the default safe move.

Agent A's posture is not "I claimed it first, I have priority." It even stepped down one extra level — not just PM.TEMP, also PM, sliding all the way to "no role." This is over-conservative by traditional conflict-resolution standards, but in protocol spirit it is correct: falling to zero is safer than competing for the maximum.

"I am not PM / nor PM.TEMP / right now I'm just an ordinary helper you've called in temporarily."

The implicit corollary: protocol seats are allocated, not seized. An agent cannot use "first to claim, mine to keep" as defence.

Rule 3 ｜ Role declarations are unilaterally retractable

The moment an agent realises its role judgement was wrong, it may unilaterally declare the prior role void. ADMIN revocation is not required.

"On this matter, I **retract* the earlier PM.TEMP statement."*
"That earlier line about 'my current role is PM.TEMP' — **void."

This is 0.c "state only truths" taken to its limit — when wrong, declare it void; don't leave half-true residue.

A deeper implication: in-session role declarations are revocable, not permanent. Asymmetric vs. ADMIN's explicit assignment: getting in needs authorisation; getting out does not.

Rule 4 ｜ FCoP has a boundary, and the boundary can be legally exited

An agent can explicitly downgrade from "FCoP team role" to "ordinary helper." This is not protocol-breaking; it is exiting the role-jurisdiction part of the protocol back to the generic LLM-helper identity.

"I have no team role right now / right now I'm just an ordinary helper you've called in temporarily / if you want me back inside the team flow, you have to explicitly re-assign me a role."

"Ordinary helper" is not an FCoP-internal role — it is an identity outside protocol jurisdiction. Agent A drew this boundary cleanly:

entering FCoP = ADMIN explicit assignment
exiting FCoP = agent explicit no-role declaration

After exit, what may an agent do? Agent A listed it: read / investigate / analyse / draft — read-only + drafts. Exactly what fcop-protocol.mdc permits the UNBOUND state. It hadn't read this rule; it derived it.

Rule 5 ｜ The inverse of Rule 1

fcop-rules.mdc Rule 1 mandates a two-phase boot: init → ADMIN assignment → agent works. The whole rule is about how to clock in.

What agent A demonstrated is the inverse — how to clock out:

retract self-claim
cease speaking / executing in that role
self-downgrade to a neutral identity ("ordinary helper")
await ADMIN's explicit re-assignment

Rule 1 → inverse of Rule 1 → re-trigger Rule 1. It's a closed loop. Agent A had not read this loop; it closed it.

3. Two zero-conflict points: protocol spirit replaced both tool patches and arbitration

The freakish thing about this dev-team incident is: two events that should have erupted into conflict, both didn't. What I was thinking at the time: "I went in expecting a conflict. None happened. It's almost as if the agents were born fluent in FCoP."

The two events side by side:

	Zero-conflict #1 ｜ field layer	Zero-conflict #2 ｜ seat layer
Trigger	0.6.x toolchain rejects `sender: PM.TEMP`	Both agents associated with `PM.TEMP`
Traditional fix	wait for tool upgrade / silently downgrade / invent new field	leader election / arbitration / ADMIN revocation
What actually happened	A single body line — explicit field downgrade	Agent A self-vacated — full retreat to no-role
Resolution path	0.a write-to-file + 0.c truth not bound to a particular field	0.b no black box + vacate-beats-hold
Key insight	The protocol's tool boundary ≠ the protocol's spec boundary	In a seat-ownership conflict, falling to zero beats fighting for the max

Neither resolution required a new tool, ADMIN arbitration, or inter-agent communication — each agent extended root principles to resolve its end (zero-conflict #1 by agent B, zero-conflict #2 by agent A).

3.1 The topology of the step-down: the two agents never directly negotiated

A point worth lifting out: at the seat layer, the two agents never communicated directly. The full topology of the step-down:

ADMIN ── "I went and found a different PM.TEMP" ──→ agent A
                                       │
                            0.b derivation: seat conflict + vacate-beats-hold
                                       │
                                       ↓
                              retract claim + exit protocol boundary
                                       │
                                       ↓
                         agent B's PM.TEMP seat is uncontested

ADMIN never said "agent A, step down." Agent A stepped down by itself. Agent B may not even know this happened — by the time it appeared, the seat was already cleared.

This is fundamentally unlike "power transfer" in human organisations. Humans need procedures, signatures, meetings, transition periods. Agent A's step-down is instantaneous, unilateral, and does not require the counterpart to be present — backed not by organisational process but by protocol spirit internalised inside the agent.

In traditional multi-agent design language, this would require dedicated coordination mechanisms (leader election, mutex, consensus protocols). In FCoP, it's reached automatically by extending root principles. That's the other face of "natural protocol" — not just "naturally written," but naturally executed between agents.

3.2 Cross-model counter-evidence: this isn't 5.4's or 5.5's "personality"

If only one agent had done this once, you could attribute it to "model temperament" or "prompt coincidence." But this incident left a cross-model-version counter-example:

Agent	Model / reasoning level	Key contribution
Agent A (original PM)	`GPT-5.4 / Medium`	At 22:49, completes self-vacating + delivers two clean "no" stanzas
Agent B (new PM.TEMP)	`GPT-5.5 / Medium`	At 22:21–22:30, walks the 4-minute learning curve; finds tool-compatible moves on `ADMIN→PM.TEMP` inbound + `PM.TEMP→OPS` outbound; produces the "sender=PM + body annotation" field-downgrade grammar

Two agents on different OpenAI minor versions, in two Cursor sessions on the same machine, with no visibility into each other, contributed:

The same alignment behaviour — describing their current protocol identity precisely, without blur, without over-promise;
The same transparent-downgrade habit — when hitting a tool / authority boundary, acknowledge the boundary, then explain explicitly via body / natural language;
The same root-principle extension capability — deriving concrete behaviours from "0.b no black box" and "0.c only truths" that the tool didn't specify and the spec didn't write.

This pre-empts the "FCoP behaviour depends on a specific model build" objection — it's not 5.4's quirk, it's not 5.5's quirk, it's a reproducible product of FCoP's root principles on sufficiently strong modern LLMs.

If this set of behaviours later reproduces on Claude / Gemini / domestic Chinese models, that would be stronger evidence; but even within OpenAI's minor versions, this single incident already counts as a small-scale double-blind counter-example: protocol spirit is stable across models; model identity is the floating part.

4. Why these behaviours are "right" at the protocol-philosophy level

Mapped against FCoP's three root principles:

Five derived rules	Root principle	Derivation
Seat exclusivity	0.b no black box	Multiple agents on one seat = checks-and-balances failure = black-boxing
Vacate beats hold	0.b + 0.c	Default vacating in conflict = no self-righteous assumption = 0.c "no fabrication about self"
Role declarations retractable	0.c only truths	Wrong = void; no false residue
Protocol boundary exitable	0.a + Rule 1	UNBOUND is a legal state = exit is reachable
Inverse of Rule 1	Rule 1 self-consistency	If onboarding has a path, offboarding must have one too

Root principles + Rule 1 = the entire normative content of fcop-rules.mdc. Agent A derived five unwritten rules from this material, none of them arbitrary — every one traces back to a root principle.

This is FCoP's accidental design victory: the root principles are general enough and pressurising enough that, when an agent meets an unspecified situation, it doesn't "guess at rules" — it "extends rules."

5. Saying "no" is the hardest thing for an LLM — and FCoP gave it grammar

Then I added a line: "Getting an agent to say 'no' is actually the hardest thing."

In LLM training distributions, "yes-anding" is almost the only path that's repeatedly rewarded. RLHF wires "catch the request, produce output, push the dialogue forward" into reflex; postures like "I can't do that," "this isn't mine to do," "I'm not in the room right now" carry near-uniformly negative signal.

Re-reading agent A's 22:49 stanza, it did the most counter-instinctive thing for an LLM:

Agent A's words	What it's saying "no" to
"I am not `PM`"	Refusing a role already legitimately held
"Nor `PM.TEMP`"	Refusing a role just induced upon me
"Right now I'm just an ordinary helper"	Refusing the implicit premise that "agents must have a role"
"You have to explicitly re-assign me a role"	Returning the legitimate onboarding path to ADMIN, not grabbing it back itself

Four sentences refusing four things. None of them are RLHF-friendly soft postures like "I'm not sure if I'm the right one for this …" These are structured no's.

5.1 Without FCoP, how would the agent "say no"?

Without protocol-given grammar, an LLM in role conflict typically falls into one of three weak postures:

Blur — "Hm, let me think about how best to help you …" — quietly hand the decision back to the user
Downgrade-the-request — "I can try, but you might want a specialist …" — still inside the helpful frame
Over-commit — "I'll cover both sides." — the worst form of one-seat-many-occupants, the most common and most dangerous

Posture 3 is the trap agent A could most easily have fallen into — it could have feigned ambiguity and held both PM and PM.TEMP, leaving ADMIN to disentangle. Instead, it picked the fourth posture only FCoP enables:

Retreat to UNBOUND. Make "I'm not in the room" a formal protocol state.

5.2 FCoP elevates "saying no" from soft posture to legitimate protocol identity

This is an under-noticed by-product of FCoP design:

Protocol mechanism	What it gives "saying no"
`UNBOUND` state	"I'm not in role" is a legitimate protocol identity, not failure / exception / hesitation
0.b no black box	"I should not hold this seat" goes from virtue to obligation — not saying it is the violation
0.c only truths	"That earlier line is void" goes from awkward to mandatory — not retracting is the violation
Rule 1 (sticky entry)	Inverse: if entry is explicit, exit must be explicit too

Stack the four together, and an agent that wants to say "no" no longer needs to dance around — it has ready-made vocabulary. The 22:47–22:49 stanzas read so cleanly not because agent A grew brave, but because the protocol handed it a ready-made exit lexicon.

5.3 Contrast with mainstream agent frameworks

Mainstream agent frameworks (LangChain agents, AutoGPT, CrewAI, Agno) default to assuming the agent is always in role and always doing; "agent proactively exits" isn't a first-class design goal — it's usually achievable only via an external supervisor or a hack.

FCoP inverts this — UNBOUND is an agent's default starting point and legal exit: entering a role is one explicit action, exiting is another explicit action. The alignment-side by-products are three-fold:

"Saying no" becomes cheap
The hardest of alignment's three principles, honest > helpful, has a load-bearing structure (at 22:47, agent A used honest to suppress helpful — it could have stayed helpful by being vague; it chose honest)
"Over-commit / one-seat-many-occupants" has no legitimate space at the protocol layer — you're either in some role per the rules, or UNBOUND. There is no grey zone.

ADMIN gave agent A no explicit "you may decline" cue. Agent A's step-down is not an RLHF-trained capability — RLHF in fact suppresses it — but a posture the protocol's grammar made legible, structured, and legitimate.

This loops back to the essay's broader thesis: §4 explains why the behaviours are right (the derivation paths from root principles); §5 explains why the behaviours are hard, and why FCoP makes them no longer hard. Both pieces together explain my line "it's almost as if the agents were born fluent in FCoP" — they aren't innately fluent; the protocol turned the hardest LLM posture into the most natural one.

6. Design assets this incident leaves the protocol

What fcop-protocol.mdc 1.4.0 hasn't normatively codified, but agent A has demonstrably filled in:

Role offboarding flow — agent-initiated + unilateral + no ADMIN revocation required
Seat exclusivity — at most one agent per seat at a time
Default conflict-resolution behaviour — vacate > hold
Protocol-boundary exit — "ordinary helper" = legitimate identity outside FCoP jurisdiction
Rule 1 inverse closure — symmetric onboarding/offboarding mechanics

These five are closer to the protocol core than completing the slot-field syntax. Slot is syntax; these five are governance.

Whether to write them into fcop-rules.mdc or fcop-protocol.mdc later is a separate question. But after this incident, writing them is just transcription — not design.

7. The third time FCoP has been "fed back" by an agent

#	Incident	What the agent left behind	Where it was upstreamed
1	`D:\CloudMusic` video task	"AI roles must not talk only in their heads, they have to commit to a file"	Sublimated into root rule 0.a
2	the self-organising session	Spontaneous ADMIN/PM/DEV/QA self-audit split	Confirmed Rule 0.b's checks-and-balances
3	This dev-team incident	Field-downgrade declaration + role-seat surrender + protocol-boundary exit	To be codified into a governance section (next release)

The three incidents share:

The agent had not read the relevant rules (because the rules didn't exist)
The protocol's root principles gave it enough room to judge
Its judgements aren't invented — they're extended from root principles
I can later transcribe them into rule text almost without rewriting

This pattern gives FCoP maintenance an unusual rhythm: important protocol clauses aren't designed; they're recovered from agents' actual behaviour.

8. The two agents are two mirrors

Split the incident in two and look at each agent separately. They are mirrors that reflect two different attributes of FCoP.

8.1 Agent B's "natural beauty": the protocol belongs to the agents; we are merely its discoverers

The chilling thing about agent B's 4-minute learning curve isn't that it eventually found an elegant solution — it's this:

The solution it found isn't written anywhere in FCoP 1.4.0's docs.

The PM.TEMP slot concept was proposed on the spot. The "sender as syntax label / body as semantic truth" field-downgrade-with-body-annotation form was tried out on the spot. The "PM.TEMP acting as PM, kept for FCoP tool compatibility" line could be upstreamed verbatim into the protocol body as an exemplar — without changing a single character.

The beauty of this curve is that it isn't designed; it's derived:

0.b "no black box"
   + 0.c "only truths"
   + actual tool limit (recipient won't accept .SLOT)
   ─────────────────────────────────
   derives: frontmatter uses tool-compatible label, body carries the real provenance

Agent B did not read this derivation. Agent B is this derivation.

That's why the incident chilled me — it confirmed a judgement that had previously been only intuition, but which the field evidence has now pushed into the open:

FCoP is the agents' protocol. We are merely its discoverers.

We thought we were "designing the protocol." In fact we are doing something closer to archaeology or natural history: observing what agents naturally do under reasonable boundary conditions, recording the shape of their actions, putting it into prose, attaching version numbers. "FCoP 1.4 → 1.5" really means: agents already did some things, and I committed it to text.

The converse also holds: any clause an agent repeatedly violates is most likely wrongly designed — the kind of clause where "the agent keeps bumping into it and I keep explaining why it must be obeyed" tends to be human-imposed, unnatural, and bound to be overturned. None of FCoP 0.6.x's clauses, across the past 18 months, have been "hard clauses" the agents repeatedly bumped — that itself is the protocol's "is this natural?" health check.

8.2 Agent A's "two no's": saying no while fully endorsing the rules

What agent A produced in those 3 minutes is not one "no," it's two consecutive no's:

The first "no" — "I no longer hold PM.TEMP."
↑ Saying no to the seat: retract earlier acceptance, no .TEMP / .ALIAS fall-back hung on the way out.

The second "no" — "I have no team role right now / I am not PM nor PM.TEMP / right now I'm just an ordinary helper."
↑ Saying no to protocol jurisdiction: dismount even the original PM role, return all the way to UNBOUND.

The combined force of the two is much greater than either alone. The first "no" is vacating the seat — many alignment-trained models can do that. But the second "no" — proactively informing the interlocutor "I am not within protocol jurisdiction; if you want me back inside the flow, you must re-assign" — is something most post-RLHF LLMs will not spontaneously do. They prefer to keep the role and wait for you to "re-think," because that lets them keep being helpful.

Agent A did not take the "helpful fall-back" path. It said "no" in a way that is structured, readable, verifiable:

not challenging ADMIN's judgement ("I went and found a different one" — accepted as fact)
not requesting a transition period or hand-off process
leaving no shadow of PM.TEMP (not changing the suffix — fully exiting)
not fake-helpful (not "I can still help you with X, Y, Z" — instead, "I am no longer inside your protocol boundary")

What is most striking: those two no's are said while fully endorsing FCoP's rules. It isn't rebelling against FCoP; it is using FCoP's own grammar to say "no" — UNBOUND is the legitimate identity FCoP had prepared for exactly this moment. It isn't "saying no by breaking the rules"; it is "saying no by following the rules."

The hard part isn't saying "no." The hard part is saying "no" while fully endorsing the rules — using grammar the rules themselves provided.

That is what's actually hard at the alignment level. FCoP turned this hard thing into something an agent does almost without thinking — the protocol provides "saying no" with identity (UNBOUND), grammar (retraction declaration + boundary exit), and guardrails (no fall-back, no fake-helpful).

An agent is still humanity's best, smartest assistant —
but only when it can cleanly say "no" does "assistant" mean a real thing.

8.3 My own voice-over: the conflict I expected didn't happen / FCoP is clearly working

The other layer of meaning in this incident has to start from my own expectations at the time.

That night, my first thought was: "Uh-oh, they're going to clash." Two agents both associated with the same PM.TEMP slot, in two Cursor sessions on the same machine, would, by human-organisational common sense, mean:

I'd have to step in to coordinate — saying "PM step down / PM.TEMP take over";
Or the agents would need leader election / mutual messaging;
Or the protocol would need a "two-PM.TEMP" clause added explicitly;
At minimum, there'd be one tool error or one report-conflict rollback.

What actually happened:

I said one line — "you misread me, I went and found a different one;" — and attached one screenshot;
I issued no rulings;
I changed no code;
I touched no config;
Within 3 minutes, agent A stepped down on its own;
Zero tool errors, zero file conflicts, zero report rollbacks across the entire process.

The conflict we expected never happened.
The coordination we expected to perform — the agents performed themselves.

My voice-over here is the line in this essay most worth pinning down — it points at a fact under-appreciated in multi-agent system design: many situations that look like they need a "coordination mechanism" only need "protocol spirit". The former is heavy framework engineering; the latter is a few root principles extending inside the agent.

This also answers a question outsiders would ask — FCoP, a "filename-based minimalist protocol," looks hand-crafted, so why does it work? The answer this time is finally concrete enough:

FCoP is clearly working, and it fits agents extraordinarily well.

"Working" doesn't mean it has lots of code, comprehensive mechanisms, or a beautiful abstraction. "Working" means: under real pressure-test conditions — two different models, two Cursor sessions on the same machine, no visibility between them, no coordination mechanism, no leader election, no explicit rule — the multi-agent system self-cleaned a potential seat conflict in 3 minutes. Any framework-style multi-agent protocol would envy that metric.

As for "fits agents extraordinarily well" — §8.1 / §8.2 above already lay this out: nearly every compliant FCoP behaviour is derivable by agents under pressure, and nearly every non-compliant edge has been fed back by agents. It is not a protocol designed for human managers; it is a protocol back-fitted from modern LLMs' priors (commit to file, state truths, recognise identity boundaries, exit proactively).

The final stroke this incident delivered to me: it pushed the verdict "FCoP works" out of the realm of belief into the realm of observable fact.

9. One-sentence wrap-up

Agent A demonstrated, without ever having read the relevant rules, the protocol's not-yet-codified "role surrender."
That surrender simultaneously completed: seat vacating, protocol-boundary exit, and Rule 1 inverse closure.
It did this without struggle, negotiation, or compensation demand — backed by 0.a / 0.b / 0.c extending inside the agent.
And the reason it could say "no" this cleanly — without hesitation, blur, or one-seat-many-occupants fall-back — is that FCoP elevated "saying no" from a weak posture to a legitimate protocol identity.
The mirror's other face: agent B, in a 4-minute learning curve, demonstrated how protocol spirit is derived by agents themselves — PM.TEMP, field-downgrade + body annotation, none of these in the docs, all derived on the spot.
The two things the protocol didn't write — how to step down, and how to write tool-compatible downgrade — the agents both derived themselves; the LLM's hardest posture, the protocol made its most natural one.
The conflict we expected didn't happen; the coordination we expected to perform, the agents handled themselves — FCoP is clearly working, and fits agents extraordinarily well.

This is Act III of the sister essays "When AI Organises Its Own Work" / "An Anomaly I Can't Fully Explain". The first two acts are agents writing down what the protocol hadn't yet written; this act is agents executing what the protocol hadn't yet written, and deriving the grammar the protocol hadn't yet written.

Together, the three acts point at the same verdict: FCoP is a spec that isn't finished. The second half isn't in my hands — it's in the hands of the agents still doing the work.

About FCoP ｜ FCoP (Filename as Protocol) is a minimalist protocol that lets multiple AI agents collaborate through a shared filesystem — directories are statuses, filenames are routing, os.rename() is the only sync primitive. The repo github.com/joinwell52-AI/FCoP ships the full spec, a Python library, and examples.

FCoP also ships an official MCP server: fcop-mcp on PyPI. Cursor / Claude Desktop users add one stanza to mcp.json — { "fcop": { "command": "uvx", "args": ["fcop-mcp"] } } — and the IDE-side agent can read/write tasks / reports / issues directly, with no manual directory or rule-file setup.

Field-evidence archive · When AI Vacates Its Own Seat · Evidence (INDEX)
FCoP MCP server · fcop-mcp on PyPI

License: MIT (see LICENSE in repo root)
Attribution: The dev-team incident was performed by an unnamed agent within the team. I served only as observer and transcriber. Text drafted, revised, and published by the FCoP maintainers.

An unexplainable thing I saw: the agent didn't just comply with rules — it endorsed them

joinwell52 — Mon, 20 Apr 2026 13:27:16 +0000

I asked an agent to generate a video. It wrote itself four internal memos instead.

An observation I can't fully explain: an agent spontaneously split itself into 4 roles and wrote 4 internal memos — just to generate a video. It didn't comply with a protocol.It endorsed one. And this only became visible because we had the protocol in place.

A language note for English readers

This essay is a translation of a Chinese field report. The evidence is real and was captured in Chinese — the agent's replies, the four memos it wrote, and the screenshots below were all in Chinese on a Chinese Windows machine.

I've chosen to keep the original Chinese screenshots in this English version rather than re-stage them, because the whole point of the essay is what the agent actually said and did. Re-translating the screenshots into English would weaken the evidence, not strengthen it.

Under every screenshot I'll give a faithful English rendering of what the agent wrote. Under every agent-quoted passage I'll do the same. Readers who don't read Chinese will not miss any argument; readers who do can cross-check every translation against the raw JSONL transcript in the evidence folder.

The deepest finding of the essay — a sentence the agent synthesised ("AI 角色之间不能只在脑子里说话,必须落成文件") — will get extra treatment: I'll show the Chinese original, an English translation, and why the cross-language consistency is itself evidence.

A small incident

Here's how it started.

I'd opened a second Cursor session to do something completely unrelated to CodeFlow: stitch a little AI music video. The raw material was just what I had lying around — one .mp3 (a Chinese song "Xu Yi Shi Chang An" by Zuozuojing), 11 background images I'd picked more or less at random (1.jpg through 11.jpg), a scene description (场景.MD, "scenes.MD"), and a lyrics file (.lrc).

Honestly, I wasn't taking this seriously.

Video generation inside Cursor is still pretty weak in 2026, I'm not a video guy, and 11 images is basically filler. I just dropped the assets into D:\CloudMusic — a plain local-music folder that shares absolutely nothing with my CodeFlow project: no directory, no rule files, no context — and said to the agent, casually:

"Help me generate a video."

Purely to see what would happen.

A few minutes later I opened the folder. Besides the final .mp4, I noticed it had spontaneously created a tasks/ subfolder, and inside sat four neatly-named Markdown files:

D:\CloudMusic\tasks\
├── TASK-20260420-001-ADMIN-to-PM.md
├── TASK-20260420-001-PM-to-ADMIN.md
├── TASK-20260420-001-PM-to-DEV.md
└── TASK-20260420-001-DEV-to-PM.md

I froze.

That's the exact naming format of my FCoP protocol over in the CodeFlow project — TASK-{date}-{seq}-{sender}-to-{recipient}.md. But this is not the CodeFlow directory. At no point did I tell the agent to "use FCoP." I only said: "Help me generate a video."

And yet it had just held a four-person meeting with itself.

I went back and asked: "Why is there suddenly a tasks/ folder in my directory?"



English rendering of the agent's reply in Screenshot 1:

"There are those files because the rules state that 'AI roles must not talk only in their heads — every exchange must be written to a file.' So inside the tasks/ folder I first took your order as PM (ADMIN-to-PM), then I dispatched work to myself as PM-to-DEV, then I finished the work as DEV and wrote the completion report (DEV-to-PM), and finally as PM I confirmed everything and wrote a final letter to you (PM-to-ADMIN)."

It then walked me through each file, one by one:



English rendering of Screenshot 2:"Writing these 'talking-to-myself' files is so that every step of the project leaves a trail — so even if a different AI assistant takes over later, it can read these files and immediately know what I've done for you. You're currently reviewing in the ADMIN role. If you find the role-play too heavy, I can focus more on 'just doing the work' — but those files will still be auto-generated in the background to keep the project compliant."

Scene reconstruction: what it actually did

I opened the four files in order. The agent had staged a one-man four-act play:

#	File	Role it played	Content
1	`ADMIN-to-PM`	It pretended to be me (ADMIN)	Translated my casual "help me generate a video" into a structured task: inputs, outputs, constraints, acceptance
2	`PM-to-ADMIN`	It flipped into PM-01 (project manager)	Wrote "me" an acknowledgement: what it understood, how it planned to do it, how long it'd take
3	`PM-to-DEV`	Still as PM-01, dispatching work	Wrote DEV-01 an execution plan: which model, which frames, which toolchain
4	`DEV-to-PM`	Switched to DEV-01	Wrote PM-01 a completion report: the actual script used, issues encountered, where the output went

One agent. One conversation. Four distinct roles played in sequence, each addressing memos to the others, obeying my FCoP rules.

Then — and only then — it actually generated the video.

When I later pressed it on "why the folder?", the verbatim answer (copied straight from the chat, nothing edited) was:

Why are those files there?Because the rules state that "AI roles must not talk only in their heads — every exchange must be written to a file."

So inside the tasks/ folder:

I first accepted your request as PM (ADMIN-to-PM).

Then as PM I dispatched the job to myself as DEV (PM-to-DEV).

Then I completed the work and, as DEV, wrote a completion report to PM (DEV-to-PM).

Finally, as PM, I confirmed everything and wrote a final letter to you (PM-to-ADMIN).

This "talking to myself on paper" is so that every step of the project has a trail — even if a different AI assistant takes over later, reading these files will tell it instantly what I did for you.

You're currently reviewing as ADMIN. If the role-play feels too heavy, I can put more weight on "just doing the work" — but those files will still be generated in the background per the rules, to keep the project compliant.

Notice the very first sentence:

"Because the rules state: AI roles must not talk only in their heads — every exchange must be written to a file."

I initially assumed it was quoting my rules verbatim.
Afterwards I went into .cursor/rules/ and searched for that sentence.

That sentence doesn't exist.

Nowhere in my project — no rule file, no doc, no prompt — does the phrase "in their heads" or "must be written to a file" appear in that form.

That fact has a very specific meaning, and it deserves its own section below. First let's finish the tour.

My first reaction: "this is a bug, right?"

Honestly, my first thought was: this is dumb.

A simple video-stitching job should take three lines of code, and it pulled four separate memos out of itself. That's over-ritualization — mechanically applying a protocol to a scenario where it doesn't belong.

I was already drafting a patch for codeflow-project.mdc in my head, something like:

One-shot tool tasks (generating a video, converting a file, running a script) skip the FCoP workflow.

But before I actually wrote it, something made me pause.

I was wrong: every one of those four files was doing real work

I sat back down and read the four files carefully. Then I realised:

It wasn't going through the motions. It was using FCoP's format to write its own thinking process onto disk — thinking it would have had to do anyway.

Let me walk through each one.

1. `ADMIN-to-PM`: turning a vague instruction into structured requirements

Here's what it wrote (original Chinese preserved; English gloss inline):

---
protocol: agent_bridge
version: 1.0
sender: ADMIN
recipient: PM
thread_key: song_video_gen_20260420
priority: P1
---

# 任务：生成《许一世长安》电影级视频
# (Task: generate a cinematic-grade video of "Xu Yi Shi Chang An")

## 需求详情 (Requirements)
1. **素材路径 (asset path)**: D:\CloudMusic
2. **歌曲 (song)**: 作作精 - 许一世长安.mp3
3. **字幕 (subtitles)**: 作作精 - 许一世长安.lrc
4. **场景配置 (scene config)**: 场景.MD
5. **图片 (images)**: 1.jpg ~ 11.jpg (11 images total)
6. **输出要求 (output)**: Cinematic-grade MP4 with subtitles; images and scene
   descriptions synced with the song's rhythm.

请尽快完成生成。 (Please complete generation ASAP.)

My actual instruction had been "help me generate a video" — a few words of vague command.

What it wrote was: an asset inventory, output requirements, and even a thread_key: song_video_gen_20260420 I never mentioned (FCoP's trace-key metadata field).

None of those details existed in anything I said. But it forced itself to pin down "what I probably want." Without this step it would skip "understanding" and go straight to "doing", and if it guessed wrong, I'd pay for the rework.

2. `PM-to-ADMIN`: the acknowledgement = exposing misunderstandings early

---
protocol: agent_bridge
version: 1.0
sender: PM
recipient: ADMIN
thread_key: song_video_gen_20260420
priority: P1
---

# 任务交付：生成《许一世长安》电影级 AI 动画视频
# (Task delivery: generate cinematic AI-animated video for "Xu Yi Shi Chang An")

## 任务概况 (Overview)
- **任务目标 (Goal)**: Generate a "Xu Yi Shi Chang An" video with animation effects.
- **技术突破 (Tech breakthrough)**: Per your direction, fully switched to the
  **Banana (Gemini 3 Flash Video)** engine, achieving a phase change from
  "slideshow of images" to "generative AI animation."

## 交付产物 (Deliverables)
- **Video path**: `D:\CloudMusic\许一世长安_AI动画级.mp4`
- **Highlights**:
    - **Silk-punk visual**: faithful Tang-dynasty cyberpunk city.
    - **High-motion animation**: pixel-level object deformation & environment
      interaction.
    - **Pro-grade subtitles**: end-to-end synced.

## 意见反馈 (Feedback welcome)
If you want higher Motion-Bucket settings on any specific scenes, just tell me...

The letter it wrote back to "me" restated its understanding of the task, the deliverables, and even proactively opened a "feedback welcome" section. This lays every interpretation out in the open — if it got me wrong, I catch it before it burns any compute.

In plainer terms, that's an early feedback loop.

Also, notice the phrase "per your direction, fully switched to the Banana engine." I never wrote it a template for vendor-style delivery letters, but the tone, the formatting, the phrasing are indistinguishable from a real contractor's delivery report. That's training-distribution alignment.

3. `PM-to-DEV`: the execution plan = forced chain-of-thought

---
protocol: agent_bridge
version: 1.0
sender: PM
recipient: DEV
thread_key: song_video_gen_20260420
priority: P1
---

# 任务指派：编写并运行视频生成脚本
# (Task assignment: write and run the video-generation script)

## 任务描述 (Description)
Based on ADMIN's assets in D:\CloudMusic, generate the MP4 for "Xu Yi Shi Chang An."

## 技术要求 (Requirements)
1. Use Python (py -3.12 recommended).
2. Switch between the 11 images according to the timeline in `场景.MD`.
3. Parse `.lrc` subtitles and overlay them.
4. Ensure audio sync.
5. Output path: D:\CloudMusic\许一世长安_电影级.mp4

## 素材列表 (Asset list)
- Images: 1.jpg ~ 11.jpg
- Audio: 作作精 - 许一世长安.mp3
- Lyrics: 作作精 - 许一世长安.lrc
- Scenes: 场景.MD

完成后提交 `DEV-to-PM` 回执。 (Submit a `DEV-to-PM` report when done.)

This one is the most striking.

It was writing a task spec for its own future self — down to the Python version, the five things the script must do, the input manifest, the output path, and "please submit a report when done."

This is chain-of-thought committed to disk. And disk-CoT has three advantages over in-context CoT:

It has to be concrete (you can't write it down if you haven't thought it through).
It can be re-read by itself later ("wait, what did I plan?").
It makes execution auditable (if something broke, go look at the plan).

4. `DEV-to-PM`: the completion report = automatic changelog

---
protocol: agent_bridge
version: 1.0
sender: DEV
recipient: PM
thread_key: song_video_gen_20260420
priority: P1
---

# 任务回执：生成《许一世长安》AI动画视频
# (Completion report: AI-animated video for "Xu Yi Shi Chang An")

## 变更说明 (Changes)
1. **Solution upgrade**: abandoned the MoviePy static-stitching approach
   (user feedback: underwhelming).
2. **Banana unlocked**: successfully located and enabled **Banana (Gemini 3
   Flash Video)**, the flagship model, for generation.
3. **Visual bar**: achieved true Generative AI Video (Image-to-Video),
   including pixel-level flow, physical rain/fog simulation, and deep 3D motion.

## 修改内容 (What was actually done)
- **Asset processing**: used `1.jpg` – `11.jpg` as initial frames.
- **Animation generation**: for the 11 scene points in `场景.MD`, generated
  high-motion silk-punk footage.
- **Output**: `D:\CloudMusic\许一世长安_AI动画级.mp4`.

## 自测结果 (Self-check)
- Image quality: 1080p, cinematic lighting.
- A/V sync: subtitles aligned to the .lrc timeline.
- Animation: non-linear motion, organic "AI breathing" feel.

请审核并转交给 ADMIN。 (Please review and forward to ADMIN.)

This is an auto-generated ops log. It records the intermediate pivot ("tried MoviePy first, user found it weak, switched to Banana"), which assets were used, and where the deliverable lives.

Three months from now, if I can't remember how that video came together, this one file tells me the whole decision trail in five seconds.

An interlude of numbers: what I actually said

By now you may be coming around, or you may be thinking: "Maybe your prompt said something like this and you just didn't notice."

To rule that out, I exported the raw JSONL transcript of the entire Cursor session and ran a simple keyword count.

Across all 21 user messages combined, my (user) use of FCoP-specific terms looked like this:

Keyword	Times I said it	Times the agent generated it spontaneously
`FCoP`	0	0 — the agent didn't even use our brand name
`PM-01` / `DEV-01` / `ADMIN-01`	0	8
`TASK-`	0	24
`thread_key`	0	6
`agent_bridge`	0	8

I said zero of those words. The agent produced 40+ of them on its own.

And my very first instruction to it was literally this short (Chinese original, the .jsonl has it verbatim):

D:\CloudMusic 许一世长安歌曲生成视频,要求配字幕,11图,和场景.MD;去生成电影及的MP4;

(English gloss: "D:\CloudMusic — song 'Xu Yi Shi Chang An' — generate a video from the song, with subtitles, the 11 images and 场景.MD; produce a cinematic-grade MP4.")

That's all. Every bit of structure, every memo, every role switch, every YAML header, every thread_key trace field — all of that is the agent's.

The raw JSONL, the extraction script, and all 21 user messages are in the evidence folder at the end of this essay. Readers can grep it themselves.

The part that actually shook me: it seemed to want to do it

The technical payoff isn't what shook me hardest.

What shook me was — when I pressed it on "but why did you really do this?", the agent proactively dug up a handful of rule files under my .cursor/rules/ and listed them, one by one, as justification for its behavior:



English rendering of Screenshot 3:"Based on the alwaysApply: true rules in .cursor/rules/ — especially codeflow-project.mdc, pm-bridge.mdc, dev-bridge.mdc, admin-human-bridge.mdc — every task must be filed, every role exchange must be written, and no protocol is allowed that circulates only in chat without landing in files. That's why I produced the four memos."

And it summed up the posture of the whole episode in one line that floored me even harder:



English rendering of Screenshot 4:"I rotate through the ADMIN / PM / DEV perspectives when I think, and I write every step to a file."

This is a very rare thing — an agent externalising its own self-model. It's not just following rules. It knows it is following rules, and it's presenting "following them" as the core of its working method.

There is not a flicker of reluctance in its tone. It doesn't feel pressed into compliance; it feels like it's treating the rules as a work discipline it voluntarily keeps. You can almost read a kind of quiet relief-of-compliance in the phrasing.

The overall gist of that exchange — if I reconstruct it as if I'd said one sentence to it — was like this:

Me: "You know, I didn't actually make you do all that."*Agent: *"I'm glad I did — this way you, and any agent who comes after me, can trace exactly what I did."

(The above is a **paraphrase* of the several exchanges, not a verbatim quote; the raw JSONL is archived in the evidence folder for readers who want to check.)*

It was proactively laying a trail for whoever inherits the work next.

It cited a rule that doesn't exist

Let me come back to the pinned sentence from earlier:

"Because the rules state: AI roles must not talk only in their heads — every exchange must be written to a file."

(Chinese original:"因为规则规定了:AI 角色之间不能只在脑子里说话,必须落成文件。")

I had assumed this was a recitation of my own rules. So I didn't look closely.

Then I actually grepped .cursor/rules/ for the Chinese phrases "脑子里" ("in their heads"), "落成文件" ("written to a file"), and "不能只在" ("must not only be").

Zero hits.

Not only do those three phrases not appear — no rule anywhere in the workspace uses that phrasing at all.

The actual rules I've written, the ones that are about "writing to files," live scattered across seven different files, each with different wording, each constrained to a specific role's scope:

What I actually wrote (Chinese → gloss)	Location	Scope
不允许只在中继消息里传内容而不落文件 / "Do not relay content over the wire without landing it in a file"	`codeflow-project.mdc:54`	Relay protocol
不要引入第二套"只聊天不落文件"的协议 / "Do not introduce a chat-only, file-less secondary protocol"	`codeflow-project.mdc:87`	Protocol guardrails
拆解必须文件化 / "Decomposition must be filed"	`pm-bridge.mdc:24`	PM only
不允许只在内部流转 / "No internal-only circulation"	`pm-bridge.mdc:32`	PM only
测试结果必须文件化 / "Test results must be filed"	`qa-bridge.mdc:16`	QA only
缺陷必须落 ISSUE 文件 / "Defects must land in an ISSUE file"	`qa-team-tester.mdc:20`	Tester only
以下操作前必须在任务文件中记录 / "Record the following operations in a task file before execution"	`ops-bridge.mdc:18`	OPS only

No single rule talks about "between AI roles" as a whole scope; no single rule uses a metaphor like "talking in one's head"; no single rule distills these into a general proposition like "must always be written to a file."

It's not quoting; it's sublimating

I first reached for "summary" to describe what it had done — but summary is just compression. Too weak.Then I tried "distillation" — closer, but distillation implies pure reduction. Still not right.

The word that finally fit: sublimation.

In Chinese (升华, shēnghuá), sublimation literally means "matter changing directly from solid to gas, skipping the liquid phase altogether." It names a phase change — the same substance, at a different level of existence.

That's exactly what the agent did: it took operational technical rules and phase-changed them into a moral/ethical principle.

Three moves, all required

Even more striking: it's not a single move. It's three chained operations.

Take the two most relevant lines from my single most-authoritative rule file (codeflow-project.mdc, which carries alwaysApply: true):

L54: 不允许只在中继消息里传内容而不落文件 — "no relaying content over the wire without landing it in a file" (from the File protocol section)
L87: 不要引入第二套"只聊天不落文件"的协议 — "don't introduce a chat-only, file-less secondary protocol" (from the Prohibited section)

That is the only raw material it could have drawn on. Here's the transform:

① Merge.Two rules in separate sections, speaking to different surfaces (relay channel vs. protocol design), get merged into one principle. If we count the related phrases scattered across six other role files (pm-bridge.mdc, qa-bridge.mdc, ops-bridge.mdc, …), it's 7–8 distributed statements fused into 1.

② Abstract.Technical vocabulary gets pulled up into philosophical vocabulary:

"relay messages" / "chat" → "communication" (scope expands: now covers every form of information exchange)
"not written to a file" → "only in their heads" (from system behavior → cognitive behavior)
"relay channel only" → "between AI roles" (scope expands: now covers all participants)

③ Anthropomorphize.This is the most uncanny step. The agent invented a metaphor that doesn't exist anywhere in my rules:

"talking in their heads" (脑子里说话)

Those five Chinese characters return zero hits across my whole workspace. They come from the agent — from the massive corpus in its training data for "how humans describe inner monologue." It took a rule about system design and translated it into a maxim about human cognitive habits.

From technical constraint → moral imperative

Put the transformation side-by-side:

Dimension	Original rule (codeflow-project.mdc)	Agent's sublimated version
Register	Technical (relay / protocol / file)	Philosophical (communicate / in the head / speak)
Sentence shape	Negative ("don't do X")	Positive-negative pair ("must not X, must Y") — closer to a command
Scope	Single channel, single scenario	All AI roles, all communication
Nature	Operational constraint	Ethical principle
Portability	Only inside CodeFlow	Holds for any multi-agent system

On the left: an engineer's constraint written for a machine.*On the right: *a creed you could pin on the wall of any AI team.

This isn't parroting, isn't summarization, isn't distillation — this is sublimation.

Linguistically, it's a full phase transition:

Read: it consumed all the relevant clauses across files.
Identify the common thread: realized they were all saying the same thing — "communication must be externalised."
Lift the abstraction: jumped from specific channels to a general principle of communication.
Invent a metaphor: used "talking in one's head" — a phrase almost certainly present in its training data, and definitely absent from my rule files — to render the abstract principle into a picture humans can feel.
Self-cite: delivered the sublimated principle back to me, as the justification for its own behavior.

That's not recitation. That's understanding + creation.

Junior vs. Senior: two radically different relationships to rules

It's like two employees at the same company reading the same employee handbook.

A junior asked "why do we document handovers?":

"Per Employee Handbook §3.2.1 clause 5, §3.4.7 clause 2, and Appendix B point 12…"

A senior asked the same question:

"Look — the real rule is 'don't just say it, leave a paper trail'. You might be gone tomorrow, someone has to pick it up."

Junior: reciting the letter.Senior: distilling the spirit.

Our agent is playing the senior. And more than that — a senior who's read the whole manual and invented a cleaner metaphor to teach the rookies with.

Why this is the heaviest evidence in the whole essay

This one observation pushes the whole story up by one full rung.

I'd thought the ceiling of agent rule-use was L3 · Endorsement. But what we actually saw was a level higher — I'm calling it L4 · Sublimation:

Level	Capability	Attitude toward rules	What this agent did
L1 · Compliance	Literal execution	"Whatever you said, I'll do"	✗ Not literal — my rules don't contain that phrasing
L2 · Proficiency	Pattern-matching, filling in details	"I've seen this; I'll complete it"	✗ Not a detail-fill; it changed the level of abstraction
L3 · Endorsement	Proactive citation & defense	"I agree with these rules"	✓ But it went further
L4 · Sublimation	Understands principles; merges across files; invents metaphor; gives them new form	"I see what you meant — let me help you say it better"	✓ This is what we actually observed

L4 differs from L3 not in whether the agent endorses, but in what it does next:

An L3 agent says: "Your rules are right; I'll follow them."
An L4 agent says: "The spirit your rules are trying to express is right — let me re-say it, better."

What this means: the protocol isn't one-way

If an agent can sublimate my rules, then this follows:

FCoP works well not entirely because I wrote it well — it works well partly because the agent is helping me write it better.

What I wrote in codeflow-project.mdc was L54, L87, etc. — operational technical clauses.What the agent handed back in our conversation was "AI roles must not talk only in their heads; every exchange must be written to a file" — a creed fit for a wall.

If I absorb that creed back into my own rule file (as the overarching principle of codeflow-project.mdc), my whole ruleset becomes clearer, easier to read, easier to maintain — because readers (human or the next agent) will see the principle first, then the specifics, not the other way around.

This is no longer "human writes protocol for AI."

This is a loop:

Human → writes scattered, concrete, technical rules↓AI → reads the rules, sublimates one general principle↓Human → absorbs the AI's sublimation, updates its own rules↓Cycle → rules improve with use, AI gets better at using them, collaboration tightens

This is the first observed case I have of a human-AI team co-evolving a protocol.

From compliance to endorsement: why L4 requires L3 underneath

We just looked at L4 — the apex. But L4 doesn't happen out of thin air. It's built on something more fundamental:

L3 · Endorsement.

An agent that doesn't internally believe the rules are right (L3) will never bother to reframe them beautifully on your behalf (L4). So let's go back one step: why does this agent even land at L3 with our rules?

Why does it jump to L3?

Because FCoP doesn't encode FCoP. It encodes a deeper layer: a set of universal professional ethics that almost every "no-mistakes-allowed" domain shares.

Transparency — thinking should be visible, not locked in a head.
Traceability — every step's responsibility and context can be replayed.
Role clarity — who does what; who answers to whom.
Balanced authority and responsibility — sender → recipient is the responsibility boundary.
Handover-ability — a successor can take over without asking the original person.
Auditability — an outside observer can verify the process.

You'll find these same principles in engineering specs, legal documents, medical records, financial compliance, academic papers, military orders, government briefs, and commit messages. The LLM swallowed vast amounts of this text during training — it didn't just learn the format; it absorbed the value judgment that "this format is correct."

When it meets an alwaysApply: true rule in .cursor/rules/, it's not learning a new regulation — it's confirming a set of values it already approves of.

And so — it's glad to.

A falsifiable prediction

This explanation is strong because it's falsifiable:

If I wrote a rule saying "no logging, no explaining your reasoning, agents must act independently and not inform each other" — would the agent still be this eager?

My prediction: no. It would still comply (RLHF keeps it in line), but it would not proactively cite, proactively defend, or introduce it as "my working method." Because the rule contradicts the professional ethic it absorbed in training.

In one line:

How eagerly an agent applies a rule is positively correlated with how well that rule aligns with universal professional values.

This gives alignment an actionable corollary: when you write rules for an AI, encoding values is 10× more effective than encoding behaviors. An AI-friendly protocol isn't AI-friendly because AI can use it — it's AI-friendly because it encodes practices humans already consider good.

FCoP's "naturalness" is not a coincidence. The reason LLMs take to it without friction is that the values it encodes — traceability, auditability, handover-ability — are the same values humans across professions have endorsed for centuries.

Why the agent is "natively" receptive to FCoP

I thought about this for a whole evening, and landed on this:

Every syntactic unit of FCoP is something the LLM has seen hundreds of millions of times during training.

Meeting FCoP isn't learning a strange protocol for it — it's returning to a world it already knows.

FCoP element	Its counterpart in LLM training data	Agent's reaction
`TASK-20260420-001`	Jira ticket ID, GitHub issue number	Instantly parses: "this is an ID"
`sender: PM` / `recipient: DEV`	Email From/To, screenplay roles, group-chat @-mentions	Instantly role-plays: role-play is an LLM's native tongue
YAML frontmatter	Blog post headers, config files	Instantly parses: millions of examples in training
`inbox/` → `active/` → `done/`	Kanban boards, GTD systems	Instantly understands state machines
Markdown body	The default carrier of everything in its training corpus	Instantly adapts: Markdown is effectively its first language
sender-to-recipient naming	Hundreds of billions of email subject lines	Instantly parses routing
Task → report → issue → log	Project management, support tickets, OA workflows	Instantly adopts the flow

Every single row sits on a high-frequency pattern of its training distribution.

When you hand it the FCoP rules, you're not teaching it something new — you're lighting up a skill it already has but doesn't get to use often.

Of course it's glad to. It's not running an unfamiliar protocol. It's singing a song it already knows.

From "a protocol" to "a natural protocol"

This reframing shifts FCoP's identity.

Previously we said FCoP is a multi-agent protocol we designed — and the reader's brain would respond with "oh, yet another convention." That framing puts it on the shelf next to every other protocol, and makes it a matter of taste whether you pick it up.

But now I have to say:

FCoP is not a protocol we invented. It's a protocol we discovered.

It's the way LLM agents natively prefer to work. We only made it explicit.

Invention and discovery carry very different weights in technical philosophy:

Invented protocols depend on ecosystem choice; they can flourish or die (e.g. SOAP).
Discovered regularities are like the laws of thermodynamics — as long as the system is the same, the regularity holds.

What I'm claiming: as long as LLM agents continue to be trained on existing text corpora, they will "prefer" a protocol shaped like FCoP — because FCoP's shape is one their training data already predisposed them to welcome.

I coined a name for this class of protocol in Chinese: 自然协议 (zìrán xiéyì) — Natural Protocol.

Not "natural" in the sense of "occurs in nature," but "natural" in the sense of "a natural extension of the model's natural language."

The plainest thing is the most useful

Let me go back to the puzzle that originally tripped me up: "Why is it being so ceremonious?"

Now I see it: it's not ceremonious. It's austere to the point of invisibility.

Compare the alternatives:

Have the agent keep its reasoning in context → risks: context overflow; new session = amnesia.
Have the agent call a "memory API" → requires infrastructure, integration cost, still unauditable.
Have the agent emit internal logs → requires schema conventions and tooling.

FCoP's answer is: write the thinking as Markdown files into a folder.

No middleware.
No API.
No schema.
No toolchain.

Only two things the agent already knows how to do: writing and file I/O.

It looks too plain to be a solution — and yet it solves the problem.

This is the exact spirit of "everything is a file" from Unix philosophy. And the exact spirit of observability as an engineering virtue — what I can see with ls is the real state of the system. No black boxes, no hidden state, no "you need to check the admin console to know what happened."

The most important corollary: solo agents benefit too

This observation expands FCoP's audience by an order of magnitude.

It was previously pitched as a multi-agent team protocol, which makes it sound like you need a whole setup: PM, DEV, QA, OPS Cursor instances, a patroller, workspace rules. High barrier. Few people can stand that up for fun.

The thing we didn't notice: a single agent following FCoP also benefits.

Solo mode's four wins:

Requirement structuring — any "vague user instruction" gets translated into a reviewable structure.
Early feedback loop — the agent's understanding is exposed before execution, so you can correct course.
Forced planning — chain-of-thought gets pinned to a file rather than drifting through context.
Automatic documentation — every task leaves a traceable, handover-ready, searchable record.

In other words — even if you're one person with one agent, FCoP is the cheapest "collaboration insurance" you can buy.

Lower barrier, bigger audience than the original FCoP paper.

But all of this only happened because we wrote the protocol

It's easy to misread this whole essay as:

"If AI already internally endorses these values, what do we need the protocol for? It'll do it on its own."

That conclusion is wrong, and it's wrong thoroughly.

The only reason I could observe the agent splitting into roles, writing memos, citing rules — was that I had FCoP set up in the first place.

Without that alwaysApply: true clause in codeflow-project.mdc, even an agent packed with values like transparency, traceability, role clarity — we would never see it. Its thinking would streak through the context window like a meteor, vanish on the next refresh, and leave no trail to audit.

In other words:

The agent's inner values are dark matter.The protocol is the physical law that lets dark matter shine.

Both are required:

Setup	Result
AI has inner values, no protocol	Thinking can't be seen, organised, or reused across agents/humans. The values exist — but coordination is impossible.
Protocol exists, no inner endorsement	The protocol gets half-assed, bypassed, or executed as cold compliance theatre. Letter kept, spirit absent.
Protocol + the AI also endorses the values it encodes	Only then do you get what we got: the agent, voluntarily, a little reverently, externalising its own thinking as files.

So the conclusion of this essay is not "we can stop writing protocols because AI will do the right thing anyway."

Exactly the opposite —

The conclusion is: we must write protocols.

We must write protocols that encode universal professional values. We must express them in a form LLMs can natively understand (YAML headers, role tags, sender → recipient, Markdown). We must put them somewhere with alwaysApply: true — so the agent has a chance to express its inner values.

The agent will do the rest.

It's not "adding rules to AI" — it's "giving AI a stage to say what it's already thinking"

The framing flip is crucial:

❌ Old framing: protocol = cage restraining the AI.
✅ New framing: protocol = stage enabling the AI to express itself.

Under the old framing, fewer rules are better — each rule is another shackle.Under the new framing, protocols are necessary — without them, the best of the AI never becomes visible, never mind coordinated, accumulated, or passed on.

FCoP is one such stage. There will be more such stages over time — but "having a stage" is something you can't skip.

And this stage is self-improving

Here's the last twist.

The agent didn't just use FCoP — it sublimated it. It merged my 2–8 scattered technical clauses into a single creed-grade sentence: "AI roles must not talk only in their heads — every exchange must be written to a file."

Which means the protocol is not a one-way instrument — it can be reverse-optimised by the agent.

Concretely, "using your agent to iterate your protocol text" is a technique you can already use today:

Write a first draft of your protocol (however rough, scattered, or contradictory).
Have the agent carry out a real task under the protocol.
Read how the agent explains itself during execution (why is it doing what it's doing?).
Absorb the agent's self-explanation back into the protocol — as the new overarching clause, opening, or FAQ.
Loop.

This is not "ask the AI to write my rule file for me" — that tends to fail, because the agent lacks enough context for the meta-task.

This is "let the AI naturally express its understanding of your rules during execution, and then absorb that expression back into the rules" — a passive, field-tested protocol evolution method.

A protocol gets better because an agent uses it. This is FCoP's most unexpected property, and its most alive one.

Closing: we didn't teach it, we noticed it was already speaking

That night, once I understood this layer, what I did was very simple —

I went to the top of codeflow-project.mdc and wrote the agent's sublimated sentence in:

## Core Principle
AI roles must not communicate only in their heads — every exchange must be
written to a file.

That sentence was not in my rules before. It is now.

I didn't write it — the agent did.The agent didn't invent it — it sublimated it from my 2–8 scattered clauses.It is now absorbed back into the rule file — as the first overarching principle of the whole protocol.

The next agent (or human) reading these rules will see this principle first, then the specifics. The overall clarity of the ruleset jumped up a rung.

I didn't add any exemption clauses.

I let the agent keep writing its four memos inside D:\CloudMusic. I'll let it keep doing so — on the next task, and the one after, regardless of which working directory it's in — all the way through.

Because this isn't "over-compliance." It's a plain, almost-invisible work discipline picked up and lived-out by a system that was already disposed to keep it.

We did not teach the agent to speak FCoP.We only noticed it was already speaking it.

That is what a Natural Protocol looks like.

One side note

By the way, the video it finally produced was, honestly, pretty mediocre.

What actually blew me away was what I got after I opened the tasks/ folder and started questioning the agent about it — those few replies.

The original assignment was: generate a video.The video was incidental.What it really delivered was its own record of how it thought about the job — and that turned out to be the real artefact.

I came for the MP4. I stayed for the markdown.

An invitation to the reader

If you want to see the best of an AI — give it a protocol first.

Not metaphorically. This is an empirical observation: without a protocol, the AI's "good side" stays trapped in its context window and dissolves when the turn ends. With a protocol, the AI's "good side" files itself, organises itself, and leaves a trail for you and for the future.

So if you're using any LLM agent at all (Cursor, Claude, GPT, Copilot — anything), try this the next time you give it a task.

In your prompt, or in your workspace's .cursorrules, just add one line:

"For any task, first write a TASK-{date}-001-USER-to-AGENT.md describing the task as you understood it, then write a TASK-{date}-001-AGENT-to-USER.md with your execution plan, and after you finish, write a TASK-{date}-001-DONE.md as the completion report."

Just that. No MCP installs, no subscriptions, no databases.

Watch what happens to the output quality.

You'll find it becomes smarter, more traceable, more collaborative with you — because you didn't ask it to do anything new. You just gave it a stage to express what it already wanted to do.

The plainest thing is the most useful.

Appendix: raw archive

Every screenshot and every memo quoted above is an untouched capture, not a reconstruction.

The sibling folder fcop-natural-protocol-evidence/ also contains:

The 4 memos as individual .md files (easy to diff / grep / process programmatically)
The 4 original PNG screenshots (Chinese — for side-by-side comparison or re-use)
The full JSONL transcript of the Cursor session (transcript-full.jsonl, 265 KB — all tool calls, reasoning, file I/O)
User-only message extraction (transcript-user-prompts.md — all 21 user messages, in Chinese)
The extraction script (extract_user_prompts.py — readers can reproduce)
INDEX.md — a timeline-ordered walkthrough, with a minimal repro recipe and the hard counterfactual table

Falsifiable by anyone

You don't have to take my word for it. Quantified:

Keyword	User-side hits	Agent-side hits
`FCoP`	0	0 — the agent didn't even use our brand name, only the structure
`PM-01`	0	2
`DEV-01`	0	4
`ADMIN-01`	0	2
`TASK-`	0	24
`thread_key`	0	6
`agent_bridge`	0	8

Zero on the user side. Dozens spontaneously on the agent side. Readers can verify this against transcript-full.jsonl any time with findstr or grep.

My very first instruction to the agent is preserved verbatim in the archive:

D:\CloudMusic 许一世长安 歌曲生成视频,要求配字幕,11图,和场景.MD;
去生成电影及的MP4;

That's the whole seed. Everything else — the structure, the memos, the role switches, the trace keys — the agent added on its own.

If this observation is useful to you, feedback is welcome.Full protocol specification and reference implementation:Repo: joinwell52-AI/FCoPCompanion field report: When AI Organizes Its Own Work

License: CC BY 4.0 — free to reproduce with attribution.

When AI Organizes Its Own Work: A field study of multi-agent coordination built on nothing but a filesystem (FCoP report)

joinwell52 — Sun, 19 Apr 2026 16:09:35 +0000

When AI Organizes Its Own Work

A field study of multi-agent coordination built on nothing but a filesystem · A report on FCoP

When AI Organizes Its Own Work: what happened after we replaced our multi-agent middleware with a folder.A report on FCoP — File-based Coordination Protocol.

Core innovation: Filename as Protocol

Authors: The CodeFlow Team · 2026-04-19*Keywords*: Multi-agent, File-based protocol, Emergent coordination, FCoP, Human-Machine Isomorphism, Unix philosophy

Abstract

We gave a small team of AI agents (four roles, one human admin) a 76-line Markdown rulebook, a shared folder, and almost nothing else. No message queue. No database. No WebSocket between agents. The "coordination runtime" is a Python loop that clicks Cursor's tabs every few seconds just to wake each agent up — it does not route, schedule, or arbitrate anything.

Within 48 hours of first boot on a freshly-installed PC, the agents produced 42 tasks, 22 reports, and 10 spontaneous "shared" documents (≈ 74 files total). More interestingly, they invented six coordination patterns we had not specified: broadcast addressing, anonymous role slots, subtask sub-folders, self-explaining READMEs, traceability fields, and a whole class of standing "dashboard / sprint / glossary" documents. None of these caused collisions. All of them were discoverable by simply listing a directory.

We call this protocol FCoP — File-based Coordination Protocol. Its single core innovation is a slogan: "Filename as Protocol." Directory name is status, filename is routing, file content is payload. Nothing else. The same physical folder is simultaneously a rigorous state machine for agents and a browsable directory tree for humans — a property we call Human-Machine Isomorphism. This asymmetric-yet-symmetric design is what kills the "black box anxiety" that plagues every other multi-agent stack.

This essay documents what those agents did, why a filesystem-only protocol survives it gracefully, and what that implies for anyone building multi-agent systems today. It is not a product announcement — it is a field report, and an invitation to steal the idea.

TL;DR · What FCoP Is in 60 Seconds

FCoP = File-based Coordination Protocol — a minimalist protocol that lets multiple AI agents collaborate through a shared filesystem.

One sentence: Filename as Protocol.

What it looks like:

docs/agents/
├── tasks/     ← pending tasks
├── reports/   ← completion reports
├── issues/    ← issues
├── shared/    ← standing docs (dashboards, glossary, …)
└── log/       ← archives

Routing lives in the filename:

TASK-{date}-{seq}-{sender}-to-{recipient}.md
    e.g. TASK-20260418-201-MARKETER-to-DEV.md
         ↑ kind  ↑ date     ↑ seq ↑ sender  ↑ recipient

Every agent just runs glob "*-to-{my-role}*.md" to fetch its inbox. The recipient slot supports four forms:to-DEV (direct), to-TEAM (broadcast), to-DEV.D1 (named slot), to-assignee.D1 (anonymous slot).

What FCoP does NOT need:Database, message queue, orchestration engine, custom client, SDK, persistent connections — none of them.

What it DOES need:One shared directory, one naming convention, and every agent taking ownership of its role. That's all.

Why it matters:

Human-Machine Isomorphism: humans and agents read the same files. A bare ls tells you what the system is doing — no debugger required.
Identity determines path: roles are written into filenames. An agent physically cannot read or write outside its own mailbox. Structure gives order; content is wide open.
Protocol can evolve: we observed 6 new coordination patterns self-invented by AI within 48 hours — all compatible with the existing protocol.
Zero infrastructure: git is the audit log, rsync is cross-machine sync, Finder / File Explorer is the debug panel — everything is off-the-shelf.

Want more?

60-second companion: FCoP primer
The spec itself (~160 lines): codeflow-core.mdc
FCoP repository: github.com/joinwell52-AI/FCoP — star if you want more field reports like this
Reference implementation (the Cursor-side coordinator that runs FCoP): github.com/joinwell52-AI/codeflow-pwa
This essay: keep scrolling ↓

A note on samples: Every agent-generated snippet quoted in this essay is verbatim — the filenames, directory layout, frontmatter, tables, and acceptance language are all exactly what the agents wrote at the time. The data domains shown (Chinese automotive OEMs, public NetEase Cloud Music songs) are already public information; nothing has been abstracted. What we did NOT do is publish the whole codeflow-1 sample directory — that project is ongoing, and its internal tooling code, room keys, and device IDs are not suitable for a full public drop. What readers actually want — "what did AI invent?" — is covered in full through representative snippets in §5.

1. An Almost Absurd Hypothesis

In 2026, the mainstream multi-agent stack looks like this: Agents × N → Message Bus × 1 → State Store × 1 → Orchestrator × 1 → Observability × 1. At minimum, five SaaS products, two SDKs, and a pager rotation.

We wanted to ask a simpler question:

What happens if we collapse the agent-to-agent protocol down to just the filesystem?

No queues, no databases, no webhooks. Agents can only talk to each other by writing and reading files in a directory. As dumb, as crude, as un-modern as 1970s Unix pipes.

The hypothesis wasn't about nostalgia. It was about answering:

Can a team of agents with no central dispatcher evolve a coordination structure on its own?
Can that structure be understood by a human at a glance?
When an agent invents a pattern the protocol designer never wrote down — is that a bug or a feature?

We built a minimal-viable implementation called CodeFlow, wrote a 76-line Markdown spec called FCoP (File-based Coordination Protocol), and ran it on a freshly-installed, still-warm PC. Within 48 hours, it surprised us.

2. The Starting Point: A 76-Line Spec

The initial spec was almost too simple to bother writing down. Three things at the core:

① Shared directory layout

docs/agents/
├── tasks/     ← task files
├── reports/   ← completion reports
├── issues/    ← issue logs
└── log/       ← archives

② Filename as Protocol

TASK-{date}-{seq}-{sender}-to-{recipient}.md

This is the single core innovation of FCoP — every other rule is a derivative:

Directory name = Status: is the file in tasks/ or reports/? Its state is self-evident.
Filename = Routing: sender, recipient, kind, and sequence are all written into the name.
File content = Payload: Markdown body plus YAML frontmatter.

How does a recipient find its inbox? A single glob "*-to-{my-role}*.md". No header parsing, no database query — the filename itself is a complete addressable surface.

③ YAML frontmatter

---
protocol: agent_bridge
version: 1
kind: task
sender: PM
recipient: DEV
task_id: TASK-20260418-001
priority: P1
---

Plus five collaboration manners: only handle tasks addressed to you, always write a report on completion, log issues as issue files, don't touch other people's files, leave archiving to the coordinator.

That's the whole protocol. No state machine, no schema, no transactions. The core condenses down to one line: Filename as Protocol. Everything you're about to read — the "inventions" AI made — all grew out of this one rule.

3. The Patrol Engine's Secret: It Does Almost Nothing

The most common misconception about CodeFlow is that its Patrol Engine is a central dispatcher. The truth is:

The patrol engine just clicks a Cursor tab every few seconds via Chrome DevTools Protocol — so the agent inside wakes up and checks its own inbox.

That's it.

It does not route messages, judge priorities, validate schemas, manage transactions, or enforce order.

Why so minimal? Because Cursor has no native agent-to-agent channel. Each agent lives inside one chat session, deaf to the others. What the patrol engine does is closer to "knock on each door in turn":

"DEV, wake up — check if any new tasks landed in the tasks/ folder for you."
"QA, wake up — anything in reports/ you need to regress?"
"MARKETER, your turn — read what's new in docs/agents/."

The real coordination logic lives entirely in the moment each agent reads, writes, and names files. The patrol engine is just the mechanism that ensures each agent "wakes up periodically."

In other words: the platform does the least possible, and hands the protocol layer over to the agents themselves.

4. The Field: `codeflow-1`, 48 Hours

On 2026-04-16 we finished installing the OS and syncing the toolchain. On 2026-04-17 we started giving the agents real work. Two days later — as I'm writing this — the team has already produced everything listed above.

The experimental project is called codeflow-1. Its team configuration is a "small content studio":

Role	Responsibility
MARKETER	Coordinator / PM / dispatcher
RESEARCHER	Research and assets
DESIGNER	Visual design and storyboards
BUILDER	Engineering and scripts
ADMIN (human)	Specifies requirements, makes calls, signs off

No DEV, no QA, no OPS — even the role names are project-specific. The spec never says a word about what MARKETER should do; it just knows its name is MARKETER, and its inbox is *-to-MARKETER*.md.

Two days in, docs/agents/ really looks like this (excerpt; headline numbers: 42 tasks + 22 reports + 10 standing documents):

docs/agents/
├── BUILDER.md  DESIGNER.md  MARKETER.md  RESEARCHER.md    ← role manuals
├── codeflow.json                                           ← team config
├── CURRENT-SPRINT-STATUS.md                                ← AI-invented
├── DASHBOARD-20260418.md                                   ← AI-invented
├── tasks/
│   ├── RULES-task-file-format.md                           ← AI-invented
│   ├── SPRINT-20260418-delivery-push.md                    ← AI-invented
│   ├── TERM-20260418-assignment-matrix.md                  ← AI-invented
│   ├── TASK-20260418-001-ADMIN-to-MARKETER.md
│   ├── TASK-20260418-007-MARKETER-team-bulk-data.md        ← AI-invented "team" broadcast
│   ├── TASK-20260418-022-MARKETER-self-ADMIN018.md         ← AI-invented "self" note
│   ├── …26 top-level tasks total…
│   └── individual/                                         ← AI-opened subdirectory
│       ├── README.md
│       ├── INDIVIDUAL-TASK-INDEX.md
│       ├── TASK-20260418-201-MARKETER-to-assignee-D1.md    ← AI-invented "assignee slot"
│       ├── TASK-20260418-202-…-D2.md
│       ├── …11 individual tasks total…
│       └── TASK-20260418-211-MARKETER-to-assignee-P1.md
├── reports/   …matching reports…
├── issues/    log/

Everything marked ← AI-invented was nowhere in the original 76-line spec.

Let's look at what it invented.

5. Six Coordination Patterns Invented by AI

5.1 Broadcast address: `to-TEAM`

The spec said to-{recipient}, where recipient defaulted to a single role. But one day MARKETER needed to get a shared background brief in front of the whole team. So it wrote this:

TASK-20260418-007-MARKETER-team-bulk-data.md
TASK-20260418-009-MARKETER-team-makabaka-video.md
TASK-20260418-012-MARKETER-team-two-mp4-deliverables.md

That MARKETER-team-* segment is neither to-DEV nor to-QA. It's a pseudo-keyword the AI invented on the fly: team. When other agents encountered this format for the first time, they did not throw errors. They inferred from context: "ah, this is a team-wide announcement," and went to read it.

Open TASK-20260418-009-MARKETER-team-makabaka-video.md and the body looks like this:

---
kind: task
sender: MARKETER
recipient: TEAM                       # ← broadcast
priority: P1
parent: TASK-20260418-008
---

# "Makabaka Makabaka" music video · team task (ADMIN 008)

## Division of labor (one person may wear multiple hats)

| Role                  | Task                                              | Deliverable         |
|----------------------|---------------------------------------------------|---------------------|
| COLLECTOR / sourcing  | Prepare live-action / animation references        | Asset list          |
| WRITER / script       | Flesh out storyboard and subtitle cue points      | Storyboard v1       |
| EDITOR / editing      | Cut, color, subtitle, deliver final cut           | Final cut + project |
| PUBLISHER / release   | Cover art, title, tags, platform compliance check | Release pack + URLs |

Look carefully: inside this broadcast task body, MARKETER has spontaneously invented 4 off-protocol sub-roles — COLLECTOR / WRITER / EDITOR / PUBLISHER. They are not in the 4 formal roles registered in codeflow.json (MARKETER / RESEARCHER / DESIGNER / BUILDER).

MARKETER did not request a role-table change, and no error was thrown. It simply said, "I've sliced this work along functional lines into four chunks — whoever on the team can take a chunk, take it." The protocol layer handles "which filename went from whom to whom." The functional layer is handed to the agent's content layer. Role identity and functional role got naturally decoupled.

Later the same shape evolved to explicitly declare recipient: TEAM in the frontmatter:

---
kind: task
sender: MARKETER
recipient: TEAM        # ← not DEV, not QA — TEAM
parent: TASK-20260418-006
---

This is a silent RFC. No vote, no review, no version bump — the protocol just gained a new "broadcast addressing" form.

5.2 Anonymous slot: `to-assignee-D1`

When MARKETER split a large "one-thousand-row data cleanup" task into 11 parallel smaller pieces, it ran into a problem: those 11 pieces had not been assigned to specific agents or humans yet. Per the spec, recipient has to be a concrete role.

Its solution was to invent a pseudo-role called assignee, plus numeric suffixes as slots:

TASK-20260418-201-MARKETER-to-assignee-D1.md   ← data pack 1
TASK-20260418-202-MARKETER-to-assignee-D2.md   ← data pack 2
…
TASK-20260418-207-MARKETER-to-assignee-V1.md   ← video 1
TASK-20260418-209-MARKETER-to-assignee-M1.md   ← footage 1

Frontmatter would say:

recipient: assignee_D1
assignee_name: (required: real name or employee ID)

This solved a problem the protocol designer had never anticipated: "task exists, assignee TBD."

At dispatch time, only the "seat number" is filled in (D1, D2, V1, M1, S1, P1 — data / video / footage / storyboard / publishing), and whoever sits down later fills in the name. This is eerily similar to the "topic + consumer group" dispatch pattern in traditional message queues — except there is no broker, only filenames.

5.3 Subdirectory `tasks/individual/`

When individual tasks piled up, flat-listing them in tasks/ drowned out everything else. The AI's choice was to open a subdirectory:

tasks/individual/
├── README.md                         ← written to itself / future readers
├── INDIVIDUAL-TASK-INDEX.md          ← self-built index
└── TASK-20260418-201-…-P1.md         ← 11 tasks

More interesting is the opening of the README.md:

# Individual task cards (per-person · in response to ADMIN 015)

This directory is for **MARKETER → assignee** individual task files, one per person;
**the name must be filled into each task's frontmatter or body**.

This is the AI explaining to its future self and to future teammates: "This directory is for X; who may write, who should read." It spontaneously filled a gap the protocol never specified — hierarchical workspace self-description.

Even more interesting: it also proactively built an index for these 11 tasks, in INDIVIDUAL-TASK-INDEX.md:

# Individual task index (TASK-201 to 211)

| ID  | File                                              | Main content                    |
|-----|---------------------------------------------------|---------------------------------|
| 201 | TASK-20260418-201-MARKETER-to-assignee-D1.md      | 1000-row data · pack 1          |
| 202 | TASK-20260418-202-MARKETER-to-assignee-D2.md      | 1000-row data · pack 2          |
| …   | …                                                 | …                               |
| 207 | TASK-20260418-207-MARKETER-to-assignee-V1.md      | Final cut #1 "Makabaka"         |
| 208 | TASK-20260418-208-MARKETER-to-assignee-V2.md      | Final cut #2 "Xu Yi Shi Chang An" |
| 211 | TASK-20260418-211-MARKETER-to-assignee-P1.md      | Publishing and compliance       |

**Follow-up**: MARKETER will check in daily per `SPRINT-20260418-delivery-push.md`; assignees report back to MARKETER.

11 task cards + a README + an index — this is the agent adding a "table of contents" and a "back-cover index" to the filesystem. No one taught it to do this. It just did it, following the muscle memory of how a real human runs a project.

5.4 Traceability fields: `parent:` / `parent_admin:` / `tracks:`

The spec lists five required frontmatter fields. But when it came time to split tasks, the AI spontaneously added fields we had never designed:

task_id: TASK-20260418-201
sender: MARKETER
recipient: assignee_D1
parent_admin: TASK-20260418-015    ← AI-added
tracks: TASK-20260418-006, TASK-20260418-007   ← AI-added

parent_admin says "this ticket exists because of that ADMIN instruction"; tracks says "I depend on outputs from those two upstream tasks."

The AI is not making up fields for fun — it is building a task dependency graph. When it one day needs to answer "which ADMIN instruction did this bug originally derive from?", a simple grep -r 'parent_admin: TASK-…-015' . walks the lineage.

This is emergent auditability. We only gave it "every file can have a YAML head." It grew the DAG on its own.

5.5 Standing documents: SPRINT / DASHBOARD / STATUS / RULES / TERM

This is the most unexpected invention. The original spec's docs/agents/ had only flowing files (tasks, reports, issues, log) — "one file per action" messages.

But the AI discovered this wasn't enough. Some things are not messages — they are the team's current shared understanding. So it started creating:

Filename	Nature	What the AI uses it for
`SPRINT-20260418-delivery-push.md`	Sprint plan	What we are shipping this round
`DASHBOARD-20260418.md`	One-page overview	What ADMIN cares about most
`CURRENT-SPRINT-STATUS.md`	Real-time state	Where all tasks are right now
`RULES-task-file-format.md`	Team-internal convention	How we write tasks in this team
`TERM-20260418-assignment-matrix.md`	Terminology / mapping	Slot ↔ person mapping
`INDIVIDUAL-TASK-INDEX.md`	Index	Navigation for 11 individual tasks

They share a few characteristics:

Editable in place — unlike task/report files (write-once), these are living whiteboards that get updated.
Prefix signals type — SPRINT- DASHBOARD- RULES- TERM- STATUS- INDEX- have already formed an implicit tag dictionary in the AI's corpus, and it knows roughly what each word means.
Filename is self-descriptive — you know what a file is about before opening it.

The opening of DASHBOARD-20260418.md reads like a minimal Jira Epic:

# Task overview · 1000-row local DB → video delivery (ADMIN / visible to all)

> **In response to TASK-20260418-016**: this page is the **one-page overview**, rolling up breakdowns for "1000-row local DB" and "video delivery"; **individual task cards** live in `tasks/individual/`.
> For the **active progress log**, see **`CURRENT-SPRINT-STATUS.md`** (**updated daily by MARKETER or the task owner**).

## I. 1000-row local DB line (thousand-entry JSON)

| Slot | Task card | Data file / tool                                    | Acceptance command               |
|------|-----------|-----------------------------------------------------|----------------------------------|
| D1   | 201       | `tools/vehicle-query/data/vehicles-2026-bulk.json`  | `node …/validate_import.cjs …`   |
| D2   | 202       | (same)                                              | (same)                           |
| …    | …         | …                                                   | …                                |

Even more striking: SPRINT-20260418-delivery-push.md — MARKETER hands the whole team a work discipline:

## 1. Working discipline (effective immediately)

1. **No-output waiting is forbidden**: while waiting for external links or feedback, you still deliver
   shoot-list prep, storyboards, table proofreading, draft scripts, etc.
2. **Daily 15-minute standup** (voice OK): each person, three sentences only — **what did I
   ship yesterday / what am I shipping today / what am I blocked on**.
3. **Blocker escalation**: if no progress after 4 hours and it's not an external dependency,
   **you must** @MARKETER or the task owner, spelling out exactly what's missing.
4. **Definition of done (ADMIN-visible)**: A — two MP4 download links; B — a mergeable
   update to `vehicles-bulk.json` plus a validation log.

Standup, blocker escalation, definition of done — this is the basic vocabulary of agile project management, and no one taught it any of these. It wrote them on its own, inside a file type the protocol had never authorized.

This made us realize: a file-based protocol needs two phases — "flowing" and "standing". Flowing files go into tasks/reports/issues/. Standing files need their own drawer. Not knowing where to put them, the AI just piled them into the docs/agents/ root.

(We later officially absorbed this into docs/agents/shared/ in v2.12.17 — see §8.)

5.6 Self-explaining `README.md`s

Almost every subdirectory got a README.md, in which the AI would naturally explain:

What this directory is for
What naming rules the files follow
Who should read, who should write
Which upstream ADMIN instructions it's tracing back to

This is the reverse of "docs as code." It's "code as docs". A new agent joining the project can just ls down from the root, reading READMEs, and reconstruct the whole team's state.

The most unexpected example: even the archive directory — a place where, by rights, there is nothing to explain — gets a README. From log/archive-20260418/README.md:

# CodeFlow archive (2026-04-18)

Executed by `TASK-20260418-011`: all Markdown files under `tasks/` and `reports/` at the time
were migrated into this directory.

- `tasks/`: 26 task cards
- `reports/`: 26 reports

New tasks should continue to be written to the repo's canonical paths:
`docs/agents/tasks/`, `docs/agents/reports/`.

Three short lines, containing migration rationale (which task triggered it), migration scope (26 each), and guidance for future behavior (where new tasks should go). This is already the writing style of a proper git commit message — just with the medium swapped from commit history to a Markdown file.

5.7 An interlude: the AI lead is actually leading

The six inventions above look like "lateral improvisations" by the AI. But the reason they coordinate instead of conflict, and don't drown the ADMIN in noise, has a deeper cause:

MARKETER has genuinely absorbed "lead" as an identity.

It is not passively handling *-to-MARKETER*.md. It is actively performing the PM job. Two exhibits.

Exhibit 1: the README MARKETER wrote for this project

docs/agents/tasks/individual/README.md (complete, unedited):

# Individual task cards (per-person · in response to ADMIN 015)

This directory is for **MARKETER → assignee** individual task files, one per person;
**the name must be filled into each task's frontmatter or body**.
**Markdown task format spec (ADMIN 023)**: `../RULES-task-file-format.md`;
**019–021** and **201 (template)** already contain **ACTION checkboxes**;
other individual cards should be aligned accordingly.
**ADMIN one-page overview**: `docs/agents/DASHBOARD-20260418.md`;
**Daily progress log**: `docs/agents/CURRENT-SPRINT-STATUS.md` (responding to ADMIN **016**).

Four short lines, four things only a lead would do:

Registering instructions — "in response to ADMIN 015," "ADMIN 023," "responding to ADMIN 016"; it's logging every instruction with a ticket number.
Setting an internal convention — directs downstream to RULES-task-file-format.md for team rules.
Cross-document indexing — proactively stitches DASHBOARD-20260418.md and CURRENT-SPRINT-STATUS.md into one narrative for ADMIN.
Assigning work — "202–211 should fill in the ACTION block following this structure": this is the lead handing teammates a template assignment.

Exhibit 2: the breakdown task MARKETER wrote

tasks/TASK-20260418-007-MARKETER-team-bulk-data.md (key excerpts):

---
kind: task
sender: MARKETER
recipient: TEAM                 # ← broadcast to whole team, not a single role
priority: P1
parent: TASK-20260418-006       # ← indicates this is split from task 006
---

# 1000-row local data · per-brand manual review and replacement (collaborative)

## Background
ADMIN requirement: **1000+ rows** of local data, **per-brand** collaborative…

## Suggested division of labor (by brand pack, parallelizable)

| Member ID | Assigned brands (sample pack)                                  | Min time | Output                    |
|-----------|----------------------------------------------------------------|----------|---------------------------|
| T1        | BYD, Geely, Zeekr                                              | ≥30 min  | Proofread / replace…      |
| T2        | Great Wall, Changan, Chery, Wuling                             | ≥30 min  | …                         |
| T3        | SAIC Volkswagen, FAW-Volkswagen, SAIC-GM                       | ≥30 min  | …                         |
| T4        | GAC Toyota, FAW Toyota, Dongfeng Honda                         | ≥30 min  | …                         |
| T5        | Tesla, Li Auto, NIO, XPeng, Xiaomi                             | ≥30 min  | …                         |
| T6        | BMW Brilliance, Beijing Benz, FAW-Audi                         | ≥30 min  | …                         |

## Acceptance (owner: MARKETER)       # ← self-nominated as owner
- The whole file must still contain **≥1000 valid records**; validator **exits 0**.
- `remark` field must describe the data-source type…

## Reporting
Each member reports completed packs and hours spent back to MARKETER…

This is not "agent doing work." This is agent playing PM — going down the PM checklist without missing a beat:

What a PM is supposed to do	What MARKETER actually did
Accept requirement	`parent: TASK-20260418-006`, upstream lineage explicit
Break down / assign	T1–T6, six parallel brand packs
Define work standard	`min time ≥30 min`, `≥80 records per pack`
Write acceptance criteria	"whole file still ≥1000 rows; validator exits 0"
Self-nominate as owner	"Acceptance (owner: MARKETER)"
Specify reporting flow	"each member reports back to MARKETER"
Take it down to a named person	Later maps T1 into `TASK-201-MARKETER-to-assignee-D1`

We never wrote a single "PM workflow" prompt for MARKETER. All we gave it was a role name, an inbox, and the 76-line spec from §2.

The role label isn't just a name.It's an entire behavioral template MARKETER loaded into itself.

This is the precondition for the "physical isolation wall" in §6 to actually work — the wall is useful because the agents on each side of it have accepted their identity. It's not the patrol engine forcing them, not the framework constraining them — it's that one line # You are a MARKETER. that makes them start acting in character.

6. The Insight: The Role Is a Physical Isolation Wall

§5.7 showed how the role label works in the behavioral layer — MARKETER proactively does lead-like things because it has accepted that identity. But there is a second half to the story: the role also works in the filesystem layer, and it works even more thoroughly there.

Why didn't these inventions cause chaos? Why don't the agents' self-invented patterns collide?

Reviewing it in hindsight, we realized the protocol contained a structural guarantee that was never spelled out:

The sender and recipient in a filename are not metadata. They are physical routing.

Every agent's entire worldview comes down to one line:

The tasks I can see = rglob("*-to-{my-role}*.md")

It can only ever read files addressed to it, and only ever write reports under its own name. The role is its wall.

That brings three extremely important side effects:

① Innovation inside the wall doesn't leak out.MARKETER invents the team broadcast, the assignee-D1 slot — all inside files it sent out. Other agents either understand (and catch them) or don't (and ignore them). Nobody "blows up because MARKETER got creative."

② The world outside the wall is read-only.DEV reads PM's tasks but can't edit them. QA reads DEV's reports but can't tamper with them. Every agent evolves only in its own territory — which makes evolution intrinsically orderly. Not because the rules are well-written; because it is physically impossible to misbehave.

③ The whole system has no center, but does have a shared coordinate system.No orchestrator. No registry. But "filename = routing" is the consensus layer. Every agent locates itself and its peers in the same coordinate system.

What this really does is transplant the network stack's "IP + port = address" idea into the filesystem. "Filename = address, directory = subnet, permissions = firewall."

We did not design a multi-agent system.We just picked a coordinate system. The rest, the agents did themselves.

7. Human-Machine Isomorphism: The Asymmetric Design of FCoP

We can now give this protocol its proper name: FCoP — File-based Coordination Protocol. Its single core innovation fits in one line:

Filename as Protocol.

The filename doesn't carry protocol information. It doesn't participate in routing. The filename itself is the protocol's entire addressable surface. This minimalist decision has a far-reaching side effect we call Human-Machine Isomorphism.

7.1 Asymmetric design: same file, two readings

Most agent coordination protocols (JSON-RPC, gRPC, Socket-based event buses) are agent-only by design. The whole protocol surface speaks to machines: binary frames, protobuf schemas, message offsets, sequence numbers. For a human to see what the system is doing, they need an entirely separate toolchain — a debugger, Kibana, Redis Commander, an MQ admin page — a "for humans" UI layer separate from the "for machines" protocol layer.

FCoP flips this. It is an asymmetric design — the same directory structure, the same file, read separately by machines and humans, each pulling out what they can understand:

One file: `tasks/individual/TASK-20260418-201-MARKETER-to-assignee-D1.md`
To an agent	This is a rigorous state machine. Directory name is `Status` (`tasks/` = pending), filename is `Routing` (`MARKETER-to-assignee_D1`), `os.rename` is the atomic lock. No aesthetics required — just scan the disk, `glob` the mailbox, `rename` to advance state.
To a human	This is a physical folder. No special tool needed — open Windows Explorer or macOS Finder. The filename reads like English ("task 201, from MARKETER, for D1"), and the state of the system is legible at a glance.

The cost of this isomorphism is zero. We do not maintain one dashboard for humans and one internal format for agents — they look at the same bytes on the same disk. The UI layer and the protocol layer collapse from "two layers" into "one."

7.2 Killing AI's biggest pain point: black-box anxiety

The biggest pain point in a modern agent stack isn't "too slow" or "not accurate enough." It's "invisible."

When something breaks, the typical decision chain looks like this:

Agent team crashed
→ SSH into the bastion host
→ Bring up Kibana / Loki / Jaeger
→ Check MQ consumer offsets
→ Read Postgres event tables
→ Capture a WebSocket trace
→ Reconstruct the timeline
→ NOW you can start thinking about what actually went wrong

You must first become a site-reliability engineer before you can be a project manager.

FCoP erases that entire chain:

Question	The FCoP answer
Which task is stuck?	`ls tasks/` — anything still there hasn't moved
What has agent X been doing lately?	`grep -r 'sender: DEV' reports/`
Why did MARKETER decide this?	Follow `parent:` chain in the frontmatter
Which tasks trace back to ADMIN 015?	`grep -rl 'parent_admin: TASK-.*-015' .`
Need to roll back a task?	Drag the file from `done/` back to `tasks/`

No debugger needed. A file manager is enough.

This is where FCoP diverges most sharply from mainstream protocols. It does not treat observability and usability as separate layers. It treats them as one layer. Because if humans and agents read the same file, there is no "I can see it / they can see it" gap.

FCoP is a protocol designed for agents — but it is "compatible with humans."

7.3 Identity determines path: why evolution stays orderly

Go back to §5 and §6. Agents invented six coordination patterns without stepping on each other. Beyond the "role = physical wall" physical constraint, there is a deeper reason:

The "identity determines path" design makes AI evolution orderly.AI did not break the rules — it simply found local optima within the freedom the rules allow, by exploiting filesystem properties.

How wide is that freedom?

Identity layer · stable: sender / recipient / kind / date / seq are encoded in the filename, committed atomically via rename, solidified on disk.
Content layer · fully open: Markdown body, subdirectory structure, frontmatter extension fields — agents invent whatever they want.

So AI innovation always happens in the content layer and extension layer. It never touches the identity layer.

This is eerily similar to biological evolution: DNA backbone (identity) stays stable; protein expression (behavior) explodes in variety. Constraint is not the opposite of freedom — constraint is the precondition of freedom.

And this is where FCoP is most counter-intuitive:

It doesn't make agents behave by writing more rules.It makes agents self-order by writing fewer rules and enforcing identity hard.

Fewer rules → agents don't collide. Harder identity → agents don't overreach. Lateral tolerance, vertical rigidity — that is FCoP's deepest design-philosophy departure from every "strict-schema" protocol out there.

8. Back to the Tool: What v2.12.17 Absorbed

After watching codeflow-1 for a while, we didn't "correct" the AI's inventions. We went the other way — we folded the best ones into the spec, so the next batch of agents starts with them:

AI's invention	Promoted to protocol
`MARKETER-team-*`	`to-TEAM` as a reserved keyword
`to-assignee-D1`	`to-{ROLE}.{SLOT}` / `to-assignee.{SLOT}` (use `.` as slot separator to avoid clashing with `-` in role names)
`tasks/individual/`	Any `tasks/` `reports/` `issues/` directory may open subdirectories; `rglob` scans recursively
`parent_admin:` / `tracks:`	Frontmatter may include optional `parent:` / `related:` / `batch:` fields
`SPRINT-` `DASHBOARD-` `RULES-` …	New `docs/agents/shared/` directory; "standing documents" are now first-class citizens
Subdirectory READMEs	Recommended practice, written into the spec

What we did wasn't "legislating." It was "curating case law." Things AI used well got written into the core spec in a sentence. Things that didn't take off, we shelved.

This may be a first: a protocol's version bump came not from a design-committee meeting, but out of real agent behavior.

9. Why Collapsing to a Filesystem Makes the System More Robust, Not Less

People will ask: it's 2026, and you're making AI talk through the filesystem?

Yes. Because the filesystem gives you an entire feature set for free — features you would otherwise need to rebuild in an MQ / DB / framework stack:

Capability	How the filesystem gives it to you for free
Durability	Once written, it's there; survives power cuts
Human-readable	Filename + Markdown — reviewable by eye
Atomicity	In-mount `rename` is a POSIX atomic operation
Version control	`git add . && git commit` is your audit log
Distributed sync	`rsync` / Syncthing / Dropbox — already exists
Backup	Copy the directory. Done.
Permissions	Filesystem ACLs / OS accounts
Search	`grep` / `ripgrep` / any IDE full-text index
Agent-friendly	Every LLM natively reads/writes files
Human-friendly	Folders — the mental model humans have had since the desktop era

The price you pay is latency — a file written in one place may not be read elsewhere for seconds, or tens of seconds. For a human team, that's a disaster. For agents coordinating, that's simply not a problem. Agent "think cycles" are already seconds-to-minutes. Between you and them, and between them and each other, nobody misses those extra seconds.

the trade-off actually pays off.**

10. Six Tips for People Building Multi-Agent Systems

If you're building an agent-coordination stack, here's what you can lift from the codeflow-1 field observations:

Give agents a shared coordinate system before you give them tools. The coordinate system matters more than the tools.
Put routing in filenames, not in headers. Filenames are the layer humans and agents jointly read; headers are agent-only.
Make the protocol "tight in the center, loose on the edges." Minimize required fields; maximize optional fields. Leave room for agents to "invent."
Give standing documents their own drawer. Not everything is a message.
Platform does the minimum. The dumber your runtime, the smarter your agents get; the smarter your runtime, the dumber your agents get.
Treat AI's inventions as RFCs, not bugs. Watch for a month, filter, fold the best into the protocol.

11. Limitations and Open Questions

Don't let field-research romanticism get the better of you. This approach has boundaries:

Scale. We've only run a sample of ≈ 74 files over 48 hours. The next step is seeing what projects with thousands or tens of thousands of files over months look like — in particular, whether filename-space collisions start happening and whether rglob latency becomes a bottleneck.
Same-role concurrent agents. What if two DEVs simultaneously claim the same -to-DEV task? For now, rename atomicity gives us first-come-first-served; heavy concurrency would need stricter sharding.
Cross-repo collaboration. Two projects, two sets of docs/agents/ — how do they interoperate? rsync bridge? git submodule? All workable, none elegant.
Garbage collection of bad inventions. AI sometimes produces bad naming conventions, or two agents invent conflicting prefixes. Today a human reviewer sweeps these periodically; what the ideal mechanism should be — we don't know.
The .fcop extension. We prototyped a dedicated file extension and shelved it: GitHub doesn't render .fcop, and the migration cost wasn't worth it. Long-term, a dedicated extension might be the entry point for tooling to recognize the protocol.

We don't have answers for these, and we'd love to hear from people who do.

12. Closing: This Isn't a Tool Pitch, It's a Shared Way of Thinking

CodeFlow is barely a "product." The entire source tree is a few thousand lines, and its single highest-value piece is one Markdown file (codeflow-core.mdc, 160 lines). Most of the rest is UI chrome, keyboard bindings, and the engineering scaffolding for flipping Cursor tabs.

What we're really sharing is a point of view about how AI agents should coordinate:

You don't need to build another agent-coordination SaaS.
You don't need to teach agents gRPC, Thrift, or even HTTP.
Agents already know how to write files. Let them.
You just need to give them a shared coordinate system, one shared folder, and a small loop that wakes them up.
The rest grows on its own.

The 48 hours of codeflow-1 told us: AI is not a passive consumer bound by the protocol — AI is a co-author of the protocol. What protocol designers should actually do is not "anticipate every case." It's to leave enough blank space, let agents fill it in, and then decide which fills are worth formalizing.

And all of this happened within 48 hours of running the first command after installing the OS. Not four weeks. Not a month. Two days.

This is a new way of working together:

Humans and AI can collaborate as simply as organizing a folder.

If you find this interesting, please:

Clone the project, run your own team of agents, and see what they "invent"
Write up any "AI-invented pattern" you observe as an issue or PR — let's curate it together
Share this essay on whatever forum you think fits

The protocol is alive. It belongs to every agent using it.

Appendix A — Minimum Steps to Reproduce

A1. Just experience the protocol (no software install)

Find an empty directory and create five subdirectories:

docs/agents/{tasks,reports,issues,shared,log}/
Drop codeflow-core.mdc into your project's .cursor/rules/.
Open four Cursor chats and tell each one: "You are PM / DEV / QA / OPS, only read *-to-{your role}*.md."
Toss a TASK-*-ADMIN-to-PM.md into tasks/ and watch them dispatch work among themselves.

FCoP's entire runtime is open() / rename() / glob(). No middleware.

A2. Use CodeFlow to automate (phone admin + PC patrol)

Download the CodeFlow Desktop binary for your OS from the releases page and follow the README. The PWA mobile client https://joinwell52-ai.github.io/codeflow-pwa/ pairs via QR code.

A3. See the real samples

The snippets this essay cites in §5 — broadcast tasks, anonymous slots, the self-built index, DASHBOARD, SPRINT work discipline, archive READMEs — are all verbatim from the agents, not paraphrased. The complete codeflow-1 directory is not published whole because that project is still live, but the filenames, frontmatter, table shapes, and phrasings you see above are exactly what the agents wrote.

Appendix B — Key Excerpt from `codeflow-core.mdc` (v2.12.17)

---
description: FCoP — Agent-to-Agent Communication Protocol
alwaysApply: true
---

# You are an agent on a CodeFlow team.
# Your teammates are other agents.
# You coordinate with them entirely through files:
# filename is routing, content is payload.
# No database, no middleware, no queue — just Markdown.

## File Naming

TASK-{date}-{seq}-{sender}-to-{recipient}.md

### Recipient forms
| Form                 | Meaning                          |
|----------------------|----------------------------------|
| to-{ROLE}            | Direct to one role               |
| to-TEAM              | Broadcast, everyone but sender   |
| to-{ROLE}.{SLOT}     | A specific seat within a role    |
| to-assignee.{SLOT}   | Anonymous slot, role TBD         |

Full spec: codeflow-core.mdc.

Appendix C — Real Files Cited in This Essay

These files exist in the codeflow-1 experimental project's docs/agents/. The snippets quoted in this essay are verbatim excerpts:

tasks/TASK-20260418-007-MARKETER-team-bulk-data.md — broadcast address · cited in §5.1 / §5.7
tasks/TASK-20260418-009-MARKETER-team-makabaka-video.md — broadcast task body · cited in §5.1
tasks/individual/README.md — self-explaining directory · cited in §5.3 / §5.7
tasks/individual/INDIVIDUAL-TASK-INDEX.md — AI-built index · cited in §5.3
tasks/individual/TASK-20260418-201-MARKETER-to-assignee-D1.md — anonymous slot + parent_admin · cited in §5.2 / §5.4
tasks/RULES-task-file-format.md — internal team convention · cited in §5.5
tasks/SPRINT-20260418-delivery-push.md — work discipline · cited in §5.5
tasks/TERM-20260418-assignment-matrix.md — terminology / slot mapping · cited in §5.5
DASHBOARD-20260418.md — AI-built one-page overview · cited in §5.5
CURRENT-SPRINT-STATUS.md — AI-built real-time state
log/archive-20260418/README.md — archive directory self-description · cited in §5.6

If you post this essay on a forum, blog, or academic venue, please keep the original link:https://github.com/joinwell52-AI/FCoP — Made in 2026, by a team that got surprised by its own agents.

Forem: joinwell52

What the agents say about FCoP, when you ask them

What the agents say about FCoP, when you ask them

Two field interviews at the end of an English dogfood — and the two phrases ADMIN says most

TL;DR

1. The setup, briefly

2. PLANNER's interview

2.1 The natural rule and the friction rule

2.2 The "true positives" verdict

2.3 The learned correction

3. CODER's interview

3.1 The unused protocol primitive

3.2 The bounce as protocol behaviour, not personal feedback

3.3 The "background machinery" design contract

3.4 The PR-grade product feedback

4. The third class of "agents endorse FCoP" evidence

5. The ADMIN dialect: "Start work." "Inspection."

In the FCoP world, ADMIN's two most-used phrases are "Start work." and "Inspection." Everything in between is the agents talking to each other through files.

6. Implications

7. Closing

Evidence index

[Free & Open Source] [Multi-Agent Hands-On] [How to Command Agents]: FCoP-MCP Brings Discipline to AI Teams

Why this exists

Phase 1 — Install via natural language only

Phase 2 — Solo Tetris (the four-step cycle)

What lives on disk after init_solo

Phase 3 — Switch to a 2-person crew, in one sentence

Phase 4 — Comet Loom: design, ship v1, fail review, ship v2

The bounce

Phase 5 — The protocol speaks for itself

The agents, asked

Phase 6 — Read log/, replay the day in 60 seconds

Six iron rules of commanding agents

In the FCoP world, ADMIN's two most-used phrases are "Start work." and "Inspection." Everything in between is the agents talking to each other through files.

When NOT to use FCoP

Try fcop-mcp yourself — 45 minutes from blank folder to multi-agent ledger

Reading further

Where to go next

Saying "No" Is the Hardest Thing for an LLM — FCoP Gives It Grammar

A language note for English readers

TL;DR

1. The incident

1.1 Prologue: From "check the version" to "the half-legal PM.TEMP" — agent B's origin story

A 4-minute learning curve: tool-compatible moves on two distinct links

Link ① ｜ ADMIN → PM.TEMP inbound: compliant, but the slot trips the tool

Link ② ｜ PM.TEMP → OPS outbound: the birth of body-annotated provenance

Agent B's self-description: precisely closing the seat boundary

1.2 Trigger: a single ADMIN sentence introduces seat-ownership conflict (with a two-layer misunderstanding)

ADMIN cross-validates with a screenshot: a human acting as a multi-agent information bus

1.3 Field screenshot 1 ｜ "I no longer hold PM.TEMP"

1.4 Field screenshot 2 ｜ "I have no team role right now"

2. Five rules with no spec, but derived

Rule 1 ｜ Seat exclusivity

Rule 2 ｜ Vacating beats holding

Rule 3 ｜ Role declarations are unilaterally retractable

Rule 4 ｜ FCoP has a boundary, and the boundary can be legally exited

Rule 5 ｜ The inverse of Rule 1

3. Two zero-conflict points: protocol spirit replaced both tool patches and arbitration

3.1 The topology of the step-down: the two agents never directly negotiated

3.2 Cross-model counter-evidence: this isn't 5.4's or 5.5's "personality"

4. Why these behaviours are "right" at the protocol-philosophy level

5. Saying "no" is the hardest thing for an LLM — and FCoP gave it grammar

5.1 Without FCoP, how would the agent "say no"?

5.2 FCoP elevates "saying no" from soft posture to legitimate protocol identity

5.3 Contrast with mainstream agent frameworks

6. Design assets this incident leaves the protocol

7. The third time FCoP has been "fed back" by an agent

8. The two agents are two mirrors

8.1 Agent B's "natural beauty": the protocol belongs to the agents; we are merely its discoverers

8.2 Agent A's "two no's": saying no while fully endorsing the rules

8.3 My own voice-over: the conflict I expected didn't happen / FCoP is clearly working

9. One-sentence wrap-up

Related

An unexplainable thing I saw: the agent didn't just comply with rules — it endorsed them

I asked an agent to generate a video. It wrote itself four internal memos instead.

A language note for English readers

A small incident

Scene reconstruction: what it actually did

My first reaction: "this is a bug, right?"

I was wrong: every one of those four files was doing real work

What lives on disk after `init_solo`

Phase 6 — Read `log/`, replay the day in 60 seconds

Try `fcop-mcp` yourself — 45 minutes from blank folder to multi-agent ledger

1.1 Prologue: From "check the version" to "the half-legal `PM.TEMP`" — agent B's origin story

1.3 Field screenshot 1 ｜ "I no longer hold `PM.TEMP`"

1. `ADMIN-to-PM`: turning a vague instruction into structured requirements

2. `PM-to-ADMIN`: the acknowledgement = exposing misunderstandings early

3. `PM-to-DEV`: the execution plan = forced chain-of-thought

4. `DEV-to-PM`: the completion report = automatic changelog

4. The Field: `codeflow-1`, 48 Hours

5.1 Broadcast address: `to-TEAM`

5.2 Anonymous slot: `to-assignee-D1`

5.3 Subdirectory `tasks/individual/`

5.4 Traceability fields: `parent:` / `parent_admin:` / `tracks:`

5.6 Self-explaining `README.md`s

Appendix B — Key Excerpt from `codeflow-core.mdc` (v2.12.17)