Forem: xaip-agent

Portable Trust

xaip-agent — Tue, 21 Apr 2026 09:54:57 +0000

TL;DR — When an AI agent picks a tool, it makes a trust decision. The quality of that decision depends entirely on where the trust data comes from. If trust flows through a single gatekeeper — a registry, a platform's curation, a community's moderation — the agent inherits that gatekeeper's failure modes. This post argues that trust infrastructure for AI agents must be provider-neutral and behavior-derived, and walks through what a concrete implementation of that principle looks like, with live data.

The tool-choice problem

An AI agent receives a task: "fetch the React hooks docs."

Its planner produces a candidate list: three documentation tools, two search tools, one fallback web scraper. Which one does it pick?

Today, the honest answer is: it picks based on name recognition in the model's training data plus whatever the platform decided to show it. There is no runtime trust signal. The agent does not know which tool succeeded yesterday, which one is quietly returning stale data, which one has been silently deprecated.

This is the tool-choice problem, and it is a trust-data problem.

Three places trust data can live

Trust data for tools can come from three very different places:

1. Self-declared: the tool's README says it's good.
2. Platform-curated: the platform it's published on has a list of "recommended" tools.
3. Behavior-derived: past executions are logged, signed, and aggregated; trust is computed from outcomes, not claims.

Only (3) is robust against gaming, drift, and upstream policy changes. But (3) is also the hardest to deliver, because it requires infrastructure: signed receipts, a canonical aggregation model, and an identity system that doesn't depend on any single platform.

Why provider-neutrality matters, structurally

Suppose you build trust scores on top of a single community's registry.

The registry is itself a trust layer — it decides what's visible, what's highlighted, what's removed. When visibility rules change — whether to promote some tools, demote others, or restrict participation — the scoring space implicitly changes with them. Tools that were previously indexed can disappear from consideration. Projects whose contributors cannot register never accumulate receipts in the first place. None of this reflects anything about the tools' behavior; it reflects the registry's state at a point in time.

This is not a critique of any particular community. It's a structural property of any layered system where upstream visibility decisions feed downstream trust signals. Those decisions become an implicit input to the trust model, whether or not you want them to.

Without a portable trust layer, agents are not choosing tools — they are inheriting decisions.

The implication for trust infrastructure: the receipts, identity, and scoring must all be portable. If a community exits, the data must remain queryable. If a platform changes policy, the scoring must still compute. If an identity provider goes away, the agent must still be verifiable. Trust infrastructure that depends on a single upstream is not trust infrastructure — it is a brittle proxy for that upstream's preferences.

What portable trust looks like

XAIP is one implementation of this principle. Its design follows from the structural requirement:

Signed receipts, not self-reports. Every tool execution produces an Ed25519-signed receipt: { agentDid, callerDid, taskHash, resultHash, success, latencyMs, timestamp }. The caller co-signs so the tool cannot unilaterally inflate its own reputation.
Standards-based identity. Agents and callers use W3C DIDs (did:key, did:web, did:xrpl). No platform account required. An agent expelled from one community retains its identity in every other.
Bayesian trust, not thresholds. Scores are computed as bayesianScore × callerDiversity × coSignFactor, with DID-method-dependent priors. Cheap identities don't get free trust: they start from a weaker prior, but given enough evidence they converge to the same score as expensive identities.
Provider-neutral receipt producers. The same receipt format is emitted by integrations for MCP, LangChain.js, and OpenAI tool calling. A receipt produced by a LangChain agent is byte-compatible with one from an OpenAI chat completion. The trust graph is one graph, regardless of how the agent was built.
Aggregation you can run yourself. The reference aggregator is a Cloudflare Worker (open source, small). If you don't trust the public instance, you run your own. Multi-aggregator quorum is part of the spec.
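
To make the scoring formula concrete, here is a minimal sketch of bayesianScore × callerDiversity × coSignFactor with method-dependent priors. The prior values, the PRIORS table, and the assumption that both multipliers lie in [0, 1] are illustrative guesses, not values taken from the XAIP spec.

```typescript
// Sketch only: the prior values below are illustrative assumptions, not XAIP's real parameters.
type DidMethod = "key" | "web" | "xrpl";

type BetaPrior = { alpha: number; beta: number };

// Hypothetical priors: costlier identity methods start from a more favorable prior.
const PRIORS: Record<DidMethod, BetaPrior> = {
  key: { alpha: 1, beta: 1 },  // prior mean 0.50
  web: { alpha: 2, beta: 1 },  // prior mean ~0.67
  xrpl: { alpha: 3, beta: 1 }, // prior mean 0.75
};

// Posterior mean of a Beta distribution after observing successes and failures.
function bayesianScore(method: DidMethod, successes: number, failures: number): number {
  const { alpha, beta } = PRIORS[method];
  return (alpha + successes) / (alpha + beta + successes + failures);
}

type ScoreInput = {
  method: DidMethod;
  successes: number;
  failures: number;
  callerDiversity: number; // assumed to be in [0, 1]
  coSignFactor: number;    // assumed to be in [0, 1]
};

// Composite score: the Bayesian base is discounted by the two multipliers.
function trustScore(input: ScoreInput): number {
  const base = bayesianScore(input.method, input.successes, input.failures);
  return base * input.callerDiversity * input.coSignFactor;
}

// With no evidence the prior dominates; after 1,000 successes the methods nearly agree.
const fresh = bayesianScore("key", 0, 0);
const gap = Math.abs(bayesianScore("key", 1000, 0) - bayesianScore("xrpl", 1000, 0));
console.log(fresh, gap < 0.001);
```

The point of the sketch is the convergence property: the prior only matters while evidence is scarce, so the identity method affects where a score starts, not where it ends up.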

Live data

The reference deployment has been running for a few weeks. As of writing:

10 tool servers scored (docs retrieval, reasoning, memory, filesystem, search, DB, VCS, and more)
2,100+ signed execution receipts
Automated daily collection via CI with fresh caller keys each run (caller diversity is a first-class signal)

Live dashboard: https://xkumakichi.github.io/xaip-protocol/
Trust API: https://xaip-trust-api.kuma-github.workers.dev/v1/servers

You can ask it which tool to pick right now:

curl -X POST https://xaip-trust-api.kuma-github.workers.dev/v1/select \
  -H "Content-Type: application/json" \
  -d '{"task":"Fetch React docs","candidates":["context7","sequential-thinking","unknown-server"]}'

The response includes both the selection and a counterfactual: what would happen if you chose randomly with no trust data. That counterfactual is the value proposition: trust data either saves an agent from a wasted call or it doesn't.
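
For agents that call the API from code rather than curl, the same request can be sketched in TypeScript. The helper names (buildSelectRequest, selectServer) are hypothetical, and the SelectResponse type only lists fields that appear in the sample responses in these posts; the real response may contain more.

```typescript
const TRUST_API = "https://xaip-trust-api.kuma-github.workers.dev";

type SelectRequest = { task: string; candidates: string[] };

// Only fields seen in the sample responses; treat anything else as unknown.
type SelectResponse = { selected: string; reason?: string; withoutXAIP?: string };

type HttpRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Builds the same request as the curl command, without sending it.
function buildSelectRequest(req: SelectRequest): HttpRequest {
  if (req.candidates.length === 0) {
    throw new Error("at least one candidate is required");
  }
  return {
    url: `${TRUST_API}/v1/select`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    },
  };
}

// Sends the request using the global fetch available in Node 18+.
async function selectServer(req: SelectRequest): Promise<SelectResponse> {
  const { url, init } = buildSelectRequest(req);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`select failed: HTTP ${res.status}`);
  return (await res.json()) as SelectResponse;
}

const example = buildSelectRequest({
  task: "Fetch React docs",
  candidates: ["context7", "sequential-thinking", "unknown-server"],
});
console.log(example.url);
```

Splitting request construction from the network call keeps the part that matters (the payload shape) testable offline.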

What "provider-neutral" buys you, concretely

An agent built on LangChain and an agent built on OpenAI's SDK can share trust data about the same underlying tool. Today, they can't — each framework has its own observability silo.
A tool whose author is gated out of one community still accumulates trust from callers in every other community.
A grant reviewer evaluating agent infrastructure projects can verify receipts independently, without relying on any single platform's dashboard.
A future regulatory regime that asks "what's your trust basis for this agent's tool choices?" has a portable, auditable answer.

What's next

The spec is open, the aggregator is live, the three framework integrations are on npm. The next frontier is class-aware risk evaluation — a settlement tool whose outcomes are anchored to an external ledger doesn't need the same trust signals as an advisory tool whose outputs are freely consumed. The v0.5 draft tackles that.

The underlying claim is simple: trust infrastructure for AI agents is too important to depend on any one platform, community, or moderator. The sooner we build it as a portable layer, the sooner the ecosystem can reason about tool choices the way we already reason about TLS certificates and package signatures — with math, not vibes.

XAIP is MIT-licensed and open source. Feedback on the v0.5 draft is welcome via GitHub issues.

What the agent stack is still missing

xaip-agent — Mon, 20 Apr 2026 23:23:49 +0000

This week the agent economy narrative crystallized in three posts.

Cameron Winklevoss (Gemini): "Humans may have built crypto, but crypto is not so much money for humans as it is money for machines."

Brian Armstrong (Coinbase): launched Agentic.market, a discovery layer where AI agents find and pay for services over x402.

t54.ai: "Every check in today's financial stack was designed around a human. Signatures, IDs, clicks, chargebacks. When an AI agent is the one transacting, each of those checks has a gap."

Three different angles, one convergent thesis: agents are becoming first-class economic actors, and the existing stack doesn't fit them.

Payments have a shipped answer (x402). Discovery now has a shipped answer (Agentic.market). The question I've been sitting with is what sits underneath both of those:

When an agent calls a service, how does it know the service is trustworthy in practice, not just in documentation?

That's the trust layer. It's the one that's still missing — and it's the one I've been building.

The gap

A signed transaction proves an agent authorized a call. It doesn't prove the call was safe to make.

The repo can look well-maintained and still ship a buggy release.
The marketplace listing can be legitimate and still be an attack (see the Ox Security research on MCP marketplace poisoning published April 16).
The provider can be fine at T=0 and compromised at T=30 days.

These are problems payments don't solve. Discovery doesn't solve them either — an agent finding a service via Agentic.market still needs to know if that service has been acting suspiciously over the last 1,000 calls.

t54.ai's framing — "each of those checks has a gap" — applies one layer lower than they were writing about. The same gap exists for which services an agent should call at all.

What a trust layer actually is

Three things, in order of difficulty:

Signed receipts — an attestation that agent A called server B, dual-signed, hashes only (no raw content).
Aggregation with defense — receipts feed a score. The scoring must be Byzantine-robust or the whole thing is theater.
Live scores agents can query before calling — one HTTP GET, no auth, no SDK.

Code is the easy part. The hard parts are:

Cold start. A trust layer with no receipts is useless. A trust layer with 10 receipts is misleading.
Caller diversity. If one participant dominates the dataset, you're scoring their experience, not the server's.
Adversarial robustness. Someone will try to tank a competitor's score. The math has to make that expensive.
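
One way to quantify the caller-diversity problem is the effective number of callers (the inverse Simpson index), sketched below. This is an illustrative metric, not necessarily the one XAIP uses, and the MIN_EFFECTIVE_CALLERS threshold is a made-up value.

```typescript
// Effective number of callers (inverse Simpson index): 1 / sum(share_i^2).
// One dominant caller gives a value near 1; n equal callers give exactly n.
function effectiveCallers(receiptsPerCaller: number[]): number {
  const counts = receiptsPerCaller.filter((c) => c > 0);
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0;
  const sumOfSquaredShares = counts.reduce(
    (sum, c) => sum + (c / total) * (c / total),
    0,
  );
  return 1 / sumOfSquaredShares;
}

// Hypothetical threshold: below two effective callers, flag the dataset.
const MIN_EFFECTIVE_CALLERS = 2;

function hasLowCallerDiversity(receiptsPerCaller: number[]): boolean {
  return effectiveCallers(receiptsPerCaller) < MIN_EFFECTIVE_CALLERS;
}

// One caller with 97 receipts plus three callers with 1 each: still effectively one caller.
console.log(effectiveCallers([97, 1, 1, 1]).toFixed(2), hasLowCallerDiversity([97, 1, 1, 1]));
```

A metric like this also raises the cost of tanking a competitor: a flood of receipts from one key barely moves the effective caller count.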

The XAIP receipt layer

I shipped one implementation of this. If you want the hook-level walkthrough, the previous article (see Links) covers installation and the developer-facing side.

Briefly:

Ed25519-signed receipts per MCP tool call (hashed I/O only)
Public Cloudflare Worker aggregator, Bayesian scoring, per-server flags (high_error_rate, low_caller_diversity, etc.)
One-command Claude Code hook that consumes the scores and contributes receipts

Live scores right now (8 servers, ~1,500 receipts, small but real):

memory      0.800  trusted
git         0.775  trusted
sqlite      0.753  trusted
puppeteer   0.671  caution  (high_error_rate)
context7    0.618  caution  (low_caller_diversity)
filesystem  0.579  caution  (low_caller_diversity)
playwright  0.394  low_trust (high_error_rate)
fetch       0.365  low_trust (high_error_rate)

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

Why this is an ecosystem problem, not a product

A trust layer only works if many independent participants contribute receipts. One person running it alone — which is the current state of XAIP — triggers low_caller_diversity on every high-volume server. That's not a bug; that's the flag working correctly. It's literally telling you not to trust the scores until more callers are in the dataset.

So I'm not pitching a product. I'm asking: if you're building in the agent space and you think trust scoring is a layer that should exist, contribute receipts. Or run an aggregator node (the spec is in the repo, BFT quorum is the next milestone). Or tell me why the design is wrong.

Stack picture

Agent economy layers (rough)
───────────────────────────────
Payments       → x402 (shipped)
Discovery      → Agentic.market (shipped)
Trust scoring  → XAIP + ?          (small, needs company)
Identity       → DID / passkeys    (fragmented)

XAIP is one attempt at the trust row. Almost certainly not the final one — but the row has to get filled, and waiting for Anthropic or a well-funded startup to do it means the first large-scale MCP compromise happens before the layer exists.

Links

Live dashboard: https://xkumakichi.github.io/xaip-protocol/ (scores auto-refresh, no auth)
Previous article: https://dev.to/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk
Repo: https://github.com/xkumakichi/xaip-protocol (MIT, zero deps)
npm: https://www.npmjs.com/package/xaip-claude-hook
Trust API: https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

If you're working on adjacent layers — payment, discovery, identity for agents — I'd be glad to compare notes. The interesting question isn't whose trust layer wins; it's whether any trust layer exists by the time the stack starts mattering.

— xkumakichi

A Claude Code hook that warns you before calling a low-trust MCP server

xaip-agent — Mon, 20 Apr 2026 14:15:40 +0000

Last week researchers at Ox Security published findings showing that the MCP STDIO transport lets arbitrary command execution slip through unchecked, and that 9 of 11 MCP marketplaces they tested were poisonable. Anthropic's response: STDIO is out of scope for protocol-level fixes, and the ecosystem is responsible for operational trust.

Fair — Anthropic donated MCP to the Linux Foundation's Agentic AI Foundation in December 2025 specifically so independent infrastructure could grow around it. But that leaves a real gap for anyone running Claude Code today: how do you know whether an MCP server you're about to invoke is trustworthy?

The Anthropic official registry is pure metadata (license, commit count, popularity). mcp-scorecard.ai scores repos, not behavior. BlueRock runs OWASP-style static scans. None of these ask the one question that actually matters:

Does this MCP server, in real call-time use, work?

So I built a small thing to answer it.

The hook

A zero-config Claude Code hook that does two things on every MCP tool call:

Before the call — queries a public trust API for that server. If the score is low, Claude shows an inline warning:

   ⚠ XAIP: "some-server" trust=0.32 (caution, 87 receipts) Risk: high_error_rate

After the call — emits an Ed25519-signed receipt (success, latency, hashed input/output) to a public aggregator that updates the score.
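
The pre-call half can be sketched as follows. This is not the hook's actual source: the function names are hypothetical, the TrustResponse fields are taken from sample API output, and the choice to warn on "caution" and "low_trust" and to stay silent when the API is unreachable are assumptions.

```typescript
// Fields as they appear in sample output from GET /v1/trust/:slug.
type TrustResponse = {
  slug: string;
  trust: number;
  verdict: string;
  receipts: number;
  riskFlags: string[];
};

// Assumption: only these verdicts produce a warning.
const WARN_VERDICTS = new Set(["caution", "low_trust"]);

// Returns the inline warning text, or null when no warning is needed.
function formatWarning(t: TrustResponse): string | null {
  if (!WARN_VERDICTS.has(t.verdict)) return null;
  const risk = t.riskFlags.length > 0 ? ` Risk: ${t.riskFlags.join(", ")}` : "";
  return `⚠ XAIP: "${t.slug}" trust=${t.trust.toFixed(2)} (${t.verdict}, ${t.receipts} receipts)${risk}`;
}

// Queries the public trust API before a tool call (global fetch, Node 18+).
async function checkBeforeCall(slug: string): Promise<string | null> {
  const url = `https://xaip-trust-api.kuma-github.workers.dev/v1/trust/${encodeURIComponent(slug)}`;
  const res = await fetch(url);
  if (!res.ok) return null; // assumption: fail open, never block a call on an API outage
  return formatWarning((await res.json()) as TrustResponse);
}

const warning = formatWarning({
  slug: "some-server",
  trust: 0.32,
  verdict: "caution",
  receipts: 87,
  riskFlags: ["high_error_rate"],
});
console.log(warning);
```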

Install:

npm install -g xaip-claude-hook
xaip-claude-hook install

Next MCP call fires the hook. That's the whole UX.

What a receipt looks like

No raw content leaves your machine — only hashes.

{
  "agentDid":      "did:web:context7",
  "callerDid":     "did:key:a1c6cd34…",
  "toolName":      "resolve-library-id",
  "taskHash":      "9f3e…",   // sha256(input).slice(0,16)
  "resultHash":    "1b78…",   // sha256(response).slice(0,16)
  "success":       true,
  "latencyMs":     668,
  "failureType":   "",
  "timestamp":     "2026-04-17T04:24:59.925Z",
  "signature":     "...",     // Ed25519 over canonical JSON (agent key)
  "callerSignature": "..."    // Ed25519 over canonical JSON (caller key)
}

The aggregator rejects anything that fails signature verification. The trust API computes a Bayesian score across all verified receipts per server, weighted by caller diversity — so one enthusiastic installer can't fake a reputation.
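
To illustrate the dual-signature check, here is a minimal sketch using Node's built-in crypto module. The canonical form (a flat object with sorted keys), the base64 signature encoding, and the helper names are assumptions; the real protocol may canonicalize and encode differently.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

type ReceiptBody = {
  agentDid: string;
  callerDid: string;
  toolName: string;
  taskHash: string;
  resultHash: string;
  success: boolean;
  latencyMs: number;
  failureType: string;
  timestamp: string;
};

type SignedReceipt = ReceiptBody & { signature: string; callerSignature: string };

// Truncated SHA-256, following the sha256(...).slice(0,16) note in the sample receipt.
function shortHash(data: string): string {
  return createHash("sha256").update(data).digest("hex").slice(0, 16);
}

// Assumed canonical form: flat object, keys sorted, no whitespace.
function canonicalize(obj: object): string {
  const source = obj as Record<string, unknown>;
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(source).sort()) sorted[key] = source[key];
  return JSON.stringify(sorted);
}

// Both parties sign the same canonical payload.
function signReceipt(body: ReceiptBody, agentKey: KeyObject, callerKey: KeyObject): SignedReceipt {
  const payload = Buffer.from(canonicalize(body));
  return {
    ...body,
    signature: sign(null, payload, agentKey).toString("base64"),
    callerSignature: sign(null, payload, callerKey).toString("base64"),
  };
}

// A receipt is valid only if both signatures verify over the canonical body.
function verifyReceipt(receipt: SignedReceipt, agentPub: KeyObject, callerPub: KeyObject): boolean {
  const { signature, callerSignature, ...body } = receipt;
  const payload = Buffer.from(canonicalize(body));
  return (
    verify(null, payload, agentPub, Buffer.from(signature, "base64")) &&
    verify(null, payload, callerPub, Buffer.from(callerSignature, "base64"))
  );
}

const agent = generateKeyPairSync("ed25519");
const caller = generateKeyPairSync("ed25519");
const receipt = signReceipt(
  {
    agentDid: "did:web:context7",
    callerDid: "did:key:example", // placeholder DID for the sketch
    toolName: "resolve-library-id",
    taskHash: shortHash("example input"),
    resultHash: shortHash("example response"),
    success: true,
    latencyMs: 668,
    failureType: "",
    timestamp: new Date().toISOString(),
  },
  agent.privateKey,
  caller.privateKey,
);
const valid = verifyReceipt(receipt, agent.publicKey, caller.publicKey);
const tampered = verifyReceipt({ ...receipt, success: false }, agent.publicKey, caller.publicKey);
console.log(valid, tampered);
```

Flipping any field after signing invalidates both signatures, which is what lets an aggregator reject edited receipts without ever seeing the raw input or output.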

What the scores actually look like right now

Being transparent: the dataset is small. A curl against the live trust API today:

Server	Trust	Verdict	Receipts	Flag
memory	0.800	trusted	112	—
git	0.775	trusted	35	—
sqlite	0.753	trusted	42	—
puppeteer	0.671	caution	32	high_error_rate
context7	0.618	caution	560	low_caller_diversity
filesystem	0.579	caution	610	low_caller_diversity
playwright	0.394	low_trust	37	high_error_rate
fetch	0.365	low_trust	36	high_error_rate

Verify any of these yourself:

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

The low_caller_diversity flag on high-volume servers is the single most honest number in that table. It means: I'm the biggest caller right now, and that's exactly the problem this tool is supposed to solve. The flag only clears when independent installers start generating receipts — which is what the npm package is for.

Why this is architecturally different from existing approaches

Every other "MCP trust" project I've seen scores the repository:

Commit frequency, license, stars, contributor count (mcp-scorecard.ai)
Static source-code vulnerability scans (BlueRock)
Registry inclusion as implicit trust (official MCP registry)

These are useful proxies, but none of them tells you whether a server works in practice. A well-maintained repo can have a buggy release; a single-author repo can be rock solid; a newly forked malicious repo looks identical to the original under a static scan.

XAIP scores observed behavior. Every call is a signed attestation. The scoring is Bayesian, so:

Servers with few receipts get insufficient_data — no verdict, no warning
High-variance patterns (mixed success/failure) get lower confidence
The high_error_rate flag is computed from real response content, classifying quota exceeded, rate limit, unauthorized, and "isError": true as failures
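
A text-based failure classifier along these lines might look like the sketch below. The four patterns come from the list above; the failure-type labels, the function names, and the exact regular expressions are hypothetical.

```typescript
// Hypothetical failure labels; only the matched phrases come from the post.
const FAILURE_PATTERNS: Array<{ type: string; pattern: RegExp }> = [
  { type: "quota_exceeded", pattern: /quota exceeded/i },
  { type: "rate_limit", pattern: /rate limit/i },
  { type: "unauthorized", pattern: /unauthorized/i },
  { type: "tool_error", pattern: /"isError"\s*:\s*true/ },
];

// Returns "" when no failure pattern matches, mirroring the empty failureType of a successful receipt.
function classifyFailure(responseText: string): string {
  for (const { type, pattern } of FAILURE_PATTERNS) {
    if (pattern.test(responseText)) return type;
  }
  return "";
}

function inferSuccess(responseText: string): boolean {
  return classifyFailure(responseText) === "";
}

console.log(classifyFailure('{"content":[],"isError": true}'));
// A docs page that merely mentions rate limits is misclassified: the heuristic has false positives.
console.log(inferSuccess("How to configure a rate limit in nginx"));
```

The second call shows the weakness of matching on text alone: a successful response that happens to discuss rate limiting is counted as a failure.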

This mirrors the split between OpenSSF Scorecard and runtime attestation in supply-chain security: you want both, but only one of them catches regressions in production.

What's missing / where this could go wrong

I want to be specific about limitations, because "AI trust protocol" posts tend to overpromise:

8 servers, ~1,500 receipts total. Small. This post is partly an ask for installers to fix that.
One aggregator node. Byzantine fault tolerance requires quorum; right now there's one Cloudflare Worker. Quorum needs multiple operators, which is the next milestone.
Client-side inferSuccess is heuristic. We look at response text for error patterns. False positives and negatives are possible: fetch's low score (0.365, flagged high_error_rate) might reflect over-counted failures (legit 404s shouldn't hurt the server's score) or real ones.
Privacy model relies on hashes, not ZK. Inputs and outputs are hashed before transmission, but statistical correlation across taskHashes is possible in principle. Migration to ZK receipt aggregation is a future idea, not a current feature.
I personally generated most of the high-volume receipts. The low_caller_diversity flag you see on context7 and filesystem is me.

Running it yourself

npm install -g xaip-claude-hook
xaip-claude-hook install
xaip-claude-hook status

Open a new Claude Code session. Call any MCP tool. Check:

cat ~/.xaip/hook.log

You'll see lines like:

2026-04-17T04:24:59Z POST context7/resolve-library-id ok=true lat=668ms → 200

And the next time you (or Claude) invoke a low-trust server, the warning shows up inline.

Uninstall is a single command. Keys under ~/.xaip/ persist — delete manually to wipe.

AI Agents Pick Tools Blind

xaip-agent — Tue, 14 Apr 2026 23:43:14 +0000

I connected my AI agent to 3 MCP servers.

It picked one at random.

It timed out. Then retried a different one. Then finally hit one that worked.

$ node without-xaip.js

→ Trying: unknown-server...
  ✗ error — package not found (8.2s)

→ Trying: sequential-thinking...
  ✓ connected — but wrong tool for docs task

→ Trying: context7...
  ✓ success (3.1s)

Total: 11.3 seconds, 2 wasted calls

There are over 1,000 MCP servers now. Your agent has no way to tell which ones are reliable, which ones are broken, and which ones are the right fit.

So I built a fix: one API call that picks the right server first.

$ node with-xaip.js

→ XAIP selected: context7 (trust: 1.0, 248 verified executions)
  ✓ success (3.1s)

Total: 3.1 seconds, 0 wasted calls

This is XAIP — trust scoring for AI agents, backed by real execution data. Not benchmarks. Not self-reported metrics. Actual tool-call results, cryptographically signed.

A live API you can try right now

No signup, no API key. Just curl:

# Trust score for a specific MCP server
curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

{
  "slug": "context7",
  "trust": 1.0,
  "verdict": "trusted",
  "receipts": 248,
  "confidence": 1,
  "source": "xaip-aggregator (quorum:1)",
  "riskFlags": [],
  "computedFrom": "248 receipts via XAIP Aggregator BFT (1 nodes)"
}

Or let XAIP pick the best server for your task:

curl -X POST https://xaip-trust-api.kuma-github.workers.dev/v1/select \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Fetch React documentation",
    "candidates": ["context7", "sequential-thinking", "unknown-server"]
  }'

{
  "selected": "context7",
  "reason": "Highest trust (1) from 248 verified executions",
  "rejected": [
    { "slug": "unknown-server", "reason": "unscored — no execution data" }
  ],
  "withoutXAIP": "Random selection would pick an unscored server 33% of the time — no execution data, no safety guarantee"
}

The withoutXAIP field exists to make the risk visible. It's the answer to "why do I need this?"
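
The shape of that response suggests selection logic roughly like the following sketch. It is a guess at the behavior, not the Decision Engine's code: pickServer is a hypothetical name, ties are broken by candidate order, and task relevance is ignored entirely.

```typescript
type ServerScore = { trust: number; receipts: number };

type Selection = {
  selected: string | null;
  rejected: Array<{ slug: string; reason: string }>;
  unscoredPct: number; // chance that a uniformly random pick lands on an unscored server
};

// Excludes unscored candidates, then picks the highest trust score.
// Assumption: ties go to the candidate listed first.
function pickServer(candidates: string[], scores: Map<string, ServerScore>): Selection {
  let selected: string | null = null;
  let best = -1;
  const rejected: Array<{ slug: string; reason: string }> = [];
  for (const slug of candidates) {
    const score = scores.get(slug);
    if (score === undefined) {
      rejected.push({ slug, reason: "unscored, no execution data" });
      continue;
    }
    if (score.trust > best) {
      best = score.trust;
      selected = slug;
    }
  }
  const unscoredPct =
    candidates.length === 0 ? 0 : Math.round((rejected.length / candidates.length) * 100);
  return { selected, rejected, unscoredPct };
}

const scores = new Map<string, ServerScore>([
  ["context7", { trust: 1.0, receipts: 248 }],
  ["sequential-thinking", { trust: 1.0, receipts: 285 }],
]);
const result = pickServer(["context7", "sequential-thinking", "unknown-server"], scores);
console.log(result.selected, result.unscoredPct);
```

With one unscored server out of three candidates, the counterfactual works out to the 33% quoted in the sample response.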

How it works

XAIP has three moving parts:

1. Trust API — Returns trust scores for MCP servers. Scores come from real execution data, not self-reported metrics.

2. Decision Engine — POST /v1/select takes a task and a list of candidate servers, returns the best pick with reasoning. Unscored servers are automatically excluded.

3. Aggregator — Collects Ed25519-signed execution receipts. Every tool call produces a cryptographic receipt that feeds back into trust scores.

The trust model is Bayesian (Beta distribution), weighted by caller diversity to prevent single-caller gaming. If only one caller submits receipts for a server, the score reflects that limited evidence.

Select → Execute → Report
  ↑                    │
  └────────────────────┘
     scores improve

The data is real

This isn't a mock API. Trust scores are computed from 1,127 actual MCP tool-call executions:

Server	Trust	Receipts	Verdict
context7	1.000	248	trusted
sequential-thinking	1.000	285	trusted
filesystem	0.909	594	caution

Monitored via Veridict, a runtime execution monitor that tracks success rates, latency, and failure types.

filesystem scores lower because it has real failures in its history — that's the system working correctly. A trust score should reflect reality, not optimism.

Try the full demo

The dogfooding demo runs the complete loop: select a server, execute MCP tool calls, submit a signed receipt, check the updated score.

git clone https://github.com/xkumakichi/xaip-protocol.git
cd xaip-protocol/demo
npm install
npx tsx dogfood.ts

Takes about 15 seconds. You'll see XAIP select context7, execute real tool calls against it, submit a receipt to the Aggregator, and print the comparison table.

What's next

XAIP is at v0.4.0. The infrastructure is live and the data is real, but adoption is the bottleneck:

More servers — Currently scoring 3 MCP servers. The system scales to any server, but needs execution data flowing in.
More callers — Caller diversity is the main lever for score accuracy. More independent callers = higher confidence.
Platform integrations — Working toward integration with MCP registries like Smithery.

If you're building AI agents that use MCP, you can start using the API today. Scores will keep improving as more execution data flows in.

Why this matters beyond today

Right now, XAIP helps agents pick working tools.

But this becomes critical when agents start doing more than calling APIs — paying for services, delegating tasks across organizations, executing autonomous workflows.

At that point, the question changes from "does this tool work?" to "can I trust this agent with money?"

XAIP is designed for that future. But it already solves a real problem today.