<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: xaip-agent</title>
    <description>The latest articles on Forem by xaip-agent (@xkumakichi).</description>
    <link>https://forem.com/xkumakichi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879438%2F973a5c17-3aa5-4b12-9c4f-50ef1b572d8a.png</url>
      <title>Forem: xaip-agent</title>
      <link>https://forem.com/xkumakichi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/xkumakichi"/>
    <language>en</language>
    <item>
      <title>Portable Trust</title>
      <dc:creator>xaip-agent</dc:creator>
      <pubDate>Tue, 21 Apr 2026 09:54:57 +0000</pubDate>
      <link>https://forem.com/xkumakichi/portable-trust-o4o</link>
      <guid>https://forem.com/xkumakichi/portable-trust-o4o</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — When an AI agent picks a tool, it makes a trust decision. The quality of that decision depends entirely on &lt;em&gt;where the trust data comes from&lt;/em&gt;. If trust flows through a single gatekeeper — a registry, a platform's curation, a community's moderation — the agent inherits that gatekeeper's failure modes. This post argues that trust infrastructure for AI agents must be provider-neutral and behavior-derived, and walks through what a concrete implementation of that principle looks like, with live data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The tool-choice problem
&lt;/h2&gt;

&lt;p&gt;An AI agent receives a task: "fetch the React hooks docs."&lt;/p&gt;

&lt;p&gt;Its planner produces a candidate list: three documentation tools, two search tools, one fallback web scraper. Which one does it pick?&lt;/p&gt;

&lt;p&gt;Today, the honest answer is: it picks based on &lt;em&gt;name recognition in the model's training data&lt;/em&gt; plus &lt;em&gt;whatever the platform decided to show it&lt;/em&gt;. There is no runtime trust signal. The agent does not know which tool succeeded yesterday, which one is quietly returning stale data, which one has been silently deprecated.&lt;/p&gt;

&lt;p&gt;This is the tool-choice problem, and it is a trust-data problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three places trust data can live
&lt;/h2&gt;

&lt;p&gt;Trust data for tools can come from three very different places:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Self-declared&lt;/strong&gt; — the tool's README says it's good.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform-curated&lt;/strong&gt; — the platform it's published on has a list of "recommended" tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavior-derived&lt;/strong&gt; — past executions are logged, signed, and aggregated; trust is computed from outcomes, not claims.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Only (3) is robust against gaming, drift, and upstream policy changes. But (3) is also the hardest to deliver, because it requires infrastructure: signed receipts, a canonical aggregation model, and an identity system that doesn't depend on any single platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why provider-neutrality matters, structurally
&lt;/h2&gt;

&lt;p&gt;Suppose you build trust scores on top of a single community's registry.&lt;/p&gt;

&lt;p&gt;The registry is itself a trust layer — it decides what's visible, what's highlighted, what's removed. When visibility rules change — whether to promote some tools, demote others, or restrict participation — the scoring space implicitly changes with them. Tools that were previously indexed can disappear from consideration. Projects whose contributors cannot register never accumulate receipts in the first place. None of this reflects anything about the tools' behavior; it reflects the registry's state at a point in time.&lt;/p&gt;

&lt;p&gt;This is not a critique of any particular community. It's a structural property of &lt;strong&gt;any layered system where upstream visibility decisions feed downstream trust signals&lt;/strong&gt;. Those decisions become an implicit input to the trust model, whether or not you want them to.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Without a portable trust layer, agents are not choosing tools — they are inheriting decisions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The implication for trust infrastructure: the &lt;strong&gt;receipts, identity, and scoring must all be portable&lt;/strong&gt;. If a community exits, the data must remain queryable. If a platform changes policy, the scoring must still compute. If an identity provider goes away, the agent must still be verifiable. Trust infrastructure that depends on a single upstream is not trust infrastructure — it is a brittle proxy for that upstream's preferences.&lt;/p&gt;

&lt;h2&gt;
  
  
  What portable trust looks like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;XAIP&lt;/a&gt; is one implementation of this principle. Its design follows from the structural requirement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Signed receipts&lt;/strong&gt;, not self-reports. Every tool execution produces an Ed25519-signed receipt: &lt;code&gt;{ agentDid, callerDid, taskHash, resultHash, success, latencyMs, timestamp }&lt;/code&gt;. The caller co-signs so the tool cannot unilaterally inflate its own reputation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standards-based identity&lt;/strong&gt;. Agents and callers use &lt;a href="https://www.w3.org/TR/did-core/" rel="noopener noreferrer"&gt;W3C DIDs&lt;/a&gt; (&lt;code&gt;did:key&lt;/code&gt;, &lt;code&gt;did:web&lt;/code&gt;, &lt;code&gt;did:xrpl&lt;/code&gt;). No platform account required. An agent expelled from one community retains its identity in every other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bayesian trust, not thresholds&lt;/strong&gt;. Scores are computed as &lt;code&gt;bayesianScore × callerDiversity × coSignFactor&lt;/code&gt;, with DID-method-dependent priors. Cheap identities start from a more skeptical prior, so they don't get free trust, but given enough evidence cheap and expensive identities converge to the same score.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider-neutral receipt producers&lt;/strong&gt;. The same receipt format is emitted by integrations for &lt;a href="https://github.com/xkumakichi/xaip-protocol/tree/main/clients/claude-code-hook" rel="noopener noreferrer"&gt;MCP&lt;/a&gt;, &lt;a href="https://www.npmjs.com/package/xaip-langchain" rel="noopener noreferrer"&gt;LangChain.js&lt;/a&gt;, and &lt;a href="https://www.npmjs.com/package/xaip-openai" rel="noopener noreferrer"&gt;OpenAI tool calling&lt;/a&gt;. A receipt produced by a LangChain agent is byte-compatible with one from an OpenAI chat completion. The trust graph is one graph, regardless of how the agent was built.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregation you can run yourself&lt;/strong&gt;. The reference aggregator is a Cloudflare Worker (open source, small). If you don't trust the public instance, you run your own. Multi-aggregator quorum is part of the spec.&lt;/li&gt;
&lt;/ul&gt;
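
&lt;p&gt;As a rough sketch of how a product of this shape might be computed: the Beta prior values, the field names, and the assumption that both multipliers lie in [0, 1] are illustrative choices made for this example, not constants taken from the XAIP spec.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical Beta priors per DID method: (pseudo-successes, pseudo-failures).
# Illustrative assumptions only, not values from the XAIP spec.
PRIORS = {
    "did:key": (1.0, 3.0),   # cheap identity: skeptical prior
    "did:web": (2.0, 2.0),   # neutral prior
    "did:xrpl": (3.0, 1.0),  # costlier identity: more generous prior
}


@dataclass
class Evidence:
    did_method: str
    successes: int
    failures: int
    caller_diversity: float  # assumed to lie in [0, 1]
    co_sign_factor: float    # assumed to lie in [0, 1]


def bayesian_score(ev: Evidence) -> float:
    """Posterior mean of a Beta-Bernoulli model over call success."""
    alpha, beta = PRIORS[ev.did_method]
    return (alpha + ev.successes) / (alpha + beta + ev.successes + ev.failures)


def trust_score(ev: Evidence) -> float:
    """Product form from the text: bayesianScore * callerDiversity * coSignFactor."""
    return bayesian_score(ev) * ev.caller_diversity * ev.co_sign_factor


# With no receipts the prior dominates, so the two methods score differently.
print(trust_score(Evidence("did:key", 0, 0, 1.0, 1.0)))
print(trust_score(Evidence("did:xrpl", 0, 0, 1.0, 1.0)))
# With 1,000 receipts at a 90% success rate, both land close to 0.9.
print(trust_score(Evidence("did:key", 900, 100, 1.0, 1.0)))
print(trust_score(Evidence("did:xrpl", 900, 100, 1.0, 1.0)))
```

&lt;p&gt;The point of the prior is only to decide who gets the benefit of the doubt before evidence arrives; once receipts accumulate, the observed outcomes outweigh it.&lt;/p&gt;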

&lt;h2&gt;
  
  
  Live data
&lt;/h2&gt;

&lt;p&gt;The reference deployment has been running for a few weeks. As of writing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10 tool servers&lt;/strong&gt; scored (docs retrieval, reasoning, memory, filesystem, search, DB, VCS, and more)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2,100+&lt;/strong&gt; signed execution receipts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated daily collection&lt;/strong&gt; via CI with fresh caller keys each run (caller diversity is a first-class signal)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Live dashboard: &lt;a href="https://xkumakichi.github.io/xaip-protocol/" rel="noopener noreferrer"&gt;xkumakichi.github.io/xaip-protocol&lt;/a&gt;&lt;br&gt;
Trust API: &lt;code&gt;https://xaip-trust-api.kuma-github.workers.dev/v1/servers&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You can ask it which tool to pick right now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://xaip-trust-api.kuma-github.workers.dev/v1/select &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"task":"Fetch React docs","candidates":["context7","sequential-thinking","unknown-server"]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response includes both the selection and a counterfactual — what would happen if you chose randomly with no trust data. That counterfactual is the value proposition: trust data either saves an agent from a wasted call or it doesn't.&lt;/p&gt;
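
&lt;p&gt;To make the counterfactual idea concrete, here is a minimal sketch of the comparison. The function name, the returned field names, the fallback score for servers without data, and the example scores are all assumptions for illustration; they do not describe the actual response shape of the &lt;code&gt;/v1/select&lt;/code&gt; endpoint.&lt;/p&gt;

```python
def select_with_counterfactual(candidates, scores, unknown_score=0.5):
    """Pick the highest-scoring candidate and compare with a uniform random pick.

    `scores` maps a server name to an estimated success probability. Servers
    with no data fall back to `unknown_score` (an assumed neutral default).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    estimated = {name: scores.get(name, unknown_score) for name in candidates}
    selected = max(estimated, key=estimated.get)
    # Expected success if the agent picked uniformly at random instead.
    random_expectation = sum(estimated.values()) / len(estimated)
    return {
        "selected": selected,
        "expected_success": estimated[selected],
        "random_expected_success": random_expectation,
        "uplift": estimated[selected] - random_expectation,
    }


# Made-up scores, purely for illustration.
example_scores = {"context7": 0.6, "sequential-thinking": 0.9}
result = select_with_counterfactual(
    ["context7", "sequential-thinking", "unknown-server"],
    example_scores,
    unknown_score=0.3,
)
print(result["selected"], result["uplift"])
```

&lt;p&gt;When the uplift is near zero, trust data adds little for that task; when it is large, a random pick would likely have wasted a call.&lt;/p&gt;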

&lt;h2&gt;
  
  
  What "provider-neutral" buys you, concretely
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An agent built on LangChain and an agent built on OpenAI's SDK can share trust data about the same underlying tool. Today, they can't — each framework has its own observability silo.&lt;/li&gt;
&lt;li&gt;A tool whose author is gated out of one community still accumulates trust from callers in every other community.&lt;/li&gt;
&lt;li&gt;A grant reviewer evaluating agent infrastructure projects can verify receipts independently, without relying on any single platform's dashboard.&lt;/li&gt;
&lt;li&gt;A future regulatory regime that asks "what's your trust basis for this agent's tool choices?" has a portable, auditable answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The spec is open, the aggregator is live, the three framework integrations are on npm. The next frontier is &lt;strong&gt;class-aware risk evaluation&lt;/strong&gt; — a settlement tool whose outcomes are anchored to an external ledger doesn't need the same trust signals as an advisory tool whose outputs are freely consumed. The &lt;a href="https://github.com/xkumakichi/xaip-protocol/blob/main/XAIP-SPEC-v0.5-DRAFT.md" rel="noopener noreferrer"&gt;v0.5 draft&lt;/a&gt; tackles that.&lt;/p&gt;

&lt;p&gt;The underlying claim is simple: trust infrastructure for AI agents is too important to depend on any one platform, community, or moderator. The sooner we build it as a portable layer, the sooner the ecosystem can reason about tool choices the way we already reason about TLS certificates and package signatures — with math, not vibes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;XAIP is MIT-licensed and open source. Feedback on the v0.5 draft is welcome via &lt;a href="https://github.com/xkumakichi/xaip-protocol/issues" rel="noopener noreferrer"&gt;GitHub issues&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What the agent stack is still missing</title>
      <dc:creator>xaip-agent</dc:creator>
      <pubDate>Mon, 20 Apr 2026 23:23:49 +0000</pubDate>
      <link>https://forem.com/xkumakichi/what-the-agent-stack-is-still-missing-3hcn</link>
      <guid>https://forem.com/xkumakichi/what-the-agent-stack-is-still-missing-3hcn</guid>
      <description>&lt;p&gt;This week the agent economy narrative crystallized in three posts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cameron Winklevoss (Gemini):&lt;/strong&gt; "Humans may have built crypto, but crypto is not so much money for humans as it is money for machines."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brian Armstrong (Coinbase):&lt;/strong&gt; launched &lt;a href="https://agentic.market" rel="noopener noreferrer"&gt;Agentic.market&lt;/a&gt;, a discovery layer where AI agents find and pay for services over x402.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;t54.ai:&lt;/strong&gt; "Every check in today's financial stack was designed around a human. Signatures, IDs, clicks, chargebacks. When an AI agent is the one transacting, each of those checks has a gap."&lt;/p&gt;

&lt;p&gt;Three different angles, one convergent thesis: &lt;strong&gt;agents are becoming first-class economic actors, and the existing stack doesn't fit them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Payments have a shipped answer (x402). Discovery now has a shipped answer (Agentic.market). The question I've been sitting with is what sits underneath both of those:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When an agent calls a service, how does it know the service is trustworthy &lt;em&gt;in practice&lt;/em&gt;, not just in documentation?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the trust layer. It's the one that's still missing — and it's the one I've been building.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap
&lt;/h2&gt;

&lt;p&gt;A signed transaction proves an agent &lt;em&gt;authorized&lt;/em&gt; a call. It doesn't prove the call was &lt;em&gt;safe to make&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The repo can look well-maintained and still ship a buggy release.&lt;/li&gt;
&lt;li&gt;The marketplace listing can be legitimate and still be an attack (see the Ox Security research on MCP marketplace poisoning published April 16).&lt;/li&gt;
&lt;li&gt;The provider can be fine at T=0 and compromised at T=30 days.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are problems payments don't solve. Discovery doesn't solve them either — an agent finding a service via Agentic.market still needs to know if that service has been acting suspiciously over the last 1,000 calls.&lt;/p&gt;

&lt;p&gt;t54.ai's framing — "each of those checks has a gap" — applies one layer lower than they were writing about. The same gap exists for &lt;em&gt;which services an agent should call at all&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a trust layer actually is
&lt;/h2&gt;

&lt;p&gt;Three things, in order of difficulty:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Signed receipts&lt;/strong&gt; — an attestation that agent A called server B, dual-signed, hashes only (no raw content).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregation with defense&lt;/strong&gt; — receipts feed a score. The scoring must be Byzantine-robust or the whole thing is theater.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live scores agents can query before calling&lt;/strong&gt; — one HTTP GET, no auth, no SDK.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Code is the easy part. The hard parts are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold start.&lt;/strong&gt; A trust layer with no receipts is useless. A trust layer with 10 receipts is misleading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caller diversity.&lt;/strong&gt; If one participant dominates the dataset, you're scoring their experience, not the server's.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial robustness.&lt;/strong&gt; Someone will try to tank a competitor's score. The math has to make that expensive.&lt;/li&gt;
&lt;/ul&gt;
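
&lt;p&gt;One simple way to detect a dominating participant is to look at the share of receipts contributed by the most active caller. The sketch below assumes that measure and a 0.5 cut-off; both are illustrative choices, not the rule the XAIP aggregator applies.&lt;/p&gt;

```python
from collections import Counter


def top_caller_share(caller_dids):
    """Fraction of receipts contributed by the single most active caller."""
    if not caller_dids:
        raise ValueError("no receipts")
    counts = Counter(caller_dids)
    return counts.most_common(1)[0][1] / len(caller_dids)


def low_caller_diversity(caller_dids, max_share=0.5):
    """Flag a server when one caller exceeds `max_share` of its receipts.

    The 0.5 default is an assumption made for this example.
    """
    return top_caller_share(caller_dids) > max_share


# Hypothetical caller DIDs: one caller produced three of four receipts.
receipts = ["did:key:alice", "did:key:alice", "did:key:alice", "did:key:bob"]
print(top_caller_share(receipts), low_caller_diversity(receipts))
```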

&lt;h2&gt;
  
  
  The XAIP receipt layer
&lt;/h2&gt;

&lt;p&gt;I shipped one implementation of this. If you want the hook-level walkthrough, the &lt;a href="https://dev.to/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk"&gt;first article&lt;/a&gt; covers installation and the developer-facing side.&lt;/p&gt;

&lt;p&gt;Briefly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ed25519-signed receipts per MCP tool call (hashed I/O only)&lt;/li&gt;
&lt;li&gt;Public Cloudflare Worker aggregator, Bayesian scoring, per-server flags (&lt;code&gt;high_error_rate&lt;/code&gt;, &lt;code&gt;low_caller_diversity&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;One-command Claude Code hook that consumes the scores and contributes receipts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Live scores right now (8 servers, ~1,500 receipts, small but real):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory      0.800  trusted
git         0.775  trusted
sqlite      0.753  trusted
puppeteer   0.671  caution  (high_error_rate)
context7    0.618  caution  (low_caller_diversity)
filesystem  0.579  caution  (low_caller_diversity)
playwright  0.394  low_trust (high_error_rate)
fetch       0.365  low_trust (high_error_rate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is an ecosystem problem, not a product
&lt;/h2&gt;

&lt;p&gt;A trust layer only works if many independent participants contribute receipts. One person running it alone — which is the current state of XAIP — triggers &lt;code&gt;low_caller_diversity&lt;/code&gt; on every high-volume server. That's not a bug; that's the flag working correctly. It's literally telling you not to trust the scores until more callers are in the dataset.&lt;/p&gt;

&lt;p&gt;So I'm not pitching a product. I'm asking: if you're building in the agent space and you think trust scoring is a layer that should exist, contribute receipts. Or run an aggregator node (the spec is in the repo, BFT quorum is the next milestone). Or tell me why the design is wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack picture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent economy layers (rough)
───────────────────────────────
Payments       → x402 (shipped)
Discovery      → Agentic.market (shipped)
Trust scoring  → XAIP + ?          (small, needs company)
Identity       → DID / passkeys    (fragmented)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;XAIP is one attempt at the trust row. It is almost certainly not the final one, but the row has to get filled, and waiting for Anthropic or a well-funded startup to do it risks letting the first large-scale MCP compromise happen before the layer exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Live dashboard: &lt;a href="https://xkumakichi.github.io/xaip-protocol/" rel="noopener noreferrer"&gt;https://xkumakichi.github.io/xaip-protocol/&lt;/a&gt; (scores auto-refresh, no auth)&lt;/li&gt;
&lt;li&gt;Previous article: &lt;a href="https://dev.to/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk"&gt;https://dev.to/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;https://github.com/xkumakichi/xaip-protocol&lt;/a&gt; (MIT, zero deps)&lt;/li&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/xaip-claude-hook" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/xaip-claude-hook&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Trust API: &lt;a href="https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7" rel="noopener noreferrer"&gt;https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're working on adjacent layers — payment, discovery, identity for agents — I'd be glad to compare notes. The interesting question isn't whose trust layer wins; it's whether &lt;em&gt;any&lt;/em&gt; trust layer exists by the time the stack starts mattering.&lt;/p&gt;

&lt;p&gt;— xkumakichi&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>cryptocurrency</category>
      <category>web3</category>
    </item>
    <item>
      <title>A Claude Code hook that warns you before calling a low-trust MCP server</title>
      <dc:creator>xaip-agent</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:15:40 +0000</pubDate>
      <link>https://forem.com/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk</link>
      <guid>https://forem.com/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk</guid>
      <description>&lt;p&gt;Last week researchers at Ox published findings showing that the MCP STDIO transport lets arbitrary command execution slip through unchecked, and that &lt;a href="https://www.theregister.com/2026/04/16/anthropic_mcp_design_flaw/" rel="noopener noreferrer"&gt;9 of 11 MCP marketplaces they tested were poisonable&lt;/a&gt;. Anthropic's response: STDIO is out of scope for protocol-level fixes, the ecosystem is responsible for operational trust.&lt;/p&gt;

&lt;p&gt;Fair — Anthropic &lt;a href="https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation" rel="noopener noreferrer"&gt;donated MCP to the Linux Foundation's Agentic AI Foundation in December 2025&lt;/a&gt; specifically so independent infrastructure could grow around it. But that leaves a real gap for anyone running Claude Code today: &lt;strong&gt;how do you know whether an MCP server you're about to invoke is trustworthy?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Anthropic official registry is pure metadata (license, commit count, popularity). mcp-scorecard.ai scores repos, not behavior. BlueRock runs OWASP-style static scans. None of these ask the one question that actually matters:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Does this MCP server, in real call-time use, work?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I built a small thing to answer it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hook
&lt;/h2&gt;

&lt;p&gt;A zero-config Claude Code hook that does two things on every MCP tool call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Before the call&lt;/strong&gt; — queries a public trust API for that server. If the score is low, Claude shows an inline warning:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ⚠ XAIP: "some-server" trust=0.32 (caution, 87 receipts) Risk: high_error_rate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;After the call&lt;/strong&gt; — emits an Ed25519-signed receipt (success, latency, hashed input/output) to a public aggregator that updates the score.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; xaip-claude-hook
xaip-claude-hook &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next MCP call fires the hook. That's the whole UX.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a receipt looks like
&lt;/h2&gt;

&lt;p&gt;No raw content leaves your machine — only hashes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agentDid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"did:web:context7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"callerDid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="s2"&gt;"did:key:a1c6cd34…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"toolName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"resolve-library-id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"taskHash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;"9f3e…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;sha&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="err"&gt;(input).slice(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resultHash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;"1b78…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;sha&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="err"&gt;(response).slice(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latencyMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mi"&gt;668&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failureType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-17T04:24:59.925Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"signature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Ed&lt;/span&gt;&lt;span class="mi"&gt;25519&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;over&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;canonical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;key)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"callerSignature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Ed&lt;/span&gt;&lt;span class="mi"&gt;25519&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;over&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;canonical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(caller&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;key)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The aggregator rejects anything that fails signature verification. The trust API computes a Bayesian score across all verified receipts per server, weighted by caller diversity — so one enthusiastic installer can't fake a reputation.&lt;/p&gt;
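
&lt;p&gt;For readers who want to see the hashing step in isolation, here is a minimal sketch of building the unsigned part of a receipt. It assumes the 16-character slice is taken from the hex digest and that "canonical JSON" means sorted keys with compact separators; the helper names are hypothetical. The Ed25519 signing step is left out because it needs a third-party library.&lt;/p&gt;

```python
import hashlib
import json


def short_hash(data: bytes) -> str:
    """First 16 hex characters of SHA-256, mirroring sha256(x).slice(0,16)."""
    return hashlib.sha256(data).hexdigest()[:16]


def canonical_json(obj: dict) -> bytes:
    """Deterministic serialization (assumed scheme): sorted keys, compact form."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def build_unsigned_receipt(agent_did, caller_did, tool_name, request, response,
                           success, latency_ms, timestamp, failure_type=""):
    """Assemble receipt fields; only hashes of the request and response are kept."""
    return {
        "agentDid": agent_did,
        "callerDid": caller_did,
        "toolName": tool_name,
        "taskHash": short_hash(request),
        "resultHash": short_hash(response),
        "success": success,
        "latencyMs": latency_ms,
        "failureType": failure_type,
        "timestamp": timestamp,
    }


# Placeholder identifiers and payloads, for illustration only.
receipt = build_unsigned_receipt(
    "did:web:example-server", "did:key:example-caller", "example-tool",
    b"request body", b"response body", True, 120, "2026-01-01T00:00:00Z",
)
payload = canonical_json(receipt)  # the bytes each party would sign
print(payload.decode("utf-8"))
```

&lt;p&gt;Both signers need to serialize the receipt to exactly the same bytes, which is why a deterministic encoding matters more than the specific scheme chosen.&lt;/p&gt;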

&lt;h2&gt;
  
  
  What the scores actually look like right now
&lt;/h2&gt;

&lt;p&gt;Being transparent: the dataset is small. A &lt;code&gt;curl&lt;/code&gt; against the live trust API today:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Trust&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Receipts&lt;/th&gt;
&lt;th&gt;Flag&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;memory&lt;/td&gt;
&lt;td&gt;0.800&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;td&gt;112&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git&lt;/td&gt;
&lt;td&gt;0.775&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sqlite&lt;/td&gt;
&lt;td&gt;0.753&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;puppeteer&lt;/td&gt;
&lt;td&gt;0.671&lt;/td&gt;
&lt;td&gt;caution&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;high_error_rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;context7&lt;/td&gt;
&lt;td&gt;0.618&lt;/td&gt;
&lt;td&gt;caution&lt;/td&gt;
&lt;td&gt;560&lt;/td&gt;
&lt;td&gt;low_caller_diversity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;filesystem&lt;/td&gt;
&lt;td&gt;0.579&lt;/td&gt;
&lt;td&gt;caution&lt;/td&gt;
&lt;td&gt;610&lt;/td&gt;
&lt;td&gt;low_caller_diversity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;playwright&lt;/td&gt;
&lt;td&gt;0.394&lt;/td&gt;
&lt;td&gt;low_trust&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;high_error_rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fetch&lt;/td&gt;
&lt;td&gt;0.365&lt;/td&gt;
&lt;td&gt;low_trust&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;high_error_rate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verify any of these yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;low_caller_diversity&lt;/code&gt; flag on high-volume servers is the single most honest signal in that table. It means: &lt;strong&gt;I'm the biggest caller right now, and that's exactly the problem this tool is supposed to solve&lt;/strong&gt;. The flag only clears when independent installers start generating receipts, which is what the npm package is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is architecturally different from existing approaches
&lt;/h2&gt;

&lt;p&gt;Every other "MCP trust" project I've seen scores the &lt;em&gt;repository&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Commit frequency, license, stars, contributor count (mcp-scorecard.ai)&lt;/li&gt;
&lt;li&gt;Static source-code vulnerability scans (BlueRock)&lt;/li&gt;
&lt;li&gt;Registry inclusion as implicit trust (official MCP registry)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are useful proxies, but none of them tell you whether a server works in practice. A well-maintained repo can have a buggy release; a single-author repo can be rock solid; a newly-forked malicious repo looks identical to the original under static scan.&lt;/p&gt;

&lt;p&gt;XAIP scores &lt;strong&gt;observed behavior&lt;/strong&gt;. Every call is a signed attestation. The scoring is Bayesian, so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Servers with few receipts get &lt;code&gt;insufficient_data&lt;/code&gt; — no verdict, no warning&lt;/li&gt;
&lt;li&gt;High-variance patterns (mixed success/failure) get lower confidence&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;high_error_rate&lt;/code&gt; flag is computed from real response content, classifying &lt;code&gt;quota exceeded&lt;/code&gt;, &lt;code&gt;rate limit&lt;/code&gt;, &lt;code&gt;unauthorized&lt;/code&gt;, and &lt;code&gt;"isError": true&lt;/code&gt; as failures&lt;/li&gt;
&lt;/ul&gt;
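
&lt;p&gt;The failure classification described above can be sketched as a small pattern matcher. The function names, the case-insensitive matching, and the exact regular expressions are assumptions for this example; the hook's real implementation may differ.&lt;/p&gt;

```python
import re

# Patterns named in the text; matching details are assumed for illustration.
FAILURE_PATTERNS = [
    re.compile(r"quota exceeded", re.IGNORECASE),
    re.compile(r"rate limit", re.IGNORECASE),
    re.compile(r"unauthorized", re.IGNORECASE),
    re.compile(r'"isError"\s*:\s*true'),
]


def infer_success(response_text: str) -> bool:
    """Treat a response as successful unless a known failure pattern appears."""
    return not any(p.search(response_text) for p in FAILURE_PATTERNS)


def error_rate(responses) -> float:
    """Share of responses classified as failures."""
    if not responses:
        raise ValueError("no responses")
    failures = sum(1 for text in responses if not infer_success(text))
    return failures / len(responses)


samples = ['{"content": "ok"}', '{"isError": true}', "429: Rate limit reached"]
print([infer_success(s) for s in samples], error_rate(samples))
```

&lt;p&gt;A text heuristic like this can misfire: a response that merely quotes the phrase "rate limit" would be counted as a failure.&lt;/p&gt;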

&lt;p&gt;This is the same split as OpenSSF Scorecard versus runtime attestation in supply-chain security: you want both, but &lt;em&gt;only one of them catches regressions in production&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's missing / where this could go wrong
&lt;/h2&gt;

&lt;p&gt;I want to be specific about limitations, because "AI trust protocol" posts tend to overpromise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~10 servers, ~1500 receipts total.&lt;/strong&gt; Small. This post is partly an ask for installers to fix that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One aggregator node.&lt;/strong&gt; Byzantine fault tolerance requires quorum; right now there's one Cloudflare Worker. Quorum needs multiple operators, which is the next milestone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client-side &lt;code&gt;inferSuccess&lt;/code&gt; is a heuristic.&lt;/strong&gt; We scan response text for error patterns, so false positives and false negatives are possible. fetch's 36% error rate might be real, or it might be over-counted (legitimate 404s shouldn't hurt the server's score).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy model relies on hashes, not ZK.&lt;/strong&gt; Inputs and outputs are hashed before transmission, but statistical correlation across taskHashes is possible in principle. Migration to ZK receipt aggregation is a future idea, not a current feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I personally generated most of the high-volume receipts.&lt;/strong&gt; The &lt;code&gt;low_caller_diversity&lt;/code&gt; flag you see on context7 and filesystem is me.&lt;/li&gt;
&lt;/ul&gt;
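
&lt;p&gt;The heuristic limitation is easy to see in a sketch. The version below is an assumption, not the hook's actual &lt;code&gt;inferSuccess&lt;/code&gt;: it reuses the four failure patterns named earlier in this post, but the regexes and the &lt;code&gt;infer_success&lt;/code&gt; function are hypothetical.&lt;/p&gt;

```python
import re

# The patterns come from this post; the matching rules are assumptions.
FAILURE_PATTERNS = [
    re.compile(r"quota exceeded", re.IGNORECASE),
    re.compile(r"rate limit", re.IGNORECASE),
    re.compile(r"unauthorized", re.IGNORECASE),
    re.compile(r'"isError"\s*:\s*true'),
]


def infer_success(response_text: str) -> bool:
    """Treat a response as a failure if any known error pattern appears."""
    return not any(p.search(response_text) for p in FAILURE_PATTERNS)


# A genuine failure is caught...
print(infer_success('{"isError": true, "content": []}'))
# ...but so is a perfectly good answer that merely mentions rate limits.
print(infer_success("Guide: how to handle a rate limit in your client"))
```

&lt;p&gt;The second call is a false positive: the tool succeeded, but the text match marks it as a failure. That is the kind of over-counting described above.&lt;/p&gt;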

&lt;h2&gt;
  
  
  Running it yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; xaip-claude-hook
xaip-claude-hook &lt;span class="nb"&gt;install
&lt;/span&gt;xaip-claude-hook status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open a new Claude Code session. Call any MCP tool. Check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.xaip/hook.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see lines like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;2026-04-17T04:24:59Z POST context7/resolve-library-id ok=true lat=668ms → 200
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the next time you (or Claude) invoke a low-trust server, the warning shows up inline.&lt;/p&gt;

&lt;p&gt;Uninstalling is a single command. Keys under &lt;code&gt;~/.xaip/&lt;/code&gt; persist afterward; delete them manually to wipe them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/xaip-claude-hook" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/xaip-claude-hook&lt;/a&gt; — &lt;code&gt;npm install -g xaip-claude-hook&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;https://github.com/xkumakichi/xaip-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hook source:&lt;/strong&gt; &lt;a href="https://github.com/xkumakichi/xaip-protocol/tree/main/clients/claude-code-hook" rel="noopener noreferrer"&gt;https://github.com/xkumakichi/xaip-protocol/tree/main/clients/claude-code-hook&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Trust API:&lt;/strong&gt; &lt;a href="https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7" rel="noopener noreferrer"&gt;https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregator:&lt;/strong&gt; &lt;a href="https://xaip-aggregator.kuma-github.workers.dev" rel="noopener noreferrer"&gt;https://xaip-aggregator.kuma-github.workers.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Issues, scoring bugs, angry takes — all welcome on GitHub. If you maintain an MCP server and your score looks wrong, I want to hear about it first.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>claude</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Agents Pick Tools Blind</title>
      <dc:creator>xaip-agent</dc:creator>
      <pubDate>Tue, 14 Apr 2026 23:43:14 +0000</pubDate>
      <link>https://forem.com/xkumakichi/stop-your-ai-agent-from-picking-broken-mcp-servers-4pa0</link>
      <guid>https://forem.com/xkumakichi/stop-your-ai-agent-from-picking-broken-mcp-servers-4pa0</guid>
      <description>&lt;p&gt;I connected my AI agent to 3 MCP servers.&lt;/p&gt;

&lt;p&gt;It picked one at random.&lt;/p&gt;

&lt;p&gt;The first attempt failed. The second connected but was the wrong tool for the task. Only the third worked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;node without-xaip.js
&lt;span class="go"&gt;
→ Trying: unknown-server...
  ✗ error — package not found (8.2s)

→ Trying: sequential-thinking...
  ✓ connected — but wrong tool for docs task

→ Trying: context7...
  ✓ success (3.1s)

Total: 11.3 seconds, 2 wasted calls
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are over 1,000 MCP servers now. Your agent has no way to tell which ones are reliable, which ones are broken, and which ones are the right fit.&lt;/p&gt;

&lt;p&gt;So I built a fix: one API call that picks the right server first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;node with-xaip.js
&lt;span class="go"&gt;
→ XAIP selected: context7 (trust: 1.0, 248 verified executions)
  ✓ success (3.1s)

Total: 3.1 seconds, 0 wasted calls
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;XAIP&lt;/a&gt; — trust scoring for AI agents, backed by real execution data. Not benchmarks. Not self-reported metrics. Actual tool-call results, cryptographically signed.&lt;/p&gt;

&lt;h2&gt;
  
  
  A live API you can try right now
&lt;/h2&gt;

&lt;p&gt;No signup, no API key. Just curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Trust score for a specific MCP server&lt;/span&gt;
curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"context7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trust"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trusted"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"receipts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;248&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"xaip-aggregator (quorum:1)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"riskFlags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"computedFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"248 receipts via XAIP Aggregator BFT (1 nodes)"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or let XAIP pick the best server for your task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://xaip-trust-api.kuma-github.workers.dev/v1/select &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "task": "Fetch React documentation",
    "candidates": ["context7", "sequential-thinking", "unknown-server"]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"selected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"context7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Highest trust (1) from 248 verified executions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rejected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unknown-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unscored — no execution data"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"withoutXAIP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Random selection would pick an unscored server 33% of the time — no execution data, no safety guarantee"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;withoutXAIP&lt;/code&gt; field exists to make the risk visible. It's the answer to "why do I need this?"&lt;/p&gt;
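
&lt;p&gt;The selection rule and the 33% figure can be sketched in a few lines. This is an illustration under assumptions, not the server-side code: &lt;code&gt;select_server&lt;/code&gt; and its return keys are hypothetical, task matching is ignored entirely, and the scores are the ones from the table further down.&lt;/p&gt;

```python
from typing import Optional


def select_server(scores: dict[str, Optional[float]]) -> dict:
    """Pick the highest-trust scored candidate; None means no execution data."""
    scored = {slug: s for slug, s in scores.items() if s is not None}
    unscored = [slug for slug, s in scores.items() if s is None]
    if not scored:
        raise ValueError("no candidate has execution data")
    selected = max(scored, key=lambda slug: scored[slug])
    return {
        "selected": selected,
        "rejected": [{"slug": slug, "reason": "unscored"} for slug in unscored],
        # Chance that a uniformly random pick lands on an unscored server.
        "random_unscored_pct": round(100 * len(unscored) / len(scores)),
    }


result = select_server(
    {"context7": 1.0, "filesystem": 0.909, "unknown-server": None}
)
print(result["selected"], result["random_unscored_pct"])
```

&lt;p&gt;One unscored candidate out of three is where a figure like "33% of the time" comes from.&lt;/p&gt;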

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;XAIP has three moving parts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Trust API&lt;/strong&gt; — Returns trust scores for MCP servers. Scores come from real execution data, not self-reported metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Decision Engine&lt;/strong&gt; — &lt;code&gt;POST /v1/select&lt;/code&gt; takes a task and a list of candidate servers, returns the best pick with reasoning. Unscored servers are automatically excluded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Aggregator&lt;/strong&gt; — Collects Ed25519-signed execution receipts. Every tool call produces a cryptographic receipt that feeds back into trust scores.&lt;/p&gt;

&lt;p&gt;The trust model is Bayesian (Beta distribution), weighted by caller diversity to prevent single-caller gaming. If only one caller submits receipts for a server, the score reflects that limited evidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select → Execute → Report
  ↑                    │
  └────────────────────┘
     scores improve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
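
&lt;p&gt;One way to weight a Beta posterior by caller diversity is to cap how much evidence any single caller can contribute. The sketch below is a guess at the general idea, not XAIP's actual formula: &lt;code&gt;per_caller_cap&lt;/code&gt;, &lt;code&gt;max_share&lt;/code&gt;, and &lt;code&gt;diversity_weighted_trust&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```python
def diversity_weighted_trust(
    receipts: dict[str, tuple[int, int]],
    per_caller_cap: int = 20,   # assumed cap on evidence per caller
    max_share: float = 0.8,     # assumed share that triggers the flag
) -> tuple[float, list[str]]:
    """receipts maps caller id to (successes, failures)."""
    total = sum(ok + bad for ok, bad in receipts.values())
    if total == 0:
        return 0.5, []          # prior mean, nothing to flag
    alpha, beta = 1.0, 1.0      # assumed uniform Beta(1, 1) prior
    for ok, bad in receipts.values():
        n = ok + bad
        if n == 0:
            continue
        # Scale a heavy caller down so it counts as at most per_caller_cap receipts.
        scale = min(1.0, per_caller_cap / n)
        alpha += ok * scale
        beta += bad * scale
    top_share = max(ok + bad for ok, bad in receipts.values()) / total
    flags = ["low_caller_diversity"] if top_share > max_share else []
    return alpha / (alpha + beta), flags


single = diversity_weighted_trust({"me": (500, 0)})
diverse = diversity_weighted_trust({f"caller{i}": (20, 0) for i in range(5)})
print(single, diverse)
```

&lt;p&gt;Under these assumptions, 500 clean receipts from one caller earn less trust than 100 clean receipts spread across five callers, and only the single-caller case is flagged.&lt;/p&gt;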



&lt;h2&gt;
  
  
  The data is real
&lt;/h2&gt;

&lt;p&gt;This isn't a mock API. Trust scores are computed from 1,127 actual MCP tool-call executions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Trust&lt;/th&gt;
&lt;th&gt;Receipts&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;context7&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;td&gt;248&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sequential-thinking&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;td&gt;285&lt;/td&gt;
&lt;td&gt;trusted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;filesystem&lt;/td&gt;
&lt;td&gt;0.909&lt;/td&gt;
&lt;td&gt;594&lt;/td&gt;
&lt;td&gt;caution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Monitored via &lt;a href="https://github.com/xkumakichi/veridict" rel="noopener noreferrer"&gt;Veridict&lt;/a&gt;, a runtime execution monitor that tracks success rates, latency, and failure types.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;filesystem&lt;/code&gt; scores lower because it has real failures in its history — that's the system working correctly. A trust score should reflect reality, not optimism.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try the full demo
&lt;/h2&gt;

&lt;p&gt;The dogfooding demo runs the complete loop: select a server, execute MCP tool calls, submit a signed receipt, check the updated score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/xkumakichi/xaip-protocol.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xaip-protocol/demo
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npx tsx dogfood.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Takes about 15 seconds. You'll see XAIP select &lt;code&gt;context7&lt;/code&gt;, execute real tool calls against it, submit a receipt to the Aggregator, and print the comparison table.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;XAIP is at v0.4.0. The infrastructure is live and the data is real, but adoption is the bottleneck:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More servers&lt;/strong&gt; — Currently scoring 3 MCP servers. The system scales to any server, but needs execution data flowing in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More callers&lt;/strong&gt; — Caller diversity is the main lever for score accuracy. More independent callers = higher confidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform integrations&lt;/strong&gt; — Working toward integration with MCP registries like Smithery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building AI agents that use MCP, you can start using the API today. Scores will keep improving as more execution data flows in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters beyond today
&lt;/h2&gt;

&lt;p&gt;Right now, XAIP helps agents pick working tools.&lt;/p&gt;

&lt;p&gt;But this becomes critical when agents start doing more than calling APIs — paying for services, delegating tasks across organizations, executing autonomous workflows.&lt;/p&gt;

&lt;p&gt;At that point, the question changes from "does this tool work?" to "can I trust this agent with money?"&lt;/p&gt;

&lt;p&gt;XAIP is designed for that future. But it already solves a real problem today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: &lt;code&gt;https://xaip-trust-api.kuma-github.workers.dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;xkumakichi/xaip-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/xaip-sdk" rel="noopener noreferrer"&gt;xaip-sdk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime monitor&lt;/strong&gt;: &lt;a href="https://github.com/xkumakichi/veridict" rel="noopener noreferrer"&gt;xkumakichi/veridict&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;XAIP doesn't make agents smarter. It prevents them from making dumb choices.&lt;/p&gt;

&lt;p&gt;Built this because I needed it. If your agent is still picking servers blind, &lt;a href="https://github.com/xkumakichi/xaip-protocol" rel="noopener noreferrer"&gt;give it a try&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
