<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: greymoth</title>
    <description>The latest articles on Forem by greymoth (@greymothjp).</description>
    <link>https://forem.com/greymothjp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3937147%2F66fce836-aa25-43f0-bb5f-632fc17ebf44.jpeg</url>
      <title>Forem: greymoth</title>
      <link>https://forem.com/greymothjp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/greymothjp"/>
    <language>en</language>
    <item>
      <title>Why Kairon runs a separate gRPC authorization service</title>
      <dc:creator>greymoth</dc:creator>
      <pubDate>Sat, 23 May 2026 02:32:31 +0000</pubDate>
      <link>https://forem.com/greymothjp/why-kairon-runs-a-separate-grpc-authorization-service-foh</link>
      <guid>https://forem.com/greymothjp/why-kairon-runs-a-separate-grpc-authorization-service-foh</guid>
      <description>&lt;p&gt;When you're building a multi-tenant platform where users run autonomous trading agents, "just check a middleware flag" isn't a safety model. It's a hope.&lt;/p&gt;

&lt;p&gt;This is how we ended up with Guardian -- a standalone Node.js gRPC server on &lt;code&gt;:50052&lt;/code&gt; that every agent execution gates through before a single order can fire.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with inline auth checks
&lt;/h2&gt;

&lt;p&gt;Our initial instinct was the usual: tRPC middleware, a capability check on the procedure, done. It works fine for UI-driven actions where a bad outcome is a 403 and a sad user. It does not work when the "action" is an autonomous agent executing a trading strategy with real capital.&lt;/p&gt;

&lt;p&gt;The failure modes are different. A misconfigured middleware might pass a stale session. A quota check might race against a concurrent execution. An unhandled exception might default-allow instead of default-deny. In a UI context those are bugs. In an agent runtime they're incidents.&lt;/p&gt;

&lt;p&gt;We needed authorization to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explicit&lt;/strong&gt; -- every execution path calls it, no exceptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail-closed&lt;/strong&gt; -- if the auth service is unreachable, the run is rejected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditable&lt;/strong&gt; -- every decision is a record, not a log line&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Guardian does
&lt;/h2&gt;

&lt;p&gt;Guardian exposes a proto3 service with three RPCs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="kd"&gt;service&lt;/span&gt; &lt;span class="n"&gt;GuardianService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;CheckCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CapabilityRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CapabilityResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;CheckQuota&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QuotaRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QuotaResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;AuthorizeAgentRun&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentRunRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentRunResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;AuthorizeAgentRun&lt;/code&gt; is the gate. It calls &lt;code&gt;CheckCapability&lt;/code&gt;, then &lt;code&gt;CheckQuota&lt;/code&gt;, then writes an execution record. If any step fails or Guardian is unreachable, the run is rejected with reason &lt;code&gt;guardian_unavailable&lt;/code&gt;. No silent pass-through.&lt;/p&gt;
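
&lt;p&gt;A minimal sketch of that fail-closed ordering, not Guardian's actual code. The &lt;code&gt;GuardianChecks&lt;/code&gt; interface, the &lt;code&gt;authorizeAgentRun&lt;/code&gt; function and the &lt;code&gt;capability_denied&lt;/code&gt; and &lt;code&gt;quota_exceeded&lt;/code&gt; reason strings are hypothetical; only &lt;code&gt;guardian_unavailable&lt;/code&gt; comes from the post. The checks are synchronous here to keep the example short, whereas real gRPC calls would be asynchronous.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; not Guardian's real types.
type Decision = { allowed: boolean; reason: string };

interface GuardianChecks {
  checkCapability(agentId: string): boolean; // may throw if the service is down
  checkQuota(agentId: string): boolean; // may throw if the service is down
  writeRecord(agentId: string, decision: Decision): void;
}

function authorizeAgentRun(checks: GuardianChecks, agentId: string): Decision {
  try {
    let decision: Decision = { allowed: true, reason: "ok" };
    if (!checks.checkCapability(agentId)) {
      decision = { allowed: false, reason: "capability_denied" };
    } else if (!checks.checkQuota(agentId)) {
      decision = { allowed: false, reason: "quota_exceeded" };
    }
    // The record is written for denials as well as approvals.
    checks.writeRecord(agentId, decision);
    return decision;
  } catch {
    // Fail closed: any error, including a failed record write, rejects the run.
    return { allowed: false, reason: "guardian_unavailable" };
  }
}
```

&lt;p&gt;The point of the single &lt;code&gt;try&lt;/code&gt; block is that there is no code path on which an exception leaves the run approved.&lt;/p&gt;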

&lt;h2&gt;
  
  
  Why a separate process
&lt;/h2&gt;

&lt;p&gt;Two reasons: practical and principled.&lt;/p&gt;

&lt;p&gt;Practical: Guardian enforces hard rate limits at the infrastructure level, isolated from API server memory pressure.&lt;/p&gt;

&lt;p&gt;Principled: a separate service audits independently. Our &lt;code&gt;kairon_org_audit_log&lt;/code&gt; table has exactly one writer with one responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tradeoff
&lt;/h2&gt;

&lt;p&gt;Every agent execution has a gRPC round-trip. That latency is deliberate. Trading agent authorization isn't latency-sensitive -- if your strategy breaks because auth took 2ms, the strategy has bigger problems.&lt;/p&gt;

&lt;p&gt;What we gained is a single place where "should this agent run?" is answered and recorded, with an immutable sequence of authorization decisions to replay when something goes wrong.&lt;/p&gt;

&lt;p&gt;Building this at kairon.trade. Source: github.com/greymoth-jp.&lt;/p&gt;

</description>
      <category>node</category>
      <category>typescript</category>
      <category>architecture</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building a gRPC Guardian + Intel API on a Prediction OS</title>
      <dc:creator>greymoth</dc:creator>
      <pubDate>Wed, 20 May 2026 00:27:14 +0000</pubDate>
      <link>https://forem.com/greymothjp/building-a-grpc-guardian-intel-api-on-a-prediction-os-5fjm</link>
      <guid>https://forem.com/greymothjp/building-a-grpc-guardian-intel-api-on-a-prediction-os-5fjm</guid>
      <description>&lt;h2&gt;
  
  
  What shipped in v0.57
&lt;/h2&gt;

&lt;p&gt;We just cut v0.57 of Kairon Forge — the B2B AI agent platform that ships every agent pre-loaded with prediction-market intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Guardian gRPC server
&lt;/h2&gt;

&lt;p&gt;The Guardian audit layer is now a dedicated gRPC server. The SecurityScanner kernel runs inside it, applying rules against each incoming audit record. Five unit tests cover the critical paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intel API: real implementation
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;macro_snapshot&lt;/code&gt; runs on a 4-hour cron against live Polymarket data. &lt;code&gt;anomaly_detect&lt;/code&gt; computes a z-score over a configurable rolling window. &lt;code&gt;forecast_calibrated&lt;/code&gt; combines market probabilities with historical calibration curves to produce confidence-banded predictions.&lt;/p&gt;
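
&lt;p&gt;For readers unfamiliar with the technique, a rolling-window z-score check can be sketched roughly as follows. This is an illustration, not the Intel API's implementation: the &lt;code&gt;isAnomaly&lt;/code&gt; name, its parameters, the use of the population standard deviation and the handling of a flat history are all assumptions.&lt;/p&gt;

```typescript
// Flags the latest value when it sits more than `threshold` standard
// deviations from the mean of the preceding `window` values.
function isAnomaly(series: number[], window: number, threshold: number): boolean {
  const enoughHistory = series.length - 1 >= window;
  if (!enoughHistory || !(window >= 2)) return false;

  const latest = series[series.length - 1];
  // The window covers the values just before the latest one.
  const history = series.slice(series.length - 1 - window, series.length - 1);

  const mean = history.reduce((sum, x) => sum + x, 0) / window;
  const variance = history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / window;
  const std = Math.sqrt(variance);

  // The z-score is undefined for a flat history; this sketch treats it as no signal.
  if (std === 0) return false;
  return Math.abs(latest - mean) / std > threshold;
}
```

&lt;p&gt;Excluding the latest value from its own window keeps a large spike from inflating the standard deviation it is measured against.&lt;/p&gt;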

&lt;h2&gt;
  
  
  @kairon/sdk v0.0.1
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;KaironClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@kairon/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KaironClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAIRON_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snapshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;macroSnapshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-05-18&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server (&lt;code&gt;@modelcontextprotocol/server-kairon&lt;/code&gt; v0.0.1) exposes the same Intel tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @kairon/sdk @modelcontextprotocol/server-kairon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Forge: kairon.trade/forge&lt;/p&gt;

</description>
      <category>ai</category>
      <category>governance</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building an Inference OS: deterministic-first router for prediction markets</title>
      <dc:creator>greymoth</dc:creator>
      <pubDate>Wed, 20 May 2026 00:22:07 +0000</pubDate>
      <link>https://forem.com/greymothjp/building-an-inference-os-deterministic-first-router-for-prediction-markets-3g2j</link>
      <guid>https://forem.com/greymothjp/building-an-inference-os-deterministic-first-router-for-prediction-markets-3g2j</guid>
      <description>&lt;h1&gt;
  
  
  Building an Inference OS for prediction markets
&lt;/h1&gt;

&lt;p&gt;Most AI agent stacks default to "throw the prompt at GPT-4o, hope for the best." For prediction markets that's expensive AND wrong — most market questions don't need a paid LLM at all. Here's how we built a 6-hook deterministic-first inference router on top of Kairon Forge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 6 hooks (in priority order)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Market Regime classifier&lt;/strong&gt; — 5 deterministic regimes (whale_dominant / meme_volatile / macro_anchored / panic_liquidation / dead_liquidity). Confident classification short-circuits the entire router. Zero LLM call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detector&lt;/strong&gt; — 3σ price spike + sentiment divergence. Confident anomaly FORCES Tier-2 (paid Claude/Anthropic), bypassing the viability cost cap on rare-and-important markets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-to-Resolution decay&lt;/strong&gt; — exponential confidence decay vs event horizon. Low decayed confidence forces Tier-1 (Haiku-only).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persona overlay&lt;/strong&gt; — 5 archetype priors (calibrated_researcher / whale_mimic / panic_seller / momentum_trader / contrarian) adjust baseline confidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panic mode circuit breaker&lt;/strong&gt; — 60s rolling burn-rate σ. &amp;gt;2σ from baseline → force Ollama-only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Economic Viability Filter&lt;/strong&gt; — per-tier hard cost cap (Free $0.05 / Pro $0.50 / Elite $5 / Enterprise $100). &amp;gt;cap → 402 quotaExhausted.&lt;/li&gt;
&lt;/ol&gt;
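
&lt;p&gt;As a rough illustration of the last hook, a per-tier cap check might look like the sketch below. The cap values and the &lt;code&gt;402 quotaExhausted&lt;/code&gt; outcome come from the list above; the &lt;code&gt;checkViability&lt;/code&gt; name, the result shape, the &lt;code&gt;200&lt;/code&gt; status and the idea of comparing an estimated per-call cost against the cap are assumptions.&lt;/p&gt;

```typescript
// Per-tier hard cost caps in USD, taken from the list above.
const TIER_CAPS_USD = { free: 0.05, pro: 0.5, elite: 5, enterprise: 100 };
type Tier = keyof typeof TIER_CAPS_USD;

// Hypothetical result shape: an HTTP-style status plus an optional error code.
type ViabilityResult = { status: number; error?: string };

function checkViability(tier: Tier, estimatedCostUsd: number): ViabilityResult {
  // Anything over the cap is rejected outright; there is no auto-upgrade path.
  if (estimatedCostUsd > TIER_CAPS_USD[tier]) {
    return { status: 402, error: "quotaExhausted" };
  }
  return { status: 200 };
}
```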

&lt;h2&gt;
  
  
  Cost-aware Cognition
&lt;/h2&gt;

&lt;p&gt;Every paid call first passes through an EIG / cost ratio gate (&lt;code&gt;shouldEscalate(eig, cost, threshold=0.5)&lt;/code&gt;): information gain divided by inference cost. Below the threshold, the call collapses to Tier-1 with a budget consumption note.&lt;/p&gt;
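
&lt;p&gt;The signature below mirrors the call quoted above; the body is only a guess at how such a ratio gate could work. In particular, treating a ratio exactly at the threshold as a pass and letting zero-cost calls through are assumptions.&lt;/p&gt;

```typescript
// Returns true when the information gain per unit of cost justifies a paid call.
function shouldEscalate(eig: number, cost: number, threshold: number = 0.5): boolean {
  // Hypothetical edge case: a call with no cost never fails the ratio gate.
  if (!(cost > 0)) return true;
  return eig / cost >= threshold;
}
```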

&lt;h2&gt;
  
  
  Test coverage
&lt;/h2&gt;

&lt;p&gt;350+ inference tests cover the router's decision boundaries. Components under test: the budget consumption gate, the complexity classifier (trivial / medium / rare_hard), Tier-2 dispatch, the recursion-depth and context-bloat guards, and reflection-loop and duplicate-prompt detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Cursor's silent auto-upgrade on quota exhaustion triggered viral brand backlash + US state class-action allegations. We engineered a structural answer: tier caps, panic mode, no-auto-charge — all enforced at the router layer.&lt;/p&gt;

&lt;p&gt;Source: github.com/greymoth-jp · Live: kairon.trade&lt;/p&gt;




&lt;p&gt;This is part of the API Kernel work at services/kairon-guardian/ — happy to answer architecture questions.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
