Forem: greymoth

Why Kairon runs a separate gRPC authorization service

greymoth — Sat, 23 May 2026 02:32:31 +0000

When you're building a multi-tenant platform where users run autonomous trading agents, "just check a middleware flag" isn't a safety model. It's a hope.

This is how we ended up with Guardian -- a standalone Node.js gRPC server on :50052 that every agent execution must pass through before a single order can fire.

The problem with inline auth checks

Our initial instinct was the usual: tRPC middleware, a capability check on the procedure, done. It works fine for UI-driven actions where a bad outcome is a 403 and a sad user. It does not work when the "action" is an autonomous agent executing a trading strategy with real capital.

The failure modes are different. A misconfigured middleware might pass a stale session. A quota check might race against a concurrent execution. An unhandled exception might default-allow instead of default-deny. In a UI context those are bugs. In an agent runtime they're incidents.

We needed authorization to be:

Explicit -- every execution path calls it, no exceptions
Fail-closed -- if the auth service is unreachable, the run is rejected
Auditable -- every decision is a record, not a log line
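
To make the fail-closed requirement concrete, here is a minimal sketch of what it can look like on the calling side. It assumes a hypothetical authorize function standing in for the real gRPC client call; gateRun, Decision, and the 500 ms deadline are illustrative names and values, not taken from Kairon's code. Only the guardian_unavailable reason string comes from this post.

```typescript
// Hypothetical decision shape; the real Guardian response type is not shown in this post.
type Decision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Stand-in for the gRPC client call. Assumed signature, for illustration only.
type Authorize = (agentId: string) => Promise<Decision>;

// Fail-closed gate: any error or timeout becomes an explicit rejection.
async function gateRun(
  authorize: Authorize,
  agentId: string,
  deadlineMs = 500, // illustrative deadline, not a Kairon setting
): Promise<Decision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("deadline exceeded")), deadlineMs);
  });
  try {
    // Whichever settles first wins; a slow Guardian is treated like a dead one.
    return await Promise.race([authorize(agentId), timeout]);
  } catch {
    // Default-deny: an exception never falls through to "allowed".
    return { allowed: false, reason: "guardian_unavailable" };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The point of the shape is that "allowed" can only come from a response Guardian actually returned; every other path ends in the catch branch.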

What Guardian does

Guardian exposes a proto3 service with three RPCs:

service GuardianService {
  rpc CheckCapability(CapabilityRequest) returns (CapabilityResponse);
  rpc CheckQuota(QuotaRequest) returns (QuotaResponse);
  rpc AuthorizeAgentRun(AgentRunRequest) returns (AgentRunResponse);
}

AuthorizeAgentRun is the gate. It calls CheckCapability, then CheckQuota, then writes an execution record. If any step fails or Guardian is unreachable, the run is rejected with reason guardian_unavailable. No silent pass-through.
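
The ordering described above can be sketched roughly as follows. This is not Guardian's actual handler: the Deps interface, the request and record fields, and the capability_denied and quota_exceeded reasons are all assumptions for illustration, and the checks are synchronous for brevity. The sketch also assumes denials are recorded, and that a failed record write rejects the run.

```typescript
// All types and helpers here are hypothetical; the post only names the three RPCs.
interface AgentRunRequest { orgId: string; agentId: string; capability: string }
interface AgentRunResponse { allowed: boolean; reason?: string }
interface ExecutionRecord extends AgentRunResponse { orgId: string; agentId: string }

interface Deps {
  checkCapability(req: AgentRunRequest): boolean;
  checkQuota(req: AgentRunRequest): boolean;
  writeRecord(rec: ExecutionRecord): void;
}

// Capability first, then quota; quota is not consulted after a capability denial.
function decide(req: AgentRunRequest, deps: Deps): AgentRunResponse {
  if (!deps.checkCapability(req)) return { allowed: false, reason: "capability_denied" };
  if (!deps.checkQuota(req)) return { allowed: false, reason: "quota_exceeded" };
  return { allowed: true };
}

function authorizeAgentRun(req: AgentRunRequest, deps: Deps): AgentRunResponse {
  try {
    const res = decide(req, deps);
    // Assumption: every decision, allow or deny, is written before it is returned.
    deps.writeRecord({ orgId: req.orgId, agentId: req.agentId, ...res });
    return res;
  } catch {
    // A thrown check or a failed record write rejects the run.
    return { allowed: false, reason: "guardian_unavailable" };
  }
}
```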

Why a separate process

Two reasons: practical and principled.

Practical: Guardian enforces hard rate limits at the infrastructure level, isolated from API server memory pressure.

Principled: a separate service audits independently. Our kairon_org_audit_log table has exactly one writer with one responsibility.

The tradeoff

Every agent execution incurs a gRPC round-trip. That latency is deliberate. Trading agent authorization isn't latency-sensitive -- if your strategy breaks because auth took 2 ms, the strategy has bigger problems.

What we gained is a single place where "should this agent run?" is answered and recorded, with an immutable sequence of authorization decisions to replay when something goes wrong.
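
As a rough illustration of the "append, then replay" idea, here is an in-memory sketch. The real store is the kairon_org_audit_log table mentioned above; the AuditLog class, its fields, and the sequence numbering below are hypothetical and only show the shape of an append-only log with a single write path.

```typescript
// Hypothetical entry shape; the actual table columns are not shown in this post.
interface AuditEntry {
  readonly seq: number;
  readonly agentId: string;
  readonly allowed: boolean;
  readonly reason?: string;
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // The only write path: entries get the next sequence number and are frozen.
  append(e: Omit<AuditEntry, "seq">): AuditEntry {
    const entry: AuditEntry = Object.freeze({ ...e, seq: this.entries.length + 1 });
    this.entries.push(entry);
    return entry;
  }

  // Replay one agent's decisions in the order they were made.
  replay(agentId: string): readonly AuditEntry[] {
    return this.entries.filter((e) => e.agentId === agentId);
  }
}

const log = new AuditLog();
log.append({ agentId: "agent-a", allowed: true });
log.append({ agentId: "agent-b", allowed: false, reason: "quota_exceeded" });
log.append({ agentId: "agent-a", allowed: false, reason: "guardian_unavailable" });
const history = log.replay("agent-a"); // two entries, seq 1 then seq 3
```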

Building this at kairon.trade. Source: github.com/greymoth-jp.

Building a gRPC Guardian + Intel API on a Prediction OS

greymoth — Wed, 20 May 2026 00:27:14 +0000

What shipped in v0.57

We just cut v0.57 of Kairon Forge — the B2B AI agent platform that ships every agent pre-loaded with prediction-market intelligence.

Guardian gRPC server

The Guardian audit layer is now a dedicated gRPC server. The SecurityScanner kernel runs inside it, applying rules against each incoming audit record. Five unit tests cover the critical paths.

Intel API: real implementation

macro_snapshot runs on a 4-hour cron against live Polymarket data. anomaly_detect uses z-score over a configurable rolling window. forecast_calibrated combines market probabilities with historical calibration curves for confidence-banded predictions.
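
For readers unfamiliar with the technique, a rolling z-score detector can be sketched like this. It is not the anomaly_detect implementation: the function names, the trailing-window convention (the current point is excluded from its own baseline), the sample standard deviation, and the default threshold of 3 are all assumptions.

```typescript
// z-score of each point against the `window` points that precede it.
// Returns null where there is not enough history or the window has zero variance.
function rollingZScores(values: number[], window: number): (number | null)[] {
  if (!Number.isInteger(window) || window < 2) {
    throw new RangeError("window must be an integer >= 2");
  }
  return values.map((x, i) => {
    if (i < window) return null; // not enough history yet
    const past = values.slice(i - window, i); // trailing window, excludes x itself
    const mean = past.reduce((a, b) => a + b, 0) / window;
    const variance =
      past.reduce((a, b) => a + (b - mean) * (b - mean), 0) / (window - 1);
    const std = Math.sqrt(variance);
    return std === 0 ? null : (x - mean) / std;
  });
}

// Indices whose |z| meets the threshold (3 is an illustrative default).
function detectAnomalies(values: number[], window: number, threshold = 3): number[] {
  const flagged: number[] = [];
  rollingZScores(values, window).forEach((z, i) => {
    if (z !== null && Math.abs(z) >= threshold) flagged.push(i);
  });
  return flagged;
}

// Made-up series for illustration: only the final jump is flagged.
const flaggedIdx = detectAnomalies([10, 11, 10, 11, 10, 11, 30], 4);
```

Excluding the current point from its own window keeps a large spike from inflating the baseline it is being compared against.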

@kairon/sdk v0.0.1

import { KaironClient } from "@kairon/sdk";
const client = new KaironClient({ apiKey: process.env.KAIRON_API_KEY, tier: "pro" });
const snapshot = await client.intel.macroSnapshot({ date: "2026-05-18" });

The MCP server (@modelcontextprotocol/server-kairon v0.0.1) exposes the same Intel tools.

Try it

npm install @kairon/sdk @modelcontextprotocol/server-kairon

Forge: kairon.trade/forge

Building an Inference OS: deterministic-first router for prediction markets

greymoth — Wed, 20 May 2026 00:22:07 +0000

Building an Inference OS for prediction markets

Most AI agent stacks default to "throw the prompt at GPT-4o, hope for the best." For prediction markets that's expensive AND wrong — most market questions don't need a paid LLM at all. Here's how we built a 6-hook deterministic-first inference router on top of Kairon Forge.

The 6 hooks (in priority order)

Market Regime classifier — 5 deterministic regimes (whale_dominant / meme_volatile / macro_anchored / panic_liquidation / dead_liquidity). Confident classification short-circuits the entire router. Zero LLM call.
Anomaly detector — 3σ price spike + sentiment divergence. Confident anomaly FORCES Tier-2 (paid Claude/Anthropic), bypassing the viability cost cap on rare-and-important markets.
Time-to-Resolution decay — exponential confidence decay vs event horizon. Low decayed confidence forces Tier-1 (Haiku-only).
Persona overlay — 5 archetype priors (calibrated_researcher / whale_mimic / panic_seller / momentum_trader / contrarian) adjust baseline confidence.
Panic mode circuit breaker — 60s rolling burn-rate σ. >2σ from baseline → force Ollama-only.
Economic Viability Filter — per-tier hard cost cap (Free $0.05 / Pro $0.50 / Elite $5 / Enterprise $100). >cap → 402 quotaExhausted.
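
One way to read "priority order" is a chain where the first hook with a firm opinion decides the route. The sketch below follows that reading and is a simplification, not the Forge router: the Signals fields, the 0.3 low-confidence cutoff, the route names, and the tier1 fallback are assumptions. The per-tier caps and the 2σ panic threshold come from the list above. The persona overlay is left out because it adjusts confidence rather than picking a route.

```typescript
type Tier = "free" | "pro" | "elite" | "enterprise";
type Route = "deterministic" | "tier1" | "tier2" | "local_only" | "rejected_402";

// Per-tier hard cost caps in USD, as listed in the post.
const COST_CAP_USD: Record<Tier, number> = { free: 0.05, pro: 0.5, elite: 5, enterprise: 100 };

// Hypothetical pre-computed inputs; the real hooks derive these themselves.
interface Signals {
  tier: Tier;
  regimeConfident: boolean;
  anomalyConfident: boolean;
  decayedConfidence: number; // 0..1 after time-to-resolution decay
  burnRateSigma: number;     // deviation of 60s burn rate from baseline
  projectedCostUsd: number;  // assumed to be the cost compared against the cap
}

interface Hook { name: string; decide(s: Signals): Route | null }

const LOW_CONFIDENCE = 0.3; // illustrative cutoff

// Order matters: an earlier decisive hook pre-empts everything after it.
const HOOKS: Hook[] = [
  { name: "market_regime", decide: (s) => (s.regimeConfident ? "deterministic" : null) },
  { name: "anomaly", decide: (s) => (s.anomalyConfident ? "tier2" : null) },
  { name: "time_decay", decide: (s) => (s.decayedConfidence < LOW_CONFIDENCE ? "tier1" : null) },
  { name: "panic_mode", decide: (s) => (s.burnRateSigma > 2 ? "local_only" : null) },
  { name: "viability", decide: (s) => (s.projectedCostUsd > COST_CAP_USD[s.tier] ? "rejected_402" : null) },
];

function route(s: Signals, fallback: Route = "tier1"): { route: Route; decidedBy: string } {
  for (const hook of HOOKS) {
    const r = hook.decide(s);
    if (r !== null) return { route: r, decidedBy: hook.name };
  }
  return { route: fallback, decidedBy: "fallback" };
}
```

Under this reading, a confident anomaly reaches Tier-2 even when the projected cost exceeds the cap, simply because the anomaly hook runs before the viability filter.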

Cost-aware Cognition

Before every paid call, the router applies an EIG/cost ratio gate: shouldEscalate(eig, cost, threshold=0.5) divides expected information gain by inference cost. Below the threshold, the call collapses to Tier-1 and a budget consumption note is attached.
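
A minimal sketch of such a gate, using the shouldEscalate(eig, cost, threshold=0.5) signature quoted above. The body is a guess at the simplest version: the units of eig and cost, the handling of zero-cost calls, and the pickTier wrapper with its note text are assumptions rather than the shipped logic.

```typescript
// Escalate only when information gain per unit of cost meets the threshold.
function shouldEscalate(eig: number, cost: number, threshold = 0.5): boolean {
  if (cost <= 0) return true; // assumption: a free call has nothing to gate
  return eig / cost >= threshold;
}

// Hypothetical wrapper showing the "collapse to Tier-1 with a note" behaviour.
function pickTier(eig: number, cost: number): { tier: "tier1" | "tier2"; note?: string } {
  if (shouldEscalate(eig, cost)) return { tier: "tier2" };
  return {
    tier: "tier1",
    note: `EIG/cost ${(eig / cost).toFixed(2)} below threshold; paid call skipped`,
  };
}

const worthIt = shouldEscalate(0.2, 0.1);     // ratio 2.0
const notWorthIt = shouldEscalate(0.01, 0.1); // ratio 0.1
```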

Test coverage

350+ inference tests covering router decision boundaries. Components: budget consumption gate, complexity classifier (trivial / medium / rare_hard), Tier-2 dispatch, recursion-depth + context-bloat guards, reflection-loop + duplicate-prompt detection.

Why this matters

Cursor's silent auto-upgrade on quota exhaustion triggered viral brand backlash + US state class-action allegations. We engineered a structural answer: tier caps, panic mode, no-auto-charge — all enforced at the router layer.

Source: github.com/greymoth-jp · Live: kairon.trade

This is part of the API Kernel work at services/kairon-guardian/ — happy to answer architecture questions.