<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: RAKESH THERANI</title>
    <description>The latest articles on Forem by RAKESH THERANI (@rakeshtherani).</description>
    <link>https://forem.com/rakeshtherani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3915069%2F2cc58337-0e5c-494d-a95f-b451731b28b0.png</url>
      <title>Forem: RAKESH THERANI</title>
      <link>https://forem.com/rakeshtherani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rakeshtherani"/>
    <language>en</language>
    <item>
      <title>Four LLM Engines, One ClickHouse Cluster: An Agentic AI Architecture</title>
      <dc:creator>RAKESH THERANI</dc:creator>
      <pubDate>Thu, 14 May 2026 06:19:43 +0000</pubDate>
      <link>https://forem.com/rakeshtherani/four-llm-engines-one-clickhouse-cluster-an-agentic-ai-architecture-55h8</link>
      <guid>https://forem.com/rakeshtherani/four-llm-engines-one-clickhouse-cluster-an-agentic-ai-architecture-55h8</guid>
      <description>&lt;p&gt;We are building an agentic AI analytics platform for a crypto exchange where internal teams — Trading Ops, Risk, Compliance, Finance, Treasury, Product, Engineering — ask questions in plain English and get audited, citation-enforced answers.&lt;/p&gt;

&lt;p&gt;It's built on five open-source or open-weight components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ClickHouse&lt;/strong&gt; — data + vector + observability storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen 2.5 72B&lt;/strong&gt; — self-hosted LLM via vLLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic MCP&lt;/strong&gt; — zero-trust tool layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LibreChat&lt;/strong&gt; — chat UI (acquired by ClickHouse, November 2025)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Langfuse&lt;/strong&gt; — LLM observability (acquired by ClickHouse, January 2026)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The defining property is &lt;strong&gt;four execution engines on shared infrastructure&lt;/strong&gt;, each tuned for a different question shape:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Typical signals (semantic, not keyword)&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data retrieval, aggregations, single-domain "what is" patterns&lt;/td&gt;
&lt;td&gt;NL → SQL on pre-joined marts&lt;/td&gt;
&lt;td&gt;&amp;lt;2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Numeric Research&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Match to one of 20 catalogued fitted-model patterns (regimes, probabilities, elasticities, etc.)&lt;/td&gt;
&lt;td&gt;Templated lookup against cached model output&lt;/td&gt;
&lt;td&gt;&amp;lt;1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decision-making with constraints + objective + trade-off ("what &lt;em&gt;should&lt;/em&gt; we do")&lt;/td&gt;
&lt;td&gt;NL → structured math model → solver → reviewable plan&lt;/td&gt;
&lt;td&gt;30s–5min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deep Research&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-source explanation requiring decomposition + reasoning ("why did this happen")&lt;/td&gt;
&lt;td&gt;Planner → workers → critic → synthesizer loop&lt;/td&gt;
&lt;td&gt;60–90s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The platform also includes four continuous self-learning loops, a four-layer trust defense, and a 24-week build timeline. The rest of this post walks through each piece with concrete examples.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Exists
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The problem in current operations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analyst bottleneck.&lt;/strong&gt; Risk, Compliance, and Finance teams wait days for ad-hoc SQL. Analytics is a ticket queue, not a self-service capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investigations are slow.&lt;/strong&gt; &lt;em&gt;"Why did revenue drop 12%?"&lt;/em&gt; takes 2 days of analyst time and ships a 30-slide deck — often wrong on a key dimension that emerges weeks later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization is spreadsheet-based.&lt;/strong&gt; Insurance fund allocation, fee tier choices, and wallet rebalancing are done in Excel because real operations research (OR) tools require OR specialists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance is reactive.&lt;/strong&gt; Suspicious activity report (SAR) drafts are written from scratch each time. Wash trading detection runs only when someone happens to look. Regulator inquiries take a week to answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No audit trail for ad-hoc analysis.&lt;/strong&gt; When numbers ship to the CFO or regulator, you can't reproduce them six months later.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-service NL&lt;/strong&gt; for non-engineers. The data team stops being a ticket queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Research engine&lt;/strong&gt; that produces cited investigations in 60–90 seconds instead of days of analyst time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization layer&lt;/strong&gt; that takes business problems in English, formulates math models, runs solvers, returns reviewable plans for human approval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citation enforcement&lt;/strong&gt; so every claim traces back to specific SQL queries with parameter bindings — auditable, reproducible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-learning loops&lt;/strong&gt; so the platform gets better weekly without engineering intervention.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  End-to-End Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────┐
│  USER (1 of 7 teams)                                            │
│  Types: "Why did Q2 fee revenue drop 12% vs Q1?"                │
└──────────────────────┬──────────────────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  LIBRECHAT                                                      │
│  • SSO authentication; team role attached to session            │
│  • Session memory in MongoDB                                    │
│  • 128K context window for conversation history                 │
│  • Streams response back as it's generated                      │
└──────────────────────┬──────────────────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  TRIAGE AGENT (Qwen 72B)                                        │
│  Semantic classification + structured output:                   │
│    {engine, confidence, case_id, extracted_params, reasoning}   │
│  Not keyword matching — full LLM intent inference               │
└──────────────────────┬──────────────────────────────────────────┘
                       ▼
┌──────────────────────┴──────────────────────────────────────────┐
│            ROUTED TO ONE OF FOUR EXECUTION ENGINES              │
└─────────────────────────────────────────────────────────────────┘
       │              │              │             │
       ▼              ▼              ▼             ▼
   ANALYTICS    NR LOOKUP    OPTIMIZATION   DEEP RESEARCH
       │            │              │             │
       ▼            ▼              ▼             ▼
  LLM writes   Templated     LLM emits      Planner →
  SQL grounded SELECT        structured     Workers →
  by schema +  from cache    JSON model     Critic →
  glossary     marts.numeric → template    Synthesizer
               _research_    generates      (5 rounds max)
               runs          CVXPY/CP-SAT
                             → solver
       │            │              │             │
       └────────────┴──────────────┴─────────────┘
                       │
                       ▼
       ┌─────────────────────────────────────────────┐
       │  MCP LAYER                                  │
       │  • ClickHouse MCP — read-only SELECT only   │
       │  • Solver MCP — solve_lp/mip/cp/convex      │
       │  • Numeric Research MCP — run_nr            │
       │  • Vector search MCP — similarity lookups   │
       │  • All tools allowlisted per agent          │
       │  • Bearer-token auth                        │
       └────────────────┬────────────────────────────┘
                        ▼
       ┌─────────────────────────────────────────────┐
       │  CLICKHOUSE                                 │
       │  • Raw → Staging → Marts (medallion)        │
       │  • HNSW vector indexes (25.8+ GA)           │
       │  • Refreshable MVs auto-update marts        │
       │  • Persistence tables:                      │
       │    – marts.research_runs (Deep Research)    │
       │    – marts.research_evidence (citations)    │
       │    – marts.optimization_runs (plans)        │
       │    – marts.numeric_research_runs (NR cache) │
       │    – marts.langfuse_flat (agent traces)     │
       └────────────────┬────────────────────────────┘
                        ▼
       ┌─────────────────────────────────────────────┐
       │  SYNTHESIZER (LLM)                          │
       │  • Renders prose answer from evidence       │
       │  • Citation regex enforcement:              │
       │    every claim must have [ev-N]             │
       │    → 3 retries; fallback to LOW confidence  │
       │  • Confidence label (HIGH/MEDIUM/LOW)       │
       │  • Cross-engine handoffs (DR → Opt for fix) │
       └────────────────┬────────────────────────────┘
                        ▼
       ┌─────────────────────────────────────────────┐
       │  LIBRECHAT renders answer to user           │
       │  Every step traced in Langfuse              │
       │  Trace ID returned for audit replay         │
       └─────────────────────────────────────────────┘
                        ▼
       ┌─────────────────────────────────────────────┐
       │  SELF-LEARNING LOOPS fire continuously      │
       │  • Glossary expansion (real-time + nightly) │
       │  • Mart recommendations (nightly)           │
       │  • LoRA fine-tuning (weekly)                │
       │  • Eval-set regression (weekly)             │
       └─────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
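&lt;p&gt;The citation check in the Synthesizer box can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the names &lt;code&gt;uncited_sentences&lt;/code&gt; and &lt;code&gt;enforce_citations&lt;/code&gt;, the naive sentence splitter, the &lt;code&gt;generate&lt;/code&gt; callback standing in for the LLM call, and the reading of "3 retries" as one first attempt plus three more are all assumptions. Only the &lt;code&gt;[ev-N]&lt;/code&gt; tag format and the fallback to LOW confidence come from the diagram.&lt;/p&gt;

```python
import re
from typing import Callable, List, Tuple

CITATION = re.compile(r"\[ev-\d+\]")  # tag format from the diagram, e.g. [ev-3]
MAX_RETRIES = 3                       # assumed: one first attempt plus 3 retries


def uncited_sentences(answer: str) -> List[str]:
    """Return every sentence that carries no [ev-N] tag."""
    # Naive splitter (assumption): break on ., ! or ? followed by whitespace.
    parts = re.split(r"[.!?]\s+", answer)
    sentences = [p.strip() for p in parts if p.strip()]
    return [s for s in sentences if not CITATION.search(s)]


def enforce_citations(generate: Callable[[int], str], label: str) -> Tuple[str, str]:
    """Regenerate until every sentence is cited; downgrade to LOW otherwise.

    generate(attempt) is a hypothetical stand-in for the synthesizer LLM call.
    """
    answer = ""
    for attempt in range(MAX_RETRIES + 1):
        answer = generate(attempt)
        # An empty answer never counts as fully cited.
        if answer.strip() and not uncited_sentences(answer):
            return answer, label      # fully cited: keep the original label
    return answer, "LOW"              # retries exhausted: fall back to LOW


drafts = [
    "Fee revenue fell 12%. Volume was flat [ev-2].",         # first sentence uncited
    "Fee revenue fell 12% [ev-1]. Volume was flat [ev-2].",  # fully cited
]
answer, label = enforce_citations(lambda i: drafts[min(i, 1)], "HIGH")
print(label)  # HIGH
```

&lt;p&gt;A regex check like this only proves that a tag is present. Confirming that each &lt;code&gt;[ev-N]&lt;/code&gt; points at a real evidence row would need a separate lookup, which is outside this sketch.&lt;/p&gt;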






&lt;h2&gt;
  
  
  Triage — How the Agent Chooses an Engine
&lt;/h2&gt;

&lt;p&gt;This is the most important decision in the platform. Get it wrong and everything downstream is wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  It's semantic classification, not keyword matching
&lt;/h3&gt;

&lt;p&gt;The triage agent is &lt;strong&gt;Qwen 72B doing intent inference&lt;/strong&gt;: the LLM reads the full question and reasons about what the user is actually trying to do. The verb cues are &lt;em&gt;hints&lt;/em&gt; in the system prompt, not the routing rule. A user asking &lt;em&gt;"show me which markets to consolidate"&lt;/em&gt; routes to Optimization (not Analytics, despite the "show" verb) because the LLM infers that "which markets to consolidate" is a selection-with-objective decision. The classifier emits structured JSON with the engine choice, a confidence score, a Numeric Research case ID where one applies, extracted parameters, a fallback engine, and a one-sentence reasoning string.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the triage prompt actually contains
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You classify user questions into one of four engines.

ENGINES:

1. ANALYTICS — answers "what is happening" questions with a single SQL
   query against pre-joined marts. Best for: aggregations, counts,
   listings, filters over single domains.
   Sample questions:
     - "Top 10 markets by 24h volume"
     - "All withdrawals over $50K last week"
     - "Average fill time on BTC-PERP today"

2. NUMERIC_RESEARCH — looks up cached statistical models. Use when the
   question matches one of 20 catalogued NR cases (regimes, elasticity,
   probabilities, transition matrices, etc.). Lookup is sub-second;
   models are fitted weekly offline.
   Sample questions matched to NR cases:
     - "What market regime is BTC in?"          → NR-01
     - "Probability IF covers a 15% shock?"     → NR-04
     - "Forecast VIP tier distribution"         → NR-08
     - "Maker fee elasticity for tier 2"        → NR-07
     - "Almgren-Chriss impact on BTC-PERP"      → NR-19
     [...20 cases total...]

3. OPTIMIZATION — formulates a math program and runs a solver.
   Best for: allocation, scheduling, sequencing, pricing optimization,
   trade-off problems where the user wants the *best* answer.
   Sample questions:
     - "Allocate $50M insurance fund across markets"
     - "Sequence today's liquidations to minimize impact"
     - "Choose maker fees to maximize revenue with 5% retention cap"

4. DEEP_RESEARCH — multi-step planner → workers → critic → synthesizer
   loop. Best for: "why" investigations, root-cause analyses,
   multi-source evidence gathering. Takes 60-90 seconds.
   Sample questions:
     - "Why did Q2 fee revenue drop 12%?"
     - "Investigate user X for wash trading"
     - "Liquidation cascade post-mortem"

RETURN STRUCTURED JSON:
{
  "engine": "ANALYTICS | NUMERIC_RESEARCH | OPTIMIZATION | DEEP_RESEARCH",
  "confidence": 0.0-1.0,
  "case_id": "&amp;lt;NR-XX if Numeric Research, else null&amp;gt;",
  "extracted_params": { ... },
  "fallback_engine": "&amp;lt;second-best engine&amp;gt;",
  "reasoning": "one-sentence justification"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM does semantic matching against the &lt;strong&gt;engine descriptions and sample questions&lt;/strong&gt;, then emits a confidence score. Keywords are mnemonics — not the routing logic.&lt;/p&gt;
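&lt;p&gt;Because the classifier's output drives everything downstream, it is worth validating before routing. The sketch below shows one way to do that against the schema in the prompt above. The &lt;code&gt;TriageDecision&lt;/code&gt; class, the &lt;code&gt;parse_triage&lt;/code&gt; function, and the specific validation rules (for example, that &lt;code&gt;case_id&lt;/code&gt; looks like &lt;code&gt;NR-07&lt;/code&gt;) are illustrative assumptions, not the platform's implementation.&lt;/p&gt;

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

ENGINES = {"ANALYTICS", "NUMERIC_RESEARCH", "OPTIMIZATION", "DEEP_RESEARCH"}
NR_CASE = re.compile(r"NR-\d{2}")  # assumed ID shape, e.g. NR-01 .. NR-20


@dataclass
class TriageDecision:
    engine: str
    confidence: float
    case_id: Optional[str]
    extracted_params: dict
    fallback_engine: Optional[str]
    reasoning: str


def parse_triage(raw: str) -> TriageDecision:
    """Parse the classifier's JSON and reject anything outside the schema."""
    # Missing or unexpected keys raise TypeError from the dataclass constructor.
    d = TriageDecision(**json.loads(raw))
    if d.engine not in ENGINES:
        raise ValueError(f"unknown engine: {d.engine}")
    if not (d.confidence >= 0.0 and 1.0 >= d.confidence):
        raise ValueError("confidence must be between 0.0 and 1.0")
    if d.engine == "NUMERIC_RESEARCH":
        if not (d.case_id and NR_CASE.fullmatch(d.case_id)):
            raise ValueError("NUMERIC_RESEARCH needs a case_id such as NR-07")
    elif d.case_id is not None:
        raise ValueError("case_id must be null outside NUMERIC_RESEARCH")
    if d.fallback_engine is not None and d.fallback_engine not in ENGINES:
        raise ValueError(f"unknown fallback engine: {d.fallback_engine}")
    return d


raw = json.dumps({
    "engine": "OPTIMIZATION",
    "confidence": 0.84,
    "case_id": None,
    "extracted_params": {"n_markets": 5, "time_window": "quarter"},
    "fallback_engine": "ANALYTICS",
    "reasoning": "Selection-with-objective decision.",
})
decision = parse_triage(raw)
print(decision.engine, decision.confidence)  # OPTIMIZATION 0.84
```

&lt;p&gt;Rejecting malformed output at this boundary means a bad classification fails loudly instead of silently reaching the wrong engine.&lt;/p&gt;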

&lt;h3&gt;
  
  
  Real triage output for ambiguous cases
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; &lt;em&gt;"Show me which 5 markets to consolidate this quarter"&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OPTIMIZATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.84&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"case_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"extracted_params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"n_markets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"time_window"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"quarter"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ANALYTICS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Despite 'show me' verb, 'which markets to consolidate' is a selection-with-objective decision — handled by optimization solver, not a SQL query."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; &lt;em&gt;"Why is &lt;code&gt;marts.mart_user_trading_activity&lt;/code&gt; slow today?"&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ANALYTICS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"case_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"extracted_params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mart"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mart_user_trading_activity"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_engine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DEEP_RESEARCH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Despite 'why' verb, this is a single-target query latency question answerable from system.query_log — engineering analytics, not investigation."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When confidence is low
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;confidence &amp;lt; 0.7&lt;/code&gt;, triage asks a short clarifying question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I can interpret this two ways. Would you like a quick aggregation against current data (Analytics) or a deeper investigation tracing root causes (Deep Research)?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The user picks one, and the question is routed to that engine. The question and the chosen engine are logged, so a self-learning loop may later add it as a sample question to the triage prompt for better future disambiguation.&lt;/p&gt;
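&lt;p&gt;A minimal sketch of that threshold check, assuming the triage output arrives as JSON in the shape shown in the examples above. The &lt;code&gt;route&lt;/code&gt; function and the &lt;code&gt;dispatch&lt;/code&gt; / &lt;code&gt;clarify&lt;/code&gt; action labels are hypothetical names for illustration; only the 0.7 threshold and the field names come from this article.&lt;/p&gt;

```python
import json

CONFIDENCE_THRESHOLD = 0.7  # threshold quoted in the text


def route(triage_json: str) -> dict:
    """Decide whether to dispatch to an engine or ask the user first."""
    result = json.loads(triage_json)
    if float(result["confidence"]) >= CONFIDENCE_THRESHOLD:
        return {"action": "dispatch", "engine": result["engine"]}
    # Low confidence: offer the primary engine and its fallback as the two
    # interpretations (an assumption about how the clarifier is built).
    return {
        "action": "clarify",
        "options": [result["engine"], result["fallback_engine"]],
    }
```

&lt;p&gt;Offering the primary engine and its &lt;code&gt;fallback_engine&lt;/code&gt; as the two options is one plausible way to build the clarifier; the article does not specify how the two interpretations are chosen.&lt;/p&gt;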




&lt;h2&gt;
  
  
  The Four Engines (In Detail)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ENGINE 1 — ANALYTICS (~80% of all questions)
&lt;/h3&gt;

&lt;p&gt;Most user questions are "show me X" or "count Y across Z", and those are answered here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the agent decides:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Triage classifies as Analytics based on semantic intent (not just verbs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema injection&lt;/strong&gt; — system prompt includes live &lt;code&gt;list_tables()&lt;/code&gt; output + the top-5 marts retrieved by semantic similarity to the user question&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Glossary injection&lt;/strong&gt; — relevant glossary terms loaded (&lt;code&gt;liquidation = order_type IN (1003, 1004)&lt;/code&gt;, &lt;code&gt;whale = users WHERE 30d_volume &amp;gt; $10M&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM writes SQL&lt;/strong&gt; grounded by the schema + glossary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-execution validator&lt;/strong&gt; (code, not LLM) checks that the SQL is SELECT-only, references only existing tables, and includes a LIMIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP executes&lt;/strong&gt; against &lt;code&gt;llm_role&lt;/code&gt; (read-only, mart-only access)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesizer renders&lt;/strong&gt; as formatted table or summary&lt;/li&gt;
&lt;/ol&gt;
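&lt;p&gt;The pre-execution validator in step 5 could look roughly like the sketch below. It is regex-based for brevity, so it is only an approximation: a production validator would more likely parse the SQL properly, and a forbidden keyword inside a string literal would be rejected here. &lt;code&gt;KNOWN_TABLES&lt;/code&gt; stands in for the live &lt;code&gt;list_tables()&lt;/code&gt; output, and &lt;code&gt;validate_sql&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import re

# Hypothetical snapshot of list_tables(); the real list comes from ClickHouse.
KNOWN_TABLES = {
    "marts.mart_volume_daily",
    "marts.mart_liquidations_summary",
    "marts.mart_trading_bot_performance",
}

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|attach|detach)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
HAS_LIMIT = re.compile(r"\blimit\s+\d+", re.IGNORECASE)


def validate_sql(sql: str, known_tables=KNOWN_TABLES) -> list:
    """Return rejection reasons; an empty list means the SQL may run."""
    errors = []
    body = sql.strip().rstrip(";").strip()
    if ";" in body:
        errors.append("multiple statements")
    if not re.match(r"\s*select\b", body, re.IGNORECASE) or FORBIDDEN.search(body):
        errors.append("not SELECT-only")
    for table in TABLE_REF.findall(body):
        if table.lower() not in known_tables:
            errors.append(f"unknown table: {table}")
    if not HAS_LIMIT.search(body):
        errors.append("missing LIMIT")
    return errors
```

&lt;p&gt;Returning a list of reasons rather than a boolean gives the agent something concrete to feed back into its next attempt.&lt;/p&gt;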

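&lt;p&gt;&lt;strong&gt;Why this is reliable:&lt;/strong&gt; Pre-joined marts mean the LLM rarely has to write complex JOINs: in most cases it picks a mart and filters, and any join it does write (as in the examples below) is a simple key match between two tables. The #1 NL→SQL failure mode (bad JOINs) is largely eliminated by design.&lt;/p&gt;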

&lt;h4&gt;
  
  
  Worked example 1.1 — Trading volume + liquidation roll-up
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (Risk Manager):&lt;/strong&gt; &lt;em&gt;"Top 10 markets by 24-hour volume with liquidation count and average liquidation size."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLM generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;market_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;volume_usd_24h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;liq_count_24h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_liq_volume_usd&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;nullIf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;liq_count_24h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_liq_size_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;liq_count_24h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trade_count_24h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;liq_rate_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_volume_daily&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_liquidations_summary&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;market_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;market_name&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trade_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;volume_usd_24h&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; 10-row table sorted by volume desc. &lt;strong&gt;Latency: ~500ms. Cost: ~$0.005.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Note the agent retrieved &lt;code&gt;liquidation = order_type IN (1003, 1004)&lt;/code&gt; from the glossary deterministically — never guessed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Worked example 1.2 — Compliance cross-table screening
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (Compliance Officer):&lt;/strong&gt; &lt;em&gt;"All withdrawals over $50K in the last 7 days where the user has a KYC level below 2."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLM generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coin_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount_usdt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fregistertime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_date_time&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;balance_audit&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;f_user&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;in_out_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;          &lt;span class="c1"&gt;-- withdrawal direction (from glossary)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount_usdt&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_date_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_level&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount_usdt&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; flagged-user list with KYC level. &lt;strong&gt;Latency: ~1s.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Worked example 1.3 — Trading bot performance breakdown
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (Product Manager):&lt;/strong&gt; &lt;em&gt;"How many trading bots are running on BTC-PERP by bot type, and what's the median ROI for each?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLM generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;bot_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;running_bot_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;roi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;median_roi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;roi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;median_roi_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_investment_usdt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_capital_deployed&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_trading_bot_performance&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'RUNNING'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'BTC-PERP'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;bot_type&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;running_bot_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; grouped table with quantile aggregation. &lt;strong&gt;Latency: ~800ms.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when Analytics misses:&lt;/strong&gt; If the LLM-generated SQL references a non-existent column, the pre-execution validator rejects it and the agent retries with the corrected schema. Repeated failures are logged to Langfuse, where a learning loop may draft a glossary entry to handle the term differently next time.&lt;/p&gt;
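&lt;p&gt;A rough sketch of that reject-and-retry loop. The retry budget of three attempts is an assumption (the article does not state one), and &lt;code&gt;generate_sql&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, and &lt;code&gt;log_failure&lt;/code&gt; are hypothetical callables standing in for the LLM call, the validator, and the Langfuse logger.&lt;/p&gt;

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumed retry budget; not specified in the article


def generate_with_retry(
    question: str,
    generate_sql: Callable[[str, list], str],
    validate: Callable[[str], list],
    log_failure: Callable[[str, list], None],
) -> Optional[str]:
    """Return validated SQL, or None after MAX_ATTEMPTS rejected drafts."""
    feedback: list = []
    for _ in range(MAX_ATTEMPTS):
        sql = generate_sql(question, feedback)
        errors = validate(sql)
        if not errors:
            return sql
        feedback = errors  # validator errors go into the next prompt
    log_failure(question, feedback)  # repeated failure: record for the learning loop
    return None
```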




&lt;h3&gt;
  
  
  ENGINE 2 — NUMERIC RESEARCH LOOKUP (cached fitted models)
&lt;/h3&gt;

&lt;p&gt;This engine serves cached output from 20 statistical models that run &lt;strong&gt;offline&lt;/strong&gt; (weekly or monthly). The lookup itself is a deterministic templated SELECT — sub-second. The actual model fitting is heavy Python (scikit-learn, statsmodels, scipy, lifelines, PyMC) done in batch jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the agent decides:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Triage classifies as Numeric Research → matches one of 20 NR cases by semantic similarity to the user question (not by keyword)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templated lookup&lt;/strong&gt; runs against &lt;code&gt;marts.numeric_research_runs&lt;/code&gt; — the SQL &lt;strong&gt;skeleton is fixed&lt;/strong&gt;; only &lt;code&gt;case_id&lt;/code&gt; and filter params are extracted by LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branch on freshness:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fresh&lt;/strong&gt; (within case-specific window — varies from 24h for IF coverage to 30 days for VIP transitions) → parse &lt;code&gt;predictions_json&lt;/code&gt;, return cached&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale&lt;/strong&gt; → call the &lt;code&gt;run_nr(case_id=...)&lt;/code&gt; MCP tool to trigger a synchronous refit (~5–30s depending on case)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing&lt;/strong&gt; (new market, new case) → return an honest "no model for this market yet" answer and log the request as a catalogue recommendation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesizer renders&lt;/strong&gt; with confidence label + last-fit timestamp&lt;/li&gt;
&lt;/ol&gt;
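&lt;p&gt;The freshness branch in step 3 can be sketched as a small pure function. The window table below only contains the two extremes quoted above (24h for insurance fund coverage, 30 days for VIP transitions); the dictionary and the function name &lt;code&gt;freshness_branch&lt;/code&gt; are illustrative assumptions.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed per-case windows; only these two values are given in the article.
FRESHNESS_WINDOWS = {
    "NR-04": timedelta(hours=24),  # insurance fund coverage
    "NR-08": timedelta(days=30),   # VIP tier transitions
}


def freshness_branch(case_id: str, last_fit_at: Optional[datetime], now: datetime) -> str:
    """Classify the newest cached run as 'fresh', 'stale', or 'missing'."""
    if last_fit_at is None:
        return "missing"  # no row found: report that no model exists yet
    if now - last_fit_at > FRESHNESS_WINDOWS[case_id]:
        return "stale"    # caller would trigger run_nr(case_id=...)
    return "fresh"        # caller parses predictions_json from the cache
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; in explicitly keeps the branch deterministic and easy to test.&lt;/p&gt;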

&lt;h4&gt;
  
  
  Worked example 2.1 — Volatility regime (NR-01)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (Trader):&lt;/strong&gt; &lt;em&gt;"What market regime is BTC-PERP in right now?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Agent decision flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Triage matches NR-01 (&lt;code&gt;Volatility Regime Classification&lt;/code&gt;) by semantic similarity to NR-01's catalogued sample question&lt;/li&gt;
&lt;li&gt;Extracted params: &lt;code&gt;{symbol: "BTC-PERP"}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Templated SELECT:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;predictions_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next_run_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numeric_research_runs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;nr_case_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'NR-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;JSONExtractString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'symbol'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'BTC-PERP'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Returned &lt;code&gt;predictions_json&lt;/code&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"current_regime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"regime_label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high-vol bullish skew"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"days_in_current_regime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"avg_persistence_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;8.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transition_probs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to_regime_0_low_vol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="mf"&gt;0.08&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to_regime_1_neutral"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="mf"&gt;0.42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to_regime_2_high_vol_bullish"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to_regime_3_high_vol_bearish"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.22&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"silhouette_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.61&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Synthesizer renders:
&lt;em&gt;"BTC-PERP is in regime 2 (high-vol bullish skew). Regimes persist ~8 days on average; this is day 5. Next-week transition probabilities: stay 28%, neutral 42%, bearish 22%, low-vol 8%. Last fit: 14h ago."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Latency: &amp;lt;1s. Cost: ~$0.001.&lt;/strong&gt;&lt;/p&gt;
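&lt;p&gt;One way to keep the skeleton of the templated SELECT in this example fixed is to pass the LLM-extracted values as query parameters instead of splicing them into the SQL text. The sketch below uses ClickHouse's &lt;code&gt;{name:Type}&lt;/code&gt; parameter placeholders; the &lt;code&gt;build_lookup&lt;/code&gt; helper and the parameter whitelist are assumptions for illustration, and the resulting pair would be handed to whichever ClickHouse client the MCP layer uses.&lt;/p&gt;

```python
# Fixed SQL skeleton; {name:Type} is ClickHouse's query-parameter syntax.
NR_LOOKUP_TEMPLATE = """
SELECT predictions_json, created_at, next_run_at
FROM marts.numeric_research_runs
WHERE nr_case_id = {case_id:String}
  AND JSONExtractString(inputs_json, 'symbol') = {symbol:String}
ORDER BY created_at DESC LIMIT 1
"""

ALLOWED_PARAMS = {"case_id", "symbol"}  # assumed whitelist for this template


def build_lookup(extracted: dict) -> tuple:
    """Pair the fixed template with extracted values; never edit the SQL text."""
    unexpected = set(extracted) - ALLOWED_PARAMS
    missing = ALLOWED_PARAMS - set(extracted)
    if unexpected or missing:
        raise ValueError(
            f"bad params: unexpected={sorted(unexpected)}, missing={sorted(missing)}"
        )
    params = {name: str(extracted[name]) for name in sorted(ALLOWED_PARAMS)}
    return NR_LOOKUP_TEMPLATE, params
```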

&lt;h4&gt;
  
  
  Worked example 2.2 — Insurance fund coverage (NR-04)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (CRO):&lt;/strong&gt; &lt;em&gt;"What's our insurance fund coverage probability at a 15% shock?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Agent decision flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Triage matches NR-04 (&lt;code&gt;Insurance Fund Coverage Probability — Monte Carlo&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;LLM extracts &lt;code&gt;shock_pct = 0.15&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Templated lookup returns Monte Carlo output (excerpt):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"monte_carlo_100k_trials"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"current_if_usdt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results_by_shock"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"0.05"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p_breach"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"wilson_ci_95"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0021&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0027&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"0.10"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p_breach"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0158&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"wilson_ci_95"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0166&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"0.15"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p_breach"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0792&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"wilson_ci_95"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0775&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0809&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"0.20"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p_breach"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.1834&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"wilson_ci_95"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1810&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.1858&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"top_tail_risk_contributors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"market"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BTC-PERP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"share_pct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;38&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"market"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ETH-PERP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"share_pct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"market"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SOL-PERP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"share_pct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Synthesizer renders:
&lt;em&gt;"Insurance fund coverage probability at 15% shock: 92.1% (CI: 91.9–92.3%). At 20% shock: 81.7%. Top tail-risk contributors: BTC-PERP (38%), ETH-PERP (24%), SOL-PERP (21%). Last refit: 4h ago."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
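&lt;p&gt;For readers who want to check the numbers: coverage is simply one minus &lt;code&gt;p_breach&lt;/code&gt;, and the &lt;code&gt;wilson_ci_95&lt;/code&gt; bounds are consistent with a standard Wilson score interval over 100,000 trials. The sketch below reproduces the 15% shock row; that the batch job uses exactly this formula with z = 1.96 is an assumption based on the field name.&lt;/p&gt;

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


n_trials = 100_000   # from "monte_carlo_100k_trials"
p_breach = 0.0792    # the 15% shock row in the JSON above

lo, hi = wilson_interval(p_breach, n_trials)
coverage_pct = round(100 * (1 - p_breach), 1)

print(round(lo, 4), round(hi, 4))  # 0.0775 0.0809
print(coverage_pct)                # 92.1
```

&lt;p&gt;Because coverage is the complement of breach, the interval flips: the upper breach bound gives the lower coverage bound and vice versa.&lt;/p&gt;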

&lt;p&gt;&lt;strong&gt;Latency: &amp;lt;1s. Cost: ~$0.001.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Worked example 2.3 — VIP tier transitions (NR-08)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (CFO):&lt;/strong&gt; &lt;em&gt;"Forecast next month's VIP tier distribution. Are we losing VIP-3 users?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Agent decision flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Triage matches NR-08 (&lt;code&gt;VIP Tier Transition Probabilities — Markov chain&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Templated lookup returns full 7×7 transition matrix&lt;/li&gt;
&lt;li&gt;Orchestrator &lt;strong&gt;(code, not LLM)&lt;/strong&gt; does matrix multiplication: current distribution × transition matrix = next-month forecast&lt;/li&gt;
&lt;li&gt;Synthesizer renders:
&lt;em&gt;"Next-month forecast: VIP-3 will drop from 480 → 403 (-16%). Demotion rate (T3→T2) has spiked from 5.2% historical to 9.0% this month. Recommend investigating VIP-3 satisfaction. Last refit: 3 weeks ago."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
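&lt;p&gt;The orchestrator's matrix step is a row vector multiplied by a row-stochastic matrix. The sketch below shows it on a toy 3-tier chain with made-up numbers (the real matrix is 7×7 and its values are not shown in this article); &lt;code&gt;forecast_next_month&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
def forecast_next_month(current: list, transition: list) -> list:
    """next[j] = sum over i of current[i] * transition[i][j]."""
    n = len(current)
    for row in transition:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("transition matrix must be square with rows summing to 1")
    return [
        sum(current[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Toy example: user counts per tier and monthly transition probabilities.
current_users = [1000.0, 500.0, 100.0]
P = [
    [0.90, 0.10, 0.00],  # tier 0: 90% stay, 10% move up
    [0.20, 0.70, 0.10],  # tier 1: 20% demoted, 70% stay, 10% promoted
    [0.00, 0.25, 0.75],  # tier 2: 25% demoted, 75% stay
]
next_users = forecast_next_month(current_users, P)
print([round(x, 6) for x in next_users])  # [1000.0, 475.0, 125.0]
```

&lt;p&gt;Doing this in code rather than in the LLM keeps the arithmetic exact and the total user count conserved (1,600 before and after in the toy case).&lt;/p&gt;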

&lt;p&gt;&lt;strong&gt;Latency: &amp;lt;1s. Cost: ~$0.002.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Worked example 2.4 — Almgren-Chriss impact (NR-19) — machine-to-machine
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Optimization engine&lt;/strong&gt; (not a direct user): the liquidation sequencing solver needs per-market impact coefficients before solving.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Agent decision flow (machine-to-machine):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The solver agent issues a templated lookup, which returns per-market (α, β) for &lt;code&gt;impact_bps = α × (size / depth)^β&lt;/code&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"BTC-PERP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"alpha"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"beta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.51&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"r_squared"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"n_obs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18420&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ETH-PERP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"alpha"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.55&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"beta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.49&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"r_squared"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.74&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"n_obs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12100&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"SOL-PERP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"alpha"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"beta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.62&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"r_squared"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.66&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"n_obs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;5240&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Solver uses these as priors in the CP-SAT scheduling model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Latency: &amp;lt;1s for the lookup. If the coefficients are stale, the optimizer waits 5–15s for a refit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;NR feeding Optimization&lt;/strong&gt; pattern. NR results aren't always user-facing; they often feed downstream engines.&lt;/p&gt;
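&lt;p&gt;As a rough illustration of how a downstream consumer might apply these coefficients, the sketch below evaluates the power-law formula for each market. The &lt;code&gt;COEFFS&lt;/code&gt; table copies the values from the lookup above; the function name, the input check, and the example order size are assumptions made for illustration, not the platform's actual solver code.&lt;/p&gt;

```python
# Coefficients copied from the lookup result above.
COEFFS = {
    "BTC-PERP": {"alpha": 1.24, "beta": 0.51},
    "ETH-PERP": {"alpha": 1.55, "beta": 0.49},
    "SOL-PERP": {"alpha": 2.18, "beta": 0.62},
}


def impact_bps(market: str, size: float, depth: float) -> float:
    """Power-law impact: alpha * (size / depth) ** beta, in basis points."""
    if not (size >= 0 and depth > 0):
        raise ValueError("size must be non-negative and depth positive")
    c = COEFFS[market]
    return c["alpha"] * (size / depth) ** c["beta"]


if __name__ == "__main__":
    # Hypothetical order: liquidate 5 units against 100 units of depth.
    for market in COEFFS:
        print(market, round(impact_bps(market, 5.0, 100.0), 3))
```

&lt;p&gt;When size equals depth the ratio is 1, so the impact is simply α. That makes α readable as "bps of impact for an order that consumes the full measured depth".&lt;/p&gt;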

&lt;h4&gt;
  
  
  How the 20 NR cases tile the domain
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;NR cases&lt;/th&gt;
&lt;th&gt;Used by&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Market structure&lt;/td&gt;
&lt;td&gt;NR-01 (regimes), NR-15 (cap-hits), NR-16 (correlation)&lt;/td&gt;
&lt;td&gt;Risk, Trading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Liquidity / impact&lt;/td&gt;
&lt;td&gt;NR-03 (depth function), NR-19 (Almgren-Chriss)&lt;/td&gt;
&lt;td&gt;Trading, Optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Funding mechanics&lt;/td&gt;
&lt;td&gt;NR-02 (elasticity)&lt;/td&gt;
&lt;td&gt;Risk, Finance, Optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User behavior&lt;/td&gt;
&lt;td&gt;NR-06 (LTV survival), NR-08 (VIP transitions), NR-12 (retention curves)&lt;/td&gt;
&lt;td&gt;Product, Finance, Growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance / AML&lt;/td&gt;
&lt;td&gt;NR-05 (wash trade prevalence), NR-10 (withdrawal z-score), NR-11 (alert P/R)&lt;/td&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk modeling&lt;/td&gt;
&lt;td&gt;NR-04 (IF coverage MC), NR-09 (HHI concentration)&lt;/td&gt;
&lt;td&gt;Risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Revenue / pricing&lt;/td&gt;
&lt;td&gt;NR-07 (rebate elasticity), NR-14 (Gini concentration), NR-17 (affiliate payback)&lt;/td&gt;
&lt;td&gt;Finance, Growth, Optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Options&lt;/td&gt;
&lt;td&gt;NR-18 (Greeks aggregation)&lt;/td&gt;
&lt;td&gt;Risk, Trading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal product&lt;/td&gt;
&lt;td&gt;NR-20 (PT settlement KS-test)&lt;/td&gt;
&lt;td&gt;Product, Risk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Adding a new case is a configuration exercise: catalogue entry + fit script + cadence + sample question. The platform handles the rest.&lt;/p&gt;




&lt;h3&gt;
  
  
  ENGINE 3 — OPTIMIZATION (NL → math model → solver → reviewable plan)
&lt;/h3&gt;

&lt;p&gt;The decision-making engine. Where Analytics answers "what is X?", Optimization answers "what &lt;em&gt;should&lt;/em&gt; X be?" — formulating a math program from a business question, running it through a solver, returning a reviewable plan with assumptions, sensitivity, and alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the agent decides:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Triage classifies as Optimization based on intent — decision-making with constraints + objective, regardless of verb cues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM emits structured JSON&lt;/strong&gt; describing the math program (decision variables with bounds, objective with sense, constraints, parameters). The LLM does &lt;strong&gt;not&lt;/strong&gt; write Python — it emits structured data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic Python template&lt;/strong&gt; converts JSON → CVXPY / OR-Tools / Pyomo / HiGHS code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox executes&lt;/strong&gt; the model (docker + nsjail + cgroups; no network egress; 60s wall-clock; 2GB cap)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solver runs&lt;/strong&gt; — backend chosen by &lt;code&gt;model_type&lt;/code&gt; field:

&lt;ul&gt;
&lt;li&gt;LP → HiGHS&lt;/li&gt;
&lt;li&gt;MIP → HiGHS, CBC, or SCIP; optionally Gurobi or CPLEX&lt;/li&gt;
&lt;li&gt;CP → OR-Tools CP-SAT&lt;/li&gt;
&lt;li&gt;Convex (SOCP / SDP / NLP) → CVXPY with CLARABEL (default), ECOS / SCS fallback&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan persisted&lt;/strong&gt; to &lt;code&gt;marts.optimization_runs&lt;/code&gt; with full audit trail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesizer renders&lt;/strong&gt; plan with binding constraints, sensitivity, alternative formulations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; the platform produces a &lt;strong&gt;plan&lt;/strong&gt;, not an action. Human approves before any operational effect. By design — never autonomous.&lt;/p&gt;
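&lt;p&gt;The validation and routing steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the required-key set, the lowercase &lt;code&gt;model_type&lt;/code&gt; values, and the backend labels are hypothetical stand-ins for whatever the real template uses, and the labels are plain strings rather than solver imports.&lt;/p&gt;

```python
import json

# Backend per model_type, following the routing list above.
# The labels are illustrative strings, not real import paths.
BACKENDS = {
    "lp": "HiGHS",
    "mip": "CBC",
    "cp": "OR-Tools CP-SAT",
    "convex": "CVXPY/CLARABEL",
}

# Assumed minimum shape of the LLM-emitted spec.
REQUIRED_KEYS = {"model_type", "decision_variables", "objective", "constraints"}


def choose_backend(raw_spec: str) -> str:
    """Validate the LLM-emitted JSON spec and pick a solver backend."""
    spec = json.loads(raw_spec)  # malformed JSON raises a ValueError subclass
    if not isinstance(spec, dict):
        raise ValueError("spec must be a JSON object")
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"spec is missing keys: {sorted(missing)}")
    objective = spec["objective"]
    if not isinstance(objective, dict) or objective.get("sense") not in (
        "minimize",
        "maximize",
    ):
        raise ValueError("objective.sense must be minimize or maximize")
    model_type = spec["model_type"]
    if model_type not in BACKENDS:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return BACKENDS[model_type]
```

&lt;p&gt;Rejecting anything outside a closed set of keys and model types is what keeps the LLM's output in the role of data: a spec either fits the template or is refused before any solver code is generated.&lt;/p&gt;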

&lt;h4&gt;
  
  
  Worked example 3.1 — Insurance fund allocation (Convex, CVXPY)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (CRO):&lt;/strong&gt; &lt;em&gt;"Allocate the $50M insurance fund across BTC/ETH/SOL to minimize worst-case 30-day loss with 5% per-market floor."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Structured JSON emitted by LLM:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"convex"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision_variables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"x_btc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continuous"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"x_eth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continuous"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"x_sol"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continuous"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cvar_btc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cvar_eth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cvar_sol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"budget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;50000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"min_pct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_pct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mf"&gt;0.40&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"objective"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sense"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"minimize"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cvar_btc*x_btc + cvar_eth*x_eth + cvar_sol*x_sol"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"constraints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"x_btc + x_eth + x_sol == budget"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"x_i &amp;gt;= min_pct * budget"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"x_i &amp;lt;= max_pct * budget"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Template generates CVXPY code; CLARABEL solver runs in sandbox. &lt;strong&gt;Plan output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Plan opt-20260512-001 — Insurance Fund Allocation
Status: optimal · Solver: CVXPY/CLARABEL · Solve time: 47ms

Decision:
  BTC market: $20.0M (40% — at ceiling)
  ETH market: $20.0M (40% — at ceiling)
  SOL market: $10.0M (20%)

Weighted CVaR contribution: $14.0M
  (= 0.18×$20M + 0.31×$20M + 0.42×$10M)

Binding constraints: budget, BTC ceiling, ETH ceiling

Sensitivity:
  Cap relaxed to 50%  → loss drops to $12.5M (-10.5%)
  SOL drawdown ±10%   → allocation unchanged (cap binds)

Alternatives considered:
  Equal-weight ($16.7M each): $15.2M (+8.4%)
  Min-VaR (not CVaR):         $14.9M (+6.1%)

Assumptions:
  - 90-day drawdown distributions (sourced from NR-04)
  - Per-market floor 5% / ceiling 40% (from glossary)

Want to: relax the ceiling? change the lookback? add markets?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Latency: ~3s. Cost: ~$0.05.&lt;/strong&gt; Decision human-approved before any IF movement.&lt;/p&gt;
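&lt;p&gt;The plan's numbers can be cross-checked without a solver. Because the objective is linear and the only coupling constraint is the budget, filling the lowest-CVaR markets up to their ceiling first gives the optimum. The sketch below is a hand-rolled check under that observation, not the CVXPY code the template generates; the function name and structure are assumptions made for illustration.&lt;/p&gt;

```python
def allocate(costs: dict[str, float], budget: float,
             min_pct: float, max_pct: float) -> dict[str, float]:
    """Minimize sum(cost_i * x_i) with sum(x) == budget and box bounds.

    With one budget equality and per-variable bounds, filling the
    cheapest coefficients first is optimal, so no solver is needed here.
    """
    floor, ceiling = min_pct * budget, max_pct * budget
    if floor * len(costs) > budget or budget > ceiling * len(costs):
        raise ValueError("bounds make the budget infeasible")
    alloc = {name: floor for name in costs}        # every market gets the floor
    remaining = budget - floor * len(costs)
    for name in sorted(costs, key=costs.get):      # lowest CVaR first
        extra = min(ceiling - floor, remaining)
        alloc[name] += extra
        remaining -= extra
    return alloc


CVAR = {"BTC": 0.18, "ETH": 0.31, "SOL": 0.42}    # from the JSON spec above
plan = allocate(CVAR, budget=50_000_000, min_pct=0.05, max_pct=0.40)
weighted = sum(CVAR[m] * plan[m] for m in plan)

for market, dollars in plan.items():
    print(f"{market}: ${dollars / 1e6:.1f}M")
print(f"Weighted CVaR contribution: ${weighted / 1e6:.1f}M")
```

&lt;p&gt;This reproduces the $20.0M / $20.0M / $10.0M split and the $14.0M weighted figure. Calling &lt;code&gt;allocate&lt;/code&gt; again with &lt;code&gt;max_pct=0.50&lt;/code&gt; gives the roughly $12.5M value quoted in the sensitivity section.&lt;/p&gt;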

&lt;h4&gt;
  
  
  Worked example 3.2 — Compliance case routing (CP-SAT, OR-Tools)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (Head of Compliance):&lt;/strong&gt; &lt;em&gt;"Assign today's 25 open cases (12 SAR drafts, 8 KYC reviews, 5 transaction audits) to the 8 investigators on shift. Minimize SLA breach risk. SAR drafts require senior investigators."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Math type:&lt;/strong&gt; Mixed-integer assignment problem. Decision variable &lt;code&gt;x[i,j] ∈ {0,1}&lt;/code&gt; = "case i assigned to investigator j". Objective: minimize weighted SLA breach risk. Constraints: each case to exactly one investigator; per-investigator capacity in hours; senior-investigator filter for SAR drafts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Plan opt-20260512-002 — Compliance Case Routing
Status: optimal · Solver: OR-Tools CP-SAT · Solve time: 1.8s

Assignments (25 cases → 8 investigators):
  alice  : 3 cases  (1 SAR, 2 KYC)            load 7.0 / 8.0 hrs
  bob    : 3 cases  (1 SAR, 1 audit, 1 KYC)   load 7.0 / 8.0 hrs
  carol  : 3 cases  (1 SAR, 2 KYC)            load 7.0 / 8.0 hrs
  dan    : 3 cases  (1 SAR, 1 audit, 1 KYC)   load 7.0 / 8.0 hrs
  eve    : 3 cases  (2 KYC, 1 audit)          load 5.0 / 8.0 hrs   [senior, slack]
  frank  : 3 cases  (1 SAR, 2 KYC)            load 7.0 / 8.0 hrs
  grace  : 3 cases  (1 SAR, 2 KYC)            load 7.0 / 8.0 hrs
  henry  : 4 cases  (1 SAR, 2 audit, 1 KYC)   load 8.0 / 8.0 hrs

Total expected SLA breach risk score: 0.42  (vs 1.18 for FIFO)

Binding constraints:
  - Senior investigators (4) — fully loaded with SAR drafts
  - henry hours_available — at capacity

Risks flagged:
  - 1 SAR is unassignable today (no senior capacity)
    → Recommend: escalate to overflow team OR delay 1 lower-priority audit

Want to: see what changes with +1 senior? change SLA hours?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Latency: ~2s. Cost: ~$0.04.&lt;/strong&gt;&lt;/p&gt;
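&lt;p&gt;Since a human reviews the plan before it takes effect, a small independent checker for the three constraint families named above can be useful. The following is a sketch only: the function, its argument shapes, and the three-case instance are hypothetical and unrelated to the platform's CP-SAT model or to the 25-case plan shown.&lt;/p&gt;

```python
from collections import Counter


def check_plan(assignments, case_hours, capacity, seniors, sar_cases):
    """Return a list of violated rules for a case -> investigator plan."""
    problems = []
    # Rule 1: every known case must be assigned to an investigator.
    unassigned = set(case_hours) - set(assignments)
    if unassigned:
        problems.append(f"unassigned cases: {sorted(unassigned)}")
    # Rule 2: per-investigator load must fit within available hours.
    load = Counter()
    for case, person in assignments.items():
        if case not in case_hours:
            problems.append(f"unknown case: {case}")
            continue
        load[person] += case_hours[case]
    for person, hours in load.items():
        if hours > capacity.get(person, 0):
            problems.append(f"{person} over capacity: {hours}h")
    # Rule 3: SAR drafts go to senior investigators only.
    for case in sorted(sar_cases):
        person = assignments.get(case)
        if person is not None and person not in seniors:
            problems.append(f"{case} (SAR) assigned to non-senior {person}")
    return problems


if __name__ == "__main__":
    # Hypothetical three-case instance, not the 25-case plan above.
    hours = {"sar-1": 3.0, "kyc-1": 2.0, "audit-1": 2.0}
    plan = {"sar-1": "bob", "kyc-1": "alice", "audit-1": "bob"}
    print(check_plan(plan, hours, {"alice": 8.0, "bob": 4.0}, {"alice"}, {"sar-1"}))
```

&lt;p&gt;An empty list means the plan satisfies all three rules. Because the assignment is a mapping from case to investigator, "at most one investigator per case" holds by construction and only missing cases need an explicit check.&lt;/p&gt;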

&lt;h4&gt;
  
  
  Worked example 3.3 — Maker fee tier optimization (MIP with elasticity)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (CFO):&lt;/strong&gt; &lt;em&gt;"Choose maker fee values for tiers 1–5 to maximize monthly fee revenue, subject to ≤5% volume retention loss. Use last 90 days as baseline."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Math type:&lt;/strong&gt; MIP with discrete fee choices (0.5bp increments) and piecewise-linear elasticity from NR-07. Five integer decision variables. Constraint: total volume ≥ 95% of baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Plan opt-20260512-003 — Maker Fee Tier Optimization
Status: optimal · Solver: HiGHS MIP · Solve time: 230ms

Recommended fees:
  Tier 1 (≤$10M/mo):     +2.0 bps  (current +2.5)
  Tier 2 ($10–50M):      +1.0 bps  (current +1.5)
  Tier 3 ($50–250M):      0.0 bps  (current +0.5)
  Tier 4 ($250M–1B):     -1.5 bps  (current -1.0)
  Tier 5 (&amp;gt;$1B):         -3.5 bps  (current -3.0)

Projected 30-day revenue:  $9.4M   (current $8.5M, +10.6%)
Projected 30-day volume:  $40.4B   (current $42.3B, -4.5%)

Volume loss 4.5% — within 5% glossary cap.

Binding constraints: max_volume_drop_pct = 0.05

Sensitivity:
  Elasticity 1.5× current → revenue gain drops to +6.2%
  Elasticity 0.5× current → revenue gain rises to +14.8%
  → Plan moderately sensitive to elasticity — recommend A/B test before rollout

Assumptions:
  - Linear elasticity 1.2% volume / bp (from glossary NR-07)
  - 90-day baseline volumes
  - Tier boundaries unchanged

Want to: vary elasticity? cap at 3%? add a 6th tier?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Latency: ~1s. Cost: ~$0.04.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes Optimization safe:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM never writes Python&lt;/strong&gt; — it emits structured JSON, and a fixed template generates the solver code. Even a successful prompt injection cannot escape the template structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox containment&lt;/strong&gt; — code runs with no network, 60s timeout, 2GB cap, ephemeral container.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviewable plan, not action&lt;/strong&gt; — every plan ships with assumptions, binding constraints, sensitivity, and alternatives. A human approves before any operational effect.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  ENGINE 4 — DEEP RESEARCH (multi-step planner / workers / critic / synthesizer)
&lt;/h3&gt;

&lt;p&gt;The highest-stakes engine. Used when a question requires reasoning across multiple data sources, where there's no fixed catalogue match, and where the answer needs a structured investigation report with citations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the agent decides:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Triage classifies as Deep Research based on intent (multi-source explanation requiring decomposition + reasoning) — not just on "why" keyword&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner (LLM)&lt;/strong&gt; decomposes the question into 7–15 sub-questions, each tagged with a tool hint (SQL / vector search / past plans / web fetch / solver)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker pool (LLM-driven, code-coordinated)&lt;/strong&gt; executes sub-questions in parallel via &lt;code&gt;asyncio.Semaphore(6)&lt;/code&gt;. Each worker:

&lt;ul&gt;
&lt;li&gt;Generates SQL or vector query&lt;/li&gt;
&lt;li&gt;Executes via MCP&lt;/li&gt;
&lt;li&gt;Returns evidence record tagged &lt;code&gt;[ev-N]&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic (LLM)&lt;/strong&gt; reads evidence ledger. Decides "sufficient" or "investigate further" (spawn 1–8 more sub-questions). Bounded to 5 rounds max.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesizer (LLM)&lt;/strong&gt; produces structured report. Every claim must carry &lt;code&gt;[ev-N]&lt;/code&gt;. Regex check + 3-retry fallback enforces this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt; — report → &lt;code&gt;marts.research_runs&lt;/code&gt;; per-evidence record → &lt;code&gt;marts.research_evidence&lt;/code&gt;. Both indexed by embedding for future similar-question retrieval (auto-memory).&lt;/li&gt;
&lt;/ol&gt;
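&lt;p&gt;The worker-pool step can be sketched with the standard library alone. &lt;code&gt;asyncio.Semaphore(6)&lt;/code&gt; comes from the description above; everything else here (the function names, the evidence-record fields, and the &lt;code&gt;fake_execute&lt;/code&gt; stand-in for the MCP call) is an assumption made for illustration.&lt;/p&gt;

```python
import asyncio


async def run_workers(sub_questions, execute, limit=6):
    """Run sub-questions concurrently, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def worker(n, question):
        async with gate:                      # waits while `limit` workers are busy
            finding = await execute(question)
        return {"id": f"ev-{n}", "question": question, "finding": finding}

    tasks = [worker(n, q) for n, q in enumerate(sub_questions, start=1)]
    return await asyncio.gather(*tasks)       # results come back in planner order


async def fake_execute(question):
    """Stand-in for the SQL or vector query made through MCP."""
    await asyncio.sleep(0.01)
    return f"result for {question!r}"


if __name__ == "__main__":
    questions = [f"sq-{i}" for i in range(1, 10)]
    ledger = asyncio.run(run_workers(questions, fake_execute))
    print([record["id"] for record in ledger])
```

&lt;p&gt;The semaphore bounds how many queries hit ClickHouse at once, while &lt;code&gt;gather&lt;/code&gt; preserves input order, so evidence tags line up with the planner's sub-question numbering regardless of which worker finishes first.&lt;/p&gt;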

&lt;h4&gt;
  
  
  Worked example 4.1 — Q2 fee revenue variance attribution
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (CFO):&lt;/strong&gt; &lt;em&gt;"Why did Q2 fee revenue drop 12% vs Q1?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Planner output (7 initial sub-questions):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sq&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sq-1&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Volume × rate decomposition Q1 vs Q2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-2&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Drop by product (spot / perp / options)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-3&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Drop by region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-4&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Drop by VIP tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-5&lt;/td&gt;
&lt;td&gt;Past plans&lt;/td&gt;
&lt;td&gt;Was there a fee schedule change?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-6&lt;/td&gt;
&lt;td&gt;Vector search&lt;/td&gt;
&lt;td&gt;Market regime shift Q2 vs Q1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-7&lt;/td&gt;
&lt;td&gt;Web fetch&lt;/td&gt;
&lt;td&gt;External regulatory events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Workers execute in parallel (~30s):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sq&lt;/th&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Confidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sq-1&lt;/td&gt;
&lt;td&gt;Vol Q1 $130B, Q2 $118B (−9.2%); rate 1.05 → 1.03 bps (−1.9%)&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-2&lt;/td&gt;
&lt;td&gt;Perpetuals −15%; spot flat; options +8%&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-3&lt;/td&gt;
&lt;td&gt;APAC −18%; EU −2%; Americas +3%&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-4&lt;/td&gt;
&lt;td&gt;VIP 4-5 tiers dropped most (−22%); retail flat&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-5&lt;/td&gt;
&lt;td&gt;Maker rebate widened 0.5 bp on Apr 14 (decision opt-20260414-007)&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-6&lt;/td&gt;
&lt;td&gt;Q2 had 35 low-vol days vs Q1's 12 (cosine 0.91 to 2024-Q3)&lt;/td&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-7&lt;/td&gt;
&lt;td&gt;No major regulatory events in Q2&lt;/td&gt;
&lt;td&gt;LOW&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Critic round 1:&lt;/strong&gt; &lt;em&gt;"VIP-tier × APAC intersection is biggest single factor (−22% in tier 4-5, −18% APAC). Need user-level breakdown to find concentration."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spawn 2 more sub-questions:&lt;/strong&gt;&lt;/p&gt;

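&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sq&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sq-8&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Top 50 APAC VIPs by Q1 volume — 8 reduced volume &amp;gt;50%; 2 went to zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-9&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Those 8 share 3 affiliate codes (aff_217, aff_223, aff_341)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;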

&lt;p&gt;&lt;strong&gt;Critic round 2:&lt;/strong&gt; &lt;em&gt;"Sufficient. Synthesize."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthesizer output&lt;/strong&gt; (citation-enforced):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Investigation rsrch-20260512-001 — Q2 fee revenue variance
Verdict: HIGH CONFIDENCE — VIP migration + secondary regime effect

Decomposition (−12.0%):
  Volume effect: −9.2%   [ev-1]
  Rate effect:   −1.9%   [ev-5]  (Apr 14 maker rebate widening)
  Mix effect:    −0.9%

Headline findings:
  1. 8 APAC VIPs reduced volume &amp;gt;50%; 2 went to zero          [ev-8]
  2. These 8 share 3 affiliate codes — likely competitor migration [ev-9]
  3. Q2 had 35 low-vol days vs Q1's 12 — secondary effect      [ev-6]
  4. Apr 14 rebate widening explains −1.9% rate drop           [ev-5]

Recommendations:
  HIGH:   VIP relations contact 8 APAC accounts directly
  HIGH:   Growth audit aff_217 / aff_223 / aff_341
  MEDIUM: Stress-test fee model under continued low-vol regime
          (handoff to optimization layer)

Confidence: HIGH on factual findings; MEDIUM on competitor-migration hypothesis
Investigation cost: $0.07 · 67s · 9 sub-queries · trace lf-finance-q2-001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Latency: 67s. Cost: $0.07.&lt;/strong&gt; The same investigation took FP&amp;amp;A 2 days last quarter and got the affiliate angle wrong (discovered 3 weeks later).&lt;/p&gt;
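&lt;p&gt;The volume and rate figures in the decomposition follow directly from the sq-1 finding. The short sketch below recomputes them; the variable names are illustrative, and it assumes revenue is simply volume times average fee rate, which ignores the product and tier mix that the report accounts for separately.&lt;/p&gt;

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures from sub-question sq-1: volume in $B, average fee rate in bps.
q1_volume, q2_volume = 130.0, 118.0
q1_rate, q2_rate = 1.05, 1.03

volume_effect = pct_change(q1_volume, q2_volume)
rate_effect = pct_change(q1_rate, q2_rate)

# Revenue = volume * rate. 1 bp = 1e-4 and volume is in $B, so this is $B.
q1_revenue = q1_volume * q1_rate * 1e-4
q2_revenue = q2_volume * q2_rate * 1e-4
combined = pct_change(q1_revenue, q2_revenue)

print(f"volume effect: {volume_effect:.1f}%")
print(f"rate effect:   {rate_effect:.1f}%")
print(f"volume x rate: {combined:.1f}%")
```

&lt;p&gt;Volume and rate together account for about 11 of the 12 percentage points; the remainder is what the report attributes to mix.&lt;/p&gt;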

&lt;h4&gt;
  
  
  Worked example 4.2 — Wash trading investigation
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User (Compliance Officer):&lt;/strong&gt; &lt;em&gt;"Investigate user &lt;code&gt;whale_42&lt;/code&gt; for wash trading suspicion. Be thorough — this may become a SAR filing."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Planner output (9 sub-questions):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sq&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sq-1&lt;/td&gt;
&lt;td&gt;Vector&lt;/td&gt;
&lt;td&gt;Trade pattern self-similarity vs historical wash-trader templates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-2&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Round-trip frequency (buy/sell within N seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-3&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Counterparty concentration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-4&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Device fingerprint overlap with other accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-5&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Withdrawal patterns vs deposit patterns (net P&amp;amp;L)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-6&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Affiliate / referral chain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-7&lt;/td&gt;
&lt;td&gt;Vector RAG&lt;/td&gt;
&lt;td&gt;KYC document similarity to other accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-8&lt;/td&gt;
&lt;td&gt;Past investigations&lt;/td&gt;
&lt;td&gt;Historical confirmed wash traders — pattern match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-9&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;IP / geo-rotation patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Workers execute (~40s):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sq&lt;/th&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Confidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sq-1&lt;/td&gt;
&lt;td&gt;Trade pattern cosine 0.94 to confirmed wash-trader template (Jan 2025)&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-2&lt;/td&gt;
&lt;td&gt;78% of trades are round-trips &amp;lt; 5 minutes&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-3&lt;/td&gt;
&lt;td&gt;64% of volume against single counterparty &lt;code&gt;acct_X&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-4&lt;/td&gt;
&lt;td&gt;4 other accounts share device fingerprint hash&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-5&lt;/td&gt;
&lt;td&gt;Withdrawals match deposits within 0.1% — net-zero P&amp;amp;L&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-6&lt;/td&gt;
&lt;td&gt;All 5 accounts referred by same affiliate aff_991&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-7&lt;/td&gt;
&lt;td&gt;KYC doc for whale_42 has 0.87 similarity to KYC of acct_X (different photo, same template)&lt;/td&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-8&lt;/td&gt;
&lt;td&gt;2 prior cases referred by aff_991 confirmed wash trading&lt;/td&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sq-9&lt;/td&gt;
&lt;td&gt;All 5 accounts log in from rotating IPs in same /24 subnet&lt;/td&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Critic round 1:&lt;/strong&gt; &lt;em&gt;"Strong evidence across 8 of 9 sub-questions. Recommend immediate compliance escalation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthesizer output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Investigation rsrch-20260512-002 — whale_42 wash-trading suspicion
Verdict: HIGH CONFIDENCE — wash trading + Sybil network

Hypotheses tested:
  H1  Coordinated wash-trade pattern    CONFIRMED  [ev-1, ev-2]
  H2  Sybil network (5 linked accounts) CONFIRMED  [ev-4, ev-6, ev-9]
  H3  Same-counterparty self-trading    CONFIRMED  [ev-3]
  H4  Affiliate-chain coordination      CONFIRMED  [ev-6, ev-8]
  H5  Inadvertent / unrelated activity  FALSIFIED  [ev-5 net-zero P&amp;amp;L]

Headline findings:
  1. Trade pattern matches confirmed wash-trader template (cosine 0.94) [ev-1]
  2. 5 accounts share device fingerprint, IP subnet, affiliate code   [ev-4, ev-6, ev-9]
  3. 78% round-trip trades &amp;lt; 5 min, 64% against single counterparty   [ev-2, ev-3]
  4. P&amp;amp;L net-zero — economic activity has no purpose other than volume [ev-5]
  5. Affiliate aff_991 has 2 prior confirmed wash-trading cases       [ev-8]

Recommendations:
  IMMEDIATE: Freeze 5 accounts; suspend aff_991
  IMMEDIATE: File SAR (handoff to SAR draft template)
  MEDIUM:    Review all aff_991-referred accounts (~40 users)

Investigation cost: $0.21 · 71s · 9 sub-queries · trace lf-compliance-whale42-001
Filing-ready evidence pack persisted to marts.research_evidence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Latency: 71s. Cost: $0.21.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same investigation took an analyst 4–8 hours, produced a free-text Slack message with no evidence trail, and required a second analyst to reconstruct reasoning before SAR filing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes Deep Research trustworthy:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Citation enforcement&lt;/strong&gt; — every claim must carry &lt;code&gt;[ev-N]&lt;/code&gt;. Synthesizer output regex-checked; failures trigger up to 3 retries before falling back to &lt;code&gt;LOW&lt;/code&gt; confidence with gaps flagged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critic gate&lt;/strong&gt; — synthesis cannot proceed until critic returns "sufficient". Multiple rounds catch premature conclusions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence + replay&lt;/strong&gt; — every investigation lands in &lt;code&gt;marts.research_runs&lt;/code&gt; with full evidence ledger. Six months later you can replay the exact reasoning steps via the trace ID.&lt;/li&gt;
&lt;/ol&gt;
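
&lt;p&gt;The citation check with its retry-then-downgrade behavior can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the names &lt;code&gt;uncited_claims&lt;/code&gt; and &lt;code&gt;synthesize_with_citations&lt;/code&gt;, the line-per-claim granularity, and the &lt;code&gt;generate&lt;/code&gt; callback standing in for the synthesizer LLM call are all assumptions.&lt;/p&gt;

```python
import re
from typing import Callable

# Assumption: one claim per line, and a claim counts as cited when it
# contains an opening "[ev-N" tag (so "[ev-4, ev-6]" also passes).
CITATION = re.compile(r"\[ev-\d+")


def uncited_claims(text: str) -> list[str]:
    """Return the non-empty lines that carry no [ev-N] citation."""
    return [ln for ln in text.splitlines() if ln.strip() and not CITATION.search(ln)]


def synthesize_with_citations(generate: Callable[[list[str]], str],
                              max_retries: int = 3) -> dict:
    """Draft once, then retry up to max_retries times while claims lack citations.

    generate(gaps) is a hypothetical stand-in for the synthesizer call; it
    receives the uncited lines from the previous draft (empty on the first call).
    """
    text = generate([])
    for attempt in range(max_retries + 1):
        gaps = uncited_claims(text)
        if not gaps:
            return {"text": text, "forced_low": False, "gaps": []}
        if attempt == max_retries:
            break  # retries exhausted, keep the last draft
        text = generate(gaps)
    # Fall back to LOW confidence and surface the uncited lines as gaps.
    return {"text": text, "forced_low": True, "gaps": gaps}
```

&lt;p&gt;Passing the uncited lines back into the next attempt is one possible design choice; it lets the retry prompt point at exactly what was missing instead of regenerating blindly.&lt;/p&gt;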




&lt;h2&gt;
  
  
  Self-Learning — Four Continuous Feedback Loops
&lt;/h2&gt;

&lt;p&gt;The platform improves weekly without engineering intervention. Each loop operates on a different cadence and improves a different surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Loop 1 — Glossary Expansion (1-day cycle)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Unfamiliar domain terms become deterministic SQL fragments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User asks question with unfamiliar term
            │
            ▼
Resolution hierarchy attempts to resolve:
   1. Live MCP list_tables — no exact match
   2. Glossary lookup — no entry for the term
   3. LLM disambiguation fallback — proposes interpretation
            │
            ▼
Agent returns answer with explicit disclaimer:
"I interpreted X as Y — confirm or refine?"
            │
            ├─ 👍 confirmed     → propose entry to Slack queue
            └─ 👎 corrected     → user-supplied definition queued
            │
            ▼
Data team approves entry in Slack (1-click)
            │
            ▼
Glossary updated — next user gets deterministic answer
in &amp;lt;1s without disambiguation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cadence:&lt;/strong&gt; real-time inline + nightly batch&lt;br&gt;
&lt;strong&gt;Effect:&lt;/strong&gt; glossary grows from ~50 terms at week 1 to ~400+ by month 6&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worked example:&lt;/strong&gt; Day 1: a user asks &lt;em&gt;"show me VIP user churn this quarter"&lt;/em&gt;. The glossary has no "churn" entry, so the inline LLM proposes &lt;code&gt;last_trade_at &amp;lt; now() - INTERVAL 30 DAY AND prior_30d_volume &amp;gt; 0&lt;/code&gt;. The user clicks 👍, and Slack posts: &lt;em&gt;"Propose glossary entry — &lt;code&gt;churn&lt;/code&gt;: [SQL]. Approve?"&lt;/em&gt; The data team approves it on the morning of day 2. Day 3: another user asks &lt;em&gt;"how many churned users last week"&lt;/em&gt;. The glossary lookup returns deterministic SQL and the answer arrives in 800ms, with no LLM disambiguation. &lt;strong&gt;The same question would have taken ~3s on day 1.&lt;/strong&gt;&lt;/p&gt;
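
&lt;p&gt;The three-step resolution hierarchy in this loop could look something like the sketch below. Everything here is illustrative: &lt;code&gt;Resolution&lt;/code&gt;, &lt;code&gt;resolve_term&lt;/code&gt;, and &lt;code&gt;approve&lt;/code&gt; are hypothetical names, the glossary is modeled as a plain dict rather than the YAML store, and &lt;code&gt;llm_propose&lt;/code&gt; stands in for the LLM disambiguation call.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resolution:
    term: str
    sql: str                  # table name or SQL fragment
    source: str               # "schema", "glossary", or "llm"
    needs_confirmation: bool  # True only for LLM guesses


def resolve_term(term: str, tables: set[str], glossary: dict[str, str],
                 llm_propose: Callable[[str], str]) -> Resolution:
    """Try the live schema, then the glossary, then fall back to the LLM."""
    key = term.strip().lower()
    if key in tables:        # 1. exact table match from list_tables
        return Resolution(term, key, "schema", False)
    if key in glossary:      # 2. deterministic glossary entry
        return Resolution(term, glossary[key], "glossary", False)
    # 3. LLM proposes an interpretation that the user must confirm
    return Resolution(term, llm_propose(key), "llm", True)


def approve(glossary: dict[str, str], resolution: Resolution) -> None:
    """Called after the data team approves: later lookups become deterministic."""
    glossary[resolution.term.strip().lower()] = resolution.sql
```

&lt;p&gt;Once &lt;code&gt;approve&lt;/code&gt; has run for a term, the next call to &lt;code&gt;resolve_term&lt;/code&gt; returns at step 2 and never reaches the LLM, which is where the latency drop in the worked example comes from.&lt;/p&gt;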
&lt;h3&gt;
  
  
  Loop 2 — Mart Recommendation (nightly batch)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Frequent slow query patterns become permanent pre-joined marts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NIGHTLY JOB
Mines marts.langfuse_flat for high-frequency staging queries:
  • &amp;gt; 50 times per week
  • &amp;gt; 5s average latency
  • similar query fingerprint via normalizeQuery()
            │
            ▼
Auto-drafts CREATE MATERIALIZED VIEW recommendation
with predicted speedup (e.g., 8s → &amp;lt;500ms)
            │
            ▼
Posts to #data-platform Slack
            │
            ▼
1-click approve → MV deployed
            │
            ▼
Next query hits the mart instead of computing the JOIN from scratch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cadence:&lt;/strong&gt; nightly&lt;br&gt;
&lt;strong&gt;Effect:&lt;/strong&gt; mart count grows from ~15 at week 1 → ~35 by month 6&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worked example:&lt;/strong&gt; 50 users this week asked variations of &lt;em&gt;"users approaching liquidation"&lt;/em&gt;, all answered via a 3-table JOIN taking 8 seconds. The recommendation job detects the pattern, auto-drafts a &lt;code&gt;marts.mart_near_liquidation&lt;/code&gt; MV definition, predicts a speedup to &amp;lt;500ms, and posts it to Slack. Once approved, the MV is deployed overnight. Within 24h, every related query hits the mart at sub-second latency.&lt;/p&gt;
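
&lt;p&gt;As a rough sketch, the nightly selection step might reduce to a filter over per-fingerprint statistics. The &lt;code&gt;QueryStat&lt;/code&gt; record and &lt;code&gt;mart_candidates&lt;/code&gt; function are hypothetical; the two thresholds come from the flow above, while ranking by total weekly time spent is an added assumption.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    fingerprint: str      # normalized query text or hash
    runs_per_week: int
    avg_latency_s: float


def mart_candidates(stats: list[QueryStat], min_runs: int = 50,
                    min_latency_s: float = 5.0) -> list[QueryStat]:
    """Keep patterns that are both frequent and slow, costliest first."""
    hits = [s for s in stats
            if s.runs_per_week > min_runs and s.avg_latency_s > min_latency_s]
    # Assumption: rank by total seconds spent per week on the pattern.
    return sorted(hits, key=lambda s: s.runs_per_week * s.avg_latency_s, reverse=True)
```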
&lt;h3&gt;
  
  
  Loop 3 — LoRA Fine-Tuning (weekly)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Per-team specialist accuracy improves from real corrections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User clicks 👎 with corrected SQL — captured in Langfuse
            │
            ▼
WEEKLY JOB
Batches all corrections since last run:
  ~50–200 corrections per team per week
            │
            ▼
QLoRA training (PEFT + bitsandbytes, NF4 4-bit on Qwen 72B)
  • LoraConfig r=16
  • per-team adapter (e.g., risk_lora_v12)
  • trains overnight (NF4 weights for 72B are ~36GB, so one 24GB GPU is not enough)
            │
            ▼
A/B test harness:
  • Run eval set against current + candidate adapter
  • Route 10% of traffic to candidate for 48h
  • Promote if accuracy +2pp AND latency regression &amp;lt;10%
            │
            ▼
Per-team multi-LoRA serving via vLLM (--enable-lora --max-loras 8)
Each team's specialist gets its own adapter on the shared base model, with little extra GPU memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cadence:&lt;/strong&gt; weekly train, weekly A/B promote&lt;br&gt;
&lt;strong&gt;Effect:&lt;/strong&gt; specialist accuracy improves by 2–5pp per quarter for the first 6 months, then plateaus&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worked example:&lt;/strong&gt; The Compliance team's specialist, trained on 6 months of corrections, has learned that SAR draft questions need device-fingerprint joins (the agent kept missing this and was corrected ~30 times). After the v12 adapter is promoted, accuracy on SAR-related questions jumps from 78% to 91%.&lt;/p&gt;
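
&lt;p&gt;The promotion rule from the A/B harness can be written as a small pure function. This is only a sketch: &lt;code&gt;should_promote&lt;/code&gt; and its parameter names are hypothetical, accuracy is taken in percent, latency is taken as a median in seconds, and "+2pp" is read as "at least 2 percentage points".&lt;/p&gt;

```python
def should_promote(current_acc_pct: float, candidate_acc_pct: float,
                   current_latency_s: float, candidate_latency_s: float,
                   min_gain_pp: float = 2.0,
                   max_latency_regression: float = 0.10) -> bool:
    """Promote only if accuracy gains enough AND latency does not regress too far."""
    gain_pp = candidate_acc_pct - current_acc_pct
    regression = (candidate_latency_s - current_latency_s) / current_latency_s
    # Both conditions must hold; the regression must stay under the 10% cap.
    return gain_pp >= min_gain_pp and max_latency_regression > regression
```

&lt;p&gt;With the figures from the worked example, &lt;code&gt;should_promote(78.0, 91.0, 1.0, 1.05)&lt;/code&gt; passes both conditions (the latency values here are made up for illustration).&lt;/p&gt;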
&lt;h3&gt;
  
  
  Loop 4 — Eval-Set Regression Daemon (weekly automated)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Catch drift / regression in CI before production users notice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SUNDAY 06:00 UTC — k8s CronJob fires
            │
            ▼
Runs against current production pipeline:
  • 30 known-answer deep-research investigations
  • 50 known-answer SQL questions
  • per-team specialist eval sets
            │
            ▼
Compare every result vs ground truth:
  • Did headline findings match?
  • Was confidence calibration correct?
  • Latency within SLA?
  • Cost within budget?
            │
            ▼
Persisted to marts.eval_set_runs
            │
            ▼
If HIGH-confidence accuracy drops &amp;gt;2pp from last week:
  → BLOCK next prompt or model promotion
  → Post diff to Slack with regression details
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cadence:&lt;/strong&gt; weekly automated — never skip&lt;br&gt;
&lt;strong&gt;Effect:&lt;/strong&gt; prompt-change accidents are caught at the next Sunday run (within a week at most) and block further promotions, instead of lingering unnoticed in production&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worked example:&lt;/strong&gt; Engineering pushes a prompt change to the planner template on Friday afternoon. On Saturday a user reports, vaguely, that &lt;em&gt;"the bot seems off today."&lt;/em&gt; At 06:30 UTC on Sunday the eval daemon detects that HIGH-confidence accuracy has dropped 3.4pp on the 30-investigation eval set. A Slack alert fires and further promotions are blocked. Engineering investigates on Monday, finds that the edit accidentally removed the citation-enforcement instructions, and reverts it. The eval recovers the following Sunday.&lt;/p&gt;
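
&lt;p&gt;The blocking rule itself is simple enough to sketch in a few lines. The function name &lt;code&gt;regression_gate&lt;/code&gt; and the shape of its return value are assumptions, and the 92.0 baseline in the usage note is invented for illustration; only the 2pp threshold and the 3.4pp drop come from the text above.&lt;/p&gt;

```python
def regression_gate(last_week_pct: float, this_week_pct: float,
                    max_drop_pp: float = 2.0) -> dict:
    """Compare HIGH-confidence accuracy week over week and decide whether to block."""
    drop = last_week_pct - this_week_pct
    return {
        "drop_pp": round(drop, 2),
        # Block the next prompt or model promotion on a drop larger than the threshold.
        "block_promotion": drop > max_drop_pp,
    }
```

&lt;p&gt;For example, &lt;code&gt;regression_gate(92.0, 88.6)&lt;/code&gt; reports a 3.4pp drop and sets &lt;code&gt;block_promotion&lt;/code&gt; to true, while a 1.5pp drop would pass.&lt;/p&gt;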

&lt;h3&gt;
  
  
  Plus six supporting intelligences
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Intelligence&lt;/th&gt;
&lt;th&gt;What it learns&lt;/th&gt;
&lt;th&gt;Cadence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Semantic Table Router&lt;/td&gt;
&lt;td&gt;Top-K relevant tables per question class&lt;/td&gt;
&lt;td&gt;Per-query (&amp;lt;100ms)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Vector Query Cache&lt;/td&gt;
&lt;td&gt;Repeat question → reuse exact past answer&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Adaptive Mart Refresh&lt;/td&gt;
&lt;td&gt;Tune MV refresh cadence based on query frequency&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Schema Change Detection&lt;/td&gt;
&lt;td&gt;New tables / columns → auto-draft glossary&lt;/td&gt;
&lt;td&gt;Nightly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Multi-Turn Chain Learning&lt;/td&gt;
&lt;td&gt;Pre-compute likely follow-up queries&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Slow Query Detector&lt;/td&gt;
&lt;td&gt;Recommend skipping indexes for frequent patterns&lt;/td&gt;
&lt;td&gt;Nightly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The compound effect — 6-month trajectory
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Week 1&lt;/th&gt;
&lt;th&gt;Month 3&lt;/th&gt;
&lt;th&gt;Month 6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Glossary size&lt;/td&gt;
&lt;td&gt;~50&lt;/td&gt;
&lt;td&gt;~200&lt;/td&gt;
&lt;td&gt;~400+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;% questions answered without analyst escalation&lt;/td&gt;
&lt;td&gt;~50%&lt;/td&gt;
&lt;td&gt;~75%&lt;/td&gt;
&lt;td&gt;~90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH-confidence accuracy&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;td&gt;~85%&lt;/td&gt;
&lt;td&gt;~92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median latency&lt;/td&gt;
&lt;td&gt;~3s&lt;/td&gt;
&lt;td&gt;~1.5s&lt;/td&gt;
&lt;td&gt;~1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mart count&lt;/td&gt;
&lt;td&gt;~15&lt;/td&gt;
&lt;td&gt;~25&lt;/td&gt;
&lt;td&gt;~35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recurring failure rate&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;dropping&lt;/td&gt;
&lt;td&gt;&amp;lt;2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eval-set regressions caught in CI per quarter&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2–3&lt;/td&gt;
&lt;td&gt;4–6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Dynamic vs Deterministic — Where Each Lives
&lt;/h2&gt;

&lt;p&gt;Different layers of the platform use different mixes of LLM intelligence and deterministic code. The rule: &lt;strong&gt;spend LLM where ambiguity exists; spend code where determinism is required for safety, speed, or auditability.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Dynamic (LLM-driven)&lt;/th&gt;
&lt;th&gt;Deterministic (code)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input understanding&lt;/td&gt;
&lt;td&gt;✅ Classify engine + extract params&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema discovery&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Live MCP &lt;code&gt;list_tables&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Glossary lookup&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Templated YAML lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema disambiguation fallback&lt;/td&gt;
&lt;td&gt;✅ LLM proposes interpretation&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analytics SQL generation&lt;/td&gt;
&lt;td&gt;✅ LLM writes grounded SQL&lt;/td&gt;
&lt;td&gt;✅ Pre-execution validator + read-only role&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NR lookup SQL&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Templated SELECT against &lt;code&gt;marts.numeric_research_runs&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NR cache-vs-refit decision&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Orchestrator code on freshness threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimization JSON model&lt;/td&gt;
&lt;td&gt;✅ LLM emits structured model&lt;/td&gt;
&lt;td&gt;✅ Python template converts JSON → solver code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep Research planning&lt;/td&gt;
&lt;td&gt;✅ LLM decomposes into sub-questions&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep Research workers&lt;/td&gt;
&lt;td&gt;✅ LLM picks tool + generates SQL per sub-question&lt;/td&gt;
&lt;td&gt;✅ Concurrency via asyncio.Semaphore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep Research critic&lt;/td&gt;
&lt;td&gt;✅ LLM judges sufficiency&lt;/td&gt;
&lt;td&gt;✅ Max 5 rounds (budget enforced)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Citation enforcement&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Regex check + 3-retry fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Synthesizer rendering&lt;/td&gt;
&lt;td&gt;✅ LLM writes prose&lt;/td&gt;
&lt;td&gt;✅ Confidence label rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Parameterized inserts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget tracking&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✅ Per-call token/cost/wall-clock counters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-improvement loops&lt;/td&gt;
&lt;td&gt;✅ LLM proposes glossary/MV/etc&lt;/td&gt;
&lt;td&gt;✅ Human approval workflow + A/B promotion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
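
&lt;p&gt;To make the Deep Research rows concrete, here is a minimal sketch of how LLM-driven steps can sit inside deterministic limits: workers run under an &lt;code&gt;asyncio.Semaphore&lt;/code&gt;, and the critic loop stops after a fixed number of rounds. The function names, the &lt;code&gt;worker&lt;/code&gt; and &lt;code&gt;critic&lt;/code&gt; callbacks, and the concurrency limit of 4 are assumptions; only the semaphore and the 5-round cap come from the table.&lt;/p&gt;

```python
import asyncio


async def run_workers(sub_questions, worker, max_concurrency: int = 4):
    """Run one worker per sub-question, never more than max_concurrency at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(question):
        async with sem:
            return await worker(question)

    # gather preserves the order of sub_questions in its results
    return await asyncio.gather(*(guarded(q) for q in sub_questions))


async def research(plan, worker, critic, max_rounds: int = 5) -> dict:
    """Collect evidence until the critic says "sufficient" or the round budget runs out.

    critic(evidence) is a hypothetical callback returning (verdict, follow_up_questions).
    """
    evidence = []
    questions = list(plan)
    for round_no in range(1, max_rounds + 1):
        evidence += await run_workers(questions, worker)
        verdict, follow_ups = critic(evidence)
        if verdict == "sufficient":
            return {"evidence": evidence, "rounds": round_no, "sufficient": True}
        questions = follow_ups
    # Budget exhausted: return what was gathered and mark it insufficient.
    return {"evidence": evidence, "rounds": max_rounds, "sufficient": False}
```

&lt;p&gt;The LLM decides what to ask and whether the evidence is enough, but the loop bound and the concurrency cap live in plain code, so a misbehaving model cannot run past its budget.&lt;/p&gt;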




&lt;h2&gt;
  
  
  Trust Mechanisms — Four-Layer Defense
&lt;/h2&gt;

&lt;p&gt;Every output passes through these defenses.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;How enforced&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Grounding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM can't reference what it can't see&lt;/td&gt;
&lt;td&gt;Live schema via MCP &lt;code&gt;list_tables&lt;/code&gt;; pre-joined marts eliminate JOINs; glossary terms deterministic; cross-encoder reranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Verification&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every claim is checkable&lt;/td&gt;
&lt;td&gt;Citation enforcement (&lt;code&gt;[ev-N]&lt;/code&gt; regex); critic agent (up to 5 rounds); hypothesis verdicts (CONFIRMED/FALSIFIED/UNTESTED); HIGH/MEDIUM/LOW confidence labels calibrated weekly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Containment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;If wrong, blast radius is zero&lt;/td&gt;
&lt;td&gt;Database-layer read-only (&lt;code&gt;readonly = 1&lt;/code&gt;); mart-only access; optimization output is a reviewable plan, never an action; sandboxed code execution with no network egress; per-team RBAC at the DB role layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4. Auditability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every answer is replayable&lt;/td&gt;
&lt;td&gt;Langfuse traces every prompt / tool call / sub-query; source provenance per claim; persisted research / optimization / NR runs; 7-year audit retention for regulated jurisdictions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
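
&lt;p&gt;A pre-execution validator for the containment layer might start out like the deliberately naive sketch below. It is regex-based rather than a real SQL parser, so it errs on the side of rejecting (for instance, it flags CTE names used in &lt;code&gt;FROM&lt;/code&gt; and keywords inside string literals), and it is meant to complement the read-only database role, not replace it. The function name, keyword list, and &lt;code&gt;marts&lt;/code&gt; default are assumptions.&lt;/p&gt;

```python
import re

# Assumption: a non-exhaustive deny-list of write/DDL keywords.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|alter|drop|truncate|create|grant|attach|detach|system)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)", re.IGNORECASE)


def validate_sql(sql: str, allowed_schema: str = "marts") -> list[str]:
    """Return a list of problems; an empty list means the query may run."""
    problems = []
    body = sql.strip().rstrip(";")
    if ";" in body:
        problems.append("multiple statements")
    if not re.match(r"(?is)(select|with)\b", body):
        problems.append("not a SELECT")
    if FORBIDDEN.search(body):
        problems.append("forbidden keyword")
    for ref in TABLE_REF.findall(body):
        # Mart-only access: every referenced table must live in the allowed schema.
        if not ref.lower().startswith(allowed_schema + "."):
            problems.append(f"table outside {allowed_schema}: {ref}")
    return problems
```

&lt;p&gt;Because the database role is already read-only, a false negative here cannot cause a write; the validator mainly gives the agent a fast, explainable rejection before the query reaches ClickHouse.&lt;/p&gt;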




&lt;h2&gt;
  
  
  By the Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Catalogued use cases&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~145&lt;/strong&gt; (75 team-based + ~70 product-domain)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compound cross-schema queries&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;13&lt;/strong&gt; (single-SQL, 4–7 table joins)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Numeric research fitted models&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;20&lt;/strong&gt; (5 with full Python implementations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worked optimization examples&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;5&lt;/strong&gt; (3 with full code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worked deep-research investigations&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;3&lt;/strong&gt; (full traces)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP frameworks supported&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solvers supported&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;9&lt;/strong&gt; (HiGHS, CBC, SCIP, OR-Tools CP-SAT, CVXPY/CLARABEL/ECOS/SCS, Pyomo, optional Gurobi/CPLEX)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Teams served&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-improvement loops&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4 primary + 6 supporting&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment shapes&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (crypto exchange + hedge fund)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build timeline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24 weeks&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team size to build&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5–7 engineers&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data + vector + observability storage&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;ClickHouse 25.8+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native HNSW vector indexes (GA); same DB stores trades + embeddings + Langfuse traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat UI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;LibreChat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Acquired by ClickHouse, November 2025; SSO; multi-LLM; 128K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Qwen 2.5 72B via vLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Self-hosted; ~$0.01/query amortized; multi-LoRA serving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Triage&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Qwen 2.5 72B&lt;/strong&gt; (or 32B in cost-sensitive mode)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool layer&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Anthropic MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compatible with 12 frameworks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Langfuse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Acquired by ClickHouse, January 2026; traces in ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reranker&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;BGE-reranker-v2-m3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-encoder, ~600M params, FP16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solvers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;OR-Tools / CVXPY / Pyomo / HiGHS / CBC / SCIP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open-source; optional Gurobi/CPLEX&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code sandbox&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Docker + nsjail + cgroups (v1)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Upgrade path to gVisor / Firecracker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;PEFT + bitsandbytes (QLoRA)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NF4 4-bit; 72B weights alone are ~36GB, so a single 24GB GPU is not enough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipeline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Debezium + Kafka&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CDC from PostgreSQL sources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Architectural Thesis (One Paragraph)
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The platform combines four execution engines (Analytics, Numeric Research lookup, Optimization, Deep Research) on shared infrastructure (ClickHouse + Qwen + MCP + LibreChat + Langfuse) with citation-enforced trust mechanisms (regex-validated &lt;code&gt;[ev-N]&lt;/code&gt; per claim, critic agent gating, HIGH/MEDIUM/LOW confidence calibration) and four continuous self-improvement loops (glossary expansion, mart recommendation, LoRA fine-tuning, eval-set regression). Triage is &lt;strong&gt;semantic LLM classification&lt;/strong&gt;, not keyword matching. Engine choice is &lt;strong&gt;intent-driven&lt;/strong&gt;, not verb-matched. The result is a platform that handles open-ended business questions in 1–90 seconds, never takes autonomous action, replays every answer end-to-end via Langfuse, and &lt;strong&gt;gets better weekly without engineering intervention&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the platform.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>ai</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a Full-Stack Agentic AI Data Platform on ClickHouse: A Complete Architecture Guide</title>
      <dc:creator>RAKESH THERANI</dc:creator>
      <pubDate>Thu, 07 May 2026 01:49:43 +0000</pubDate>
      <link>https://forem.com/rakeshtherani/building-a-full-stack-agentic-ai-data-platform-on-clickhouse-a-complete-architecture-guide-4cf</link>
      <guid>https://forem.com/rakeshtherani/building-a-full-stack-agentic-ai-data-platform-on-clickhouse-a-complete-architecture-guide-4cf</guid>
      <description>&lt;p&gt;&lt;em&gt;A production-grade, end-to-end agentic AI platform — chat UI, self-hosted LLM, MCP server, LLM observability, medallion data architecture, security guardrails, HA, and cost analysis. Same stack ClickHouse uses internally (DWAINE: 250+ employees, ~70% of internal analytics use cases covered, 50-70% workload reduction on the data-warehouse team). Adapted for a crypto-derivatives exchange.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I'm writing this
&lt;/h2&gt;

&lt;p&gt;The phrase "AI platform" gets thrown around to describe everything from a single chatbot to a set of disconnected APIs. &lt;strong&gt;An actual production AI platform requires far more than a model and an interface&lt;/strong&gt; — it needs data plumbing, semantic grounding, query safety, observability, role-based access, high availability, and a real cost model.&lt;/p&gt;

&lt;p&gt;This is the architecture I shipped for a crypto exchange — covering all of those layers. Every component is open-source and battle-tested. Nothing is theoretical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data layer&lt;/strong&gt;: PostgreSQL → CDC (Debezium + Kafka) → ClickHouse with medallion architecture (Raw → Staging → Marts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM layer&lt;/strong&gt;: Self-hosted Qwen 2.5 72B (Apache 2.0, all data stays on-prem) + business glossary for domain grounding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool layer&lt;/strong&gt;: ClickHouse's official MCP server with bearer-token auth, SSRF protection, schema discovery + safe SQL execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UX layer&lt;/strong&gt;: LibreChat (open-source, SSO, role-based) — same chat UI ClickHouse acquired and uses for DWAINE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability layer&lt;/strong&gt;: Langfuse (open-source, runs on ClickHouse) — every query, response, latency, cost tracked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations layer&lt;/strong&gt;: HA cluster, query timeouts, memory caps, RBAC, full audit trail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JOIN strategy&lt;/strong&gt;: 3-tier approach (pre-joined marts + dictionaries + UNION ALL + runtime fallback) so the LLM never writes a slow query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is &lt;strong&gt;the same Agentic Data Stack ClickHouse open-sourced and uses internally&lt;/strong&gt;. I'm documenting how to deploy it on top of any production database (this guide uses a crypto exchange as the worked example, but the architecture transfers to retail, fintech, marketing, ops).&lt;/p&gt;

&lt;p&gt;If you're building or evaluating an AI platform for your own data — and you want something more substantial than "GPT-4 with a database connection" — this is the deep-dive.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Costs and table examples reference a crypto-derivatives platform (large trade tables, real-time wallet flows). The architectural patterns and effort estimates transfer to other domains. The numbers are real, but your mileage will vary.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Executive Summary&lt;/li&gt;
&lt;li&gt;What We Are Building&lt;/li&gt;
&lt;li&gt;Why ClickHouse&lt;/li&gt;
&lt;li&gt;Platform Architecture&lt;/li&gt;
&lt;li&gt;Data Consolidation — All DBs Into ClickHouse&lt;/li&gt;
&lt;li&gt;How the AI Agent Works&lt;/li&gt;
&lt;li&gt;What Questions Can Be Asked — By Team&lt;/li&gt;
&lt;li&gt;How Multi-Table Queries Work&lt;/li&gt;
&lt;li&gt;ClickHouse MCP Server&lt;/li&gt;
&lt;li&gt;LLM Selection&lt;/li&gt;
&lt;li&gt;How the LLM Learns Our Tables (No Training Required)&lt;/li&gt;
&lt;li&gt;
Chat Interface — LibreChat

&lt;ul&gt;
&lt;li&gt;Specialist Agents Per Team (Beyond One General Agent)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Observability — Langfuse&lt;/li&gt;
&lt;li&gt;Medallion Architecture — Data Organization&lt;/li&gt;
&lt;li&gt;Security and Governance&lt;/li&gt;
&lt;li&gt;High Availability Architecture&lt;/li&gt;
&lt;li&gt;Implementation Roadmap&lt;/li&gt;
&lt;li&gt;Cost Analysis&lt;/li&gt;
&lt;li&gt;Risk Assessment&lt;/li&gt;
&lt;li&gt;Success Metrics&lt;/li&gt;
&lt;li&gt;Reference Links&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Executive Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Objective
&lt;/h3&gt;

&lt;p&gt;Build a unified data platform on ClickHouse that consolidates tables from all of the exchange's databases (trading, wallets, users, risk, compliance) into one place, then layer an AI agent on top so that any team member can ask questions in plain English and get answers in seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem Today
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data lives in multiple PostgreSQL databases across different services&lt;/li&gt;
&lt;li&gt;Getting answers requires writing SQL, knowing which DB to query, or filing a ticket to engineering&lt;/li&gt;
&lt;li&gt;Analytical queries on production PostgreSQL compete with live trading workloads&lt;/li&gt;
&lt;li&gt;No single place to query across trading, wallet, user, and compliance data together&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Any team member types: "Show me top 10 users by futures volume this week who also had withdrawals &amp;gt; $50K"

The AI agent:
1. Understands the question (via business glossary)
2. Generates SQL across trade_future + wallet_transactions tables (both in ClickHouse)
3. Executes via MCP server (read-only, safe)
4. Returns formatted answer with table/chart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because all data from different databases is consolidated into ClickHouse tables — the LLM queries structured tables directly, not embeddings or vectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  What ClickHouse Already Provides
&lt;/h3&gt;

&lt;p&gt;ClickHouse has built a complete open-source &lt;strong&gt;Agentic Data Stack&lt;/strong&gt; with a ready-to-deploy GitHub repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ClickHouse/agentic-data-stack" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/agentic-data-stack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://llm.clickhouse.com" rel="noopener noreferrer"&gt;https://llm.clickhouse.com&lt;/a&gt; (AgentHouse — 37 public datasets)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal proof:&lt;/strong&gt; DWAINE — 250+ ClickHouse employees use it, &amp;gt;200 daily messages across 50-70 daily conversations, ~70% of internal analytics use cases covered, 50-70% workload reduction on the 3-person DWH team (&lt;a href="https://clickhouse.com/blog/ai-first-data-warehouse" rel="noopener noreferrer"&gt;source&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are not building from scratch. We are deploying ClickHouse's proven stack with our data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Expected Outcomes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analytical query speed&lt;/td&gt;
&lt;td&gt;10-100x faster than PostgreSQL for aggregations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production DB load reduction&lt;/td&gt;
&lt;td&gt;60-80% fewer analytical queries hitting production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-service data access&lt;/td&gt;
&lt;td&gt;Any team member can query data without SQL knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-database queries&lt;/td&gt;
&lt;td&gt;Single query across trading, wallets, users, risk data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to insight&lt;/td&gt;
&lt;td&gt;Seconds instead of hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  2. What We Are Building
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before (Today)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trading Ops needs a report:
  → Asks engineering → Engineer writes SQL → Queries prod DB → Formats result → Sends back
  → Time: hours to days
  → Impact: Analytical query slows down production

Compliance needs user transaction history:
  → Files ticket → Engineer joins data from 3 DBs → Manual export → Sends CSV
  → Time: 1-3 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  After (With This Platform)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trading Ops opens LibreChat:
  → Types: "Top 10 markets by volume this week with liquidation count"
  → AI agent generates SQL, queries ClickHouse, returns answer in 5 seconds
  → No engineering involvement, no production DB impact

Compliance opens LibreChat:
  → Types: "All transactions for user X in last 90 days including spot, futures, and withdrawals"
  → AI agent queries across all consolidated tables, returns complete history
  → Instant, audited, repeatable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Who Uses It
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Team&lt;/th&gt;
&lt;th&gt;What They Ask&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trading Ops&lt;/td&gt;
&lt;td&gt;Volume, market metrics, liquidations, position summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk&lt;/td&gt;
&lt;td&gt;Exposure, liquidation cascades, concentration risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;User transaction history, suspicious patterns, audit trails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance&lt;/td&gt;
&lt;td&gt;Fee revenue, P&amp;amp;L, trading volume trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product&lt;/td&gt;
&lt;td&gt;User activity, feature usage, market popularity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;System metrics, query performance, data quality checks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  3. Why ClickHouse
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Production Proof Points in Crypto
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Coinhall&lt;/strong&gt; — Real-time DEX aggregator across 23 blockchains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes billions of trade events&lt;/li&gt;
&lt;li&gt;20ms candlestick chart queries on massive datasets&lt;/li&gt;
&lt;li&gt;Replaced PostgreSQL for analytics&lt;/li&gt;
&lt;li&gt;Similar data model to the crypto exchange (trades, orders, market data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CryptoHouse&lt;/strong&gt; — Public ClickHouse-powered crypto analytics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bitcoin/Ethereum blockchain data at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DWAINE&lt;/strong&gt; — ClickHouse's own internal AI agent (production):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;250+ internal users querying data via natural language&lt;/li&gt;
&lt;li&gt;&amp;gt;200 daily messages across 50-70 daily conversations&lt;/li&gt;
&lt;li&gt;~70% of internal analytics use cases covered by the agent&lt;/li&gt;
&lt;li&gt;~50-70% workload reduction on the 3-person DWH team&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://clickhouse.com/blog/ai-first-data-warehouse" rel="noopener noreferrer"&gt;ClickHouse — How we made our data warehouse AI-first&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;This is exactly what we are building for the crypto exchange&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ClickHouse vs PostgreSQL for Analytics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage model&lt;/td&gt;
&lt;td&gt;Row-based (OLTP)&lt;/td&gt;
&lt;td&gt;Columnar (OLAP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression&lt;/td&gt;
&lt;td&gt;3-5x&lt;/td&gt;
&lt;td&gt;10-30x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregation speed&lt;/td&gt;
&lt;td&gt;Seconds-minutes&lt;/td&gt;
&lt;td&gt;Milliseconds-seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrent analytics&lt;/td&gt;
&lt;td&gt;Impacts live trading&lt;/td&gt;
&lt;td&gt;Isolated cluster; only CDC reads from production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partitioning&lt;/td&gt;
&lt;td&gt;Manual (daily)&lt;/td&gt;
&lt;td&gt;Automatic (MergeTree)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Materialized views&lt;/td&gt;
&lt;td&gt;Periodic refresh&lt;/td&gt;
&lt;td&gt;Incremental (real-time triggers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-DB queries&lt;/td&gt;
&lt;td&gt;Needs postgres_fdw/dblink or application-level joins&lt;/td&gt;
&lt;td&gt;All data in one place&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Advantage — All Data In One Place
&lt;/h3&gt;

&lt;p&gt;Today, querying across trading DB + wallet DB + user DB requires application-level joins or manual exports. In ClickHouse, all tables from all databases are consolidated — the AI agent (or any analyst) can query across everything in a single SQL statement.&lt;/p&gt;

&lt;p&gt;Our largest trading table runs to hundreds of GB in PostgreSQL. With ClickHouse's columnar compression (10-30x), we expect it to shrink by roughly an order of magnitude, and column-specific queries (e.g., &lt;code&gt;SUM(quantity) WHERE market_name = 'BTCPFC'&lt;/code&gt;) read only the columns they reference.&lt;/p&gt;
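
&lt;p&gt;One way to check the compression estimate once a table has been loaded is to compare compressed and uncompressed bytes in &lt;code&gt;system.parts&lt;/code&gt;. This is a sketch that assumes the table is called &lt;code&gt;trade_future&lt;/code&gt; and lives in the current database:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Compression ratio across the active parts of one table
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE active
  AND database = currentDatabase()
  AND table = 'trade_future'
GROUP BY table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The ratio is measured against ClickHouse's own uncompressed size, not against the PostgreSQL on-disk size, so it is a sanity check and not a direct comparison.&lt;/p&gt;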




&lt;h2&gt;
  
  
  4. Platform Architecture
&lt;/h2&gt;

&lt;p&gt;This is ClickHouse's &lt;strong&gt;Agentic Data Stack&lt;/strong&gt; — three layers, all open-source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                         +------------------+
                         |   ALL TEAMS      |
                         |  (Trading Ops,   |
                         |   Risk, Finance, |
                         |   Compliance,    |
                         |   Product)       |
                         +--------+---------+
                                  |
                  ================|================
                  |     LAYER 1: CHAT (LibreChat)  |
                  |     - Web-based chat UI         |
                  |     - SSO login                 |
                  |     - Role-based access          |
                  |     - Conversation history       |
                  =================|================
                                  |
                  ================|=================
                  |     LAYER 2: LLM + MCP          |
                  |                                  |
                  |  User question (plain English)   |
                  |         |                        |
                  |         v                        |
                  |  Qwen 2.5 LLM (self-hosted)     |
                  |  + Business Glossary             |
                  |         |                        |
                  |         v                        |
                  |  ClickHouse MCP Server           |
                  |  (schema discovery + SQL exec)   |
                  |         |                        |
                  ===========|======================
                             |
                  ===========|======================
                  |  LAYER 3: DATA (ClickHouse)     |
                  |                                  |
                  |  +---------------------------+   |
                  |  | Trading DB tables:        |   |
                  |  |  trade_future             |   |
                  |  |  trade_spot               |   |
                  |  |  orders                   |   |
                  |  +---------------------------+   |
                  |  | Wallet DB tables:         |   |
                  |  |  wallet_transactions      |   |
                  |  |  deposits / withdrawals   |   |
                  |  +---------------------------+   |
                  |  | User DB tables:           |   |
                  |  |  users, kyc_status        |   |
                  |  +---------------------------+   |
                  |  | Risk DB tables:           |   |
                  |  |  positions, margins       |   |
                  |  +---------------------------+   |
                  |  | Pre-aggregated views:     |   |
                  |  |  volume_daily             |   |
                  |  |  pnl_daily                |   |
                  |  |  liquidation_metrics      |   |
                  |  |  candlestick_1m/1h/1d     |   |
                  |  +---------------------------+   |
                  |                                  |
                  ==================================+
                             ^
                             |
                  ===========|======================
                  |  CDC PIPELINES (Debezium+Kafka) |
                  |  Real-time sync from all DBs    |
                  ==================================+
                      ^        ^       ^      ^
                      |        |       |      |
                  +-------+ +------+ +----+ +----+
                  |Trading| |Wallet| |User| |Risk|
                  |  DB   | |  DB  | | DB | | DB |
                  |  (PG) | | (PG) | |(PG)| |(PG)|
                  +-------+ +------+ +----+ +----+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Observability (Langfuse)
&lt;/h3&gt;

&lt;p&gt;Langfuse runs alongside and tracks every LLM interaction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What question was asked, by whom&lt;/li&gt;
&lt;li&gt;What SQL was generated&lt;/li&gt;
&lt;li&gt;Whether the answer was rated correct (user feedback or evaluation scores)&lt;/li&gt;
&lt;li&gt;Token usage and cost per query&lt;/li&gt;
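&lt;li&gt;Langfuse stores its traces and observations in ClickHouse itself, so the same cluster serves observability (its own transactional metadata still lives in a small PostgreSQL instance)&lt;/li&gt;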
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Data Consolidation — All DBs Into ClickHouse
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Core Idea
&lt;/h3&gt;

&lt;p&gt;Every PostgreSQL database at the crypto exchange feeds its tables into ClickHouse via CDC (Change Data Capture). Once in ClickHouse, the AI agent can query across all of them in a single query.&lt;/p&gt;

&lt;h3&gt;
  
  
  CDC Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PostgreSQL (WAL) → Debezium Connector → Kafka → ClickHouse Kafka Engine → MergeTree Tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
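
&lt;p&gt;On the ClickHouse side, the last two hops of this pipeline are usually a Kafka engine table plus a materialized view that moves rows into the target table. The sketch below is illustrative only. The broker address, topic name, consumer group, the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;_version&lt;/code&gt; columns, and the scale of 8 decimal places are hypothetical. It also assumes Debezium emits its standard JSON envelope with &lt;code&gt;schemas.enable=false&lt;/code&gt;, decimals as strings (&lt;code&gt;decimal.handling.mode=string&lt;/code&gt;), and timestamps as microseconds since epoch, and that a &lt;code&gt;trade_future&lt;/code&gt; target table with matching columns already exists:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- 1. Kafka consumer table: one raw Debezium JSON event per row
CREATE TABLE trade_future_queue
(
    raw String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',                   -- hypothetical broker
         kafka_topic_list  = 'trading.public.trade_future',  -- hypothetical topic
         kafka_group_name  = 'clickhouse_trade_future',      -- hypothetical group
         kafka_format      = 'JSONAsString';

-- 2. Materialized view: parses each event and writes it to the target table
CREATE MATERIALIZED VIEW trade_future_consumer TO trade_future AS
SELECT
    JSONExtractUInt(raw, 'after', 'id')                         AS id,
    JSONExtractString(raw, 'after', 'username')                 AS username,
    JSONExtractString(raw, 'after', 'market_name')              AS market_name,
    JSONExtractUInt(raw, 'after', 'order_type')                 AS order_type,
    toDecimal64(JSONExtractString(raw, 'after', 'quantity'), 8) AS quantity,
    toDecimal64(JSONExtractString(raw, 'after', 'price'), 8)    AS price,
    fromUnixTimestamp64Micro(
        JSONExtractInt(raw, 'after', 'transaction_time'))       AS transaction_time,
    JSONExtractUInt(raw, 'source', 'lsn')                       AS _version
FROM trade_future_queue
WHERE JSONExtractString(raw, 'op') IN ('c', 'u', 'r');  -- create, update, snapshot read
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Delete events (&lt;code&gt;op = 'd'&lt;/code&gt;) are skipped in this sketch; handling them would need an extra deleted-flag column on the target table.&lt;/p&gt;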



&lt;h3&gt;
  
  
  Tables to Consolidate
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source DB&lt;/th&gt;
&lt;th&gt;Tables&lt;/th&gt;
&lt;th&gt;Est. Size&lt;/th&gt;
&lt;th&gt;ClickHouse Engine&lt;/th&gt;
&lt;th&gt;Sync&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trading DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;hundreds of GB&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;trade_spot&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;orders&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wallet DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;wallet_transactions&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;deposits&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;MergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;withdrawals&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;MergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;users&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;kyc_status&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Risk DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;positions&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;margins&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;CDC real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Config&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;market_config&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;Daily batch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;fee_tiers&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;ReplacingMergeTree&lt;/td&gt;
&lt;td&gt;Daily batch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why ReplacingMergeTree
&lt;/h3&gt;

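&lt;p&gt;PostgreSQL rows get UPDATEd (e.g., order status changes, position updates), and each update arrives over CDC as a new row. ClickHouse's &lt;code&gt;ReplacingMergeTree&lt;/code&gt; collapses rows that share the same sorting key (&lt;code&gt;ORDER BY&lt;/code&gt;) during background merges, keeping the row with the highest version (or the most recently inserted row if no version column is set). Merges are asynchronous, so queries that need exactly one row per key should use &lt;code&gt;FINAL&lt;/code&gt; or an &lt;code&gt;argMax&lt;/code&gt; aggregation.&lt;/p&gt;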
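
&lt;p&gt;As a rough illustration, a CDC target table for &lt;code&gt;orders&lt;/code&gt; might look like the following. The column list and the &lt;code&gt;_version&lt;/code&gt; column are assumptions, not the actual exchange schema. Deduplication in merges only happens within a partition, so the sketch partitions on a column that does not change after insert:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE orders
(
    id          UInt64,
    username    String,
    market_name LowCardinality(String),
    status      LowCardinality(String),
    created_at  DateTime64(6),
    _version    UInt64   -- e.g. the PostgreSQL LSN carried by the CDC event
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY toYYYYMM(created_at)   -- immutable column, so all versions share a partition
ORDER BY id;

-- Current state, deduplicated at query time
SELECT status, count() AS order_count
FROM orders FINAL
GROUP BY status;

-- Same result without FINAL: pick the latest version per id explicitly
SELECT latest_status, count() AS order_count
FROM
(
    SELECT id, argMax(status, _version) AS latest_status
    FROM orders
    GROUP BY id
)
GROUP BY latest_status;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without &lt;code&gt;FINAL&lt;/code&gt; or &lt;code&gt;argMax&lt;/code&gt;, a plain &lt;code&gt;SELECT&lt;/code&gt; can return several versions of the same order until a merge has run, which is a detail the agent's prompt or glossary would need to cover.&lt;/p&gt;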

&lt;h3&gt;
  
  
  Pre-Aggregated Materialized Views
&lt;/h3&gt;

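&lt;p&gt;ClickHouse materialized views are &lt;strong&gt;incremental triggers&lt;/strong&gt;: they run on each block of rows as it is inserted into the source table (unlike PostgreSQL, which requires a periodic &lt;code&gt;REFRESH&lt;/code&gt;). Because they only see inserts, views over CDC-updated tables have to tolerate the same row being inserted more than once. The planned views:&lt;/p&gt;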

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Materialized View&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What It Computes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;volume_daily&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;Volume per market per day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pnl_daily&lt;/td&gt;
&lt;td&gt;trades + positions&lt;/td&gt;
&lt;td&gt;P&amp;amp;L per user per market per day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;liquidation_metrics&lt;/td&gt;
&lt;td&gt;trade_future (order_type=1003)&lt;/td&gt;
&lt;td&gt;Liquidation count, volume, frequency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;candlestick_1m&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;OHLCV per market per minute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;candlestick_1h&lt;/td&gt;
&lt;td&gt;chart_db.candlesticks (CDC)&lt;/td&gt;
&lt;td&gt;OHLCV per market per hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;candlestick_1d&lt;/td&gt;
&lt;td&gt;chart_db.candlesticks (CDC)&lt;/td&gt;
&lt;td&gt;OHLCV per market per day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fee_revenue&lt;/td&gt;
&lt;td&gt;trades&lt;/td&gt;
&lt;td&gt;Fee revenue per market per tier per day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;withdrawal_summary&lt;/td&gt;
&lt;td&gt;withdrawals&lt;/td&gt;
&lt;td&gt;Daily withdrawal volume per user&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
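
&lt;p&gt;To make the pattern concrete, here is a minimal sketch of how &lt;code&gt;volume_daily&lt;/code&gt; could be built for futures trades. The target table layout and the view name are assumptions, and the sketch assumes trades are insert-only. If trade rows were updated over CDC, each new version would be added to the totals again:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Target table: rows with the same (day, market_name) are summed during merges
CREATE TABLE volume_daily
(
    day         Date,
    market_name LowCardinality(String),
    trade_count UInt64,
    notional    Float64
)
ENGINE = SummingMergeTree
ORDER BY (day, market_name);

-- Trigger: runs on every block inserted into trade_future
CREATE MATERIALIZED VIEW volume_daily_futures_mv TO volume_daily AS
SELECT
    toDate(transaction_time)                    AS day,
    market_name,
    count()                                     AS trade_count,
    sum(toFloat64(quantity) * toFloat64(price)) AS notional
FROM trade_future
GROUP BY day, market_name;

-- Reading it: still aggregate, because unmerged parts hold partial sums
SELECT day, market_name, sum(trade_count) AS trades, sum(notional) AS total_notional
FROM volume_daily
WHERE day &gt;= today() - 30
GROUP BY day, market_name
ORDER BY day, market_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A second view with the same shape over &lt;code&gt;trade_spot&lt;/code&gt; could write into the same target table, possibly with an extra column to tell the two sources apart.&lt;/p&gt;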

&lt;h3&gt;
  
  
  Cross-Database Query Example
&lt;/h3&gt;

&lt;p&gt;Something impractical today (it requires joining across separate PostgreSQL databases):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- "Show users with &amp;gt;$100K futures volume today who also withdrew &amp;gt;$10K"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;futures_notional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_withdrawn&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trade_future&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_withdrawn&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;withdrawals&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;
    &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_withdrawn&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="n"&gt;futures_notional&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;futures_notional&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI agent generates this SQL automatically from plain English.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. How the AI Agent Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;User&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;LibreChat&lt;/span&gt;
     &lt;span class="nv"&gt;"Which users had the most liquidations today?"&lt;/span&gt;

&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;receives&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Business&lt;/span&gt; &lt;span class="n"&gt;glossary&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;liquidation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;order_type&lt;/span&gt; &lt;span class="mi"&gt;1003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="p"&gt;.)&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ClickHouse&lt;/span&gt; &lt;span class="k"&gt;schema&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;via&lt;/span&gt; &lt;span class="n"&gt;MCP&lt;/span&gt; &lt;span class="n"&gt;list_tables&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;generates&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;liq_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;liq_volume&lt;/span&gt;
     &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trade_future&lt;/span&gt;
     &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1003&lt;/span&gt;
       &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'bot_%'&lt;/span&gt;
       &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'market_maker_%'&lt;/span&gt;
       &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'liquidity_%'&lt;/span&gt;
     &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;
     &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;liq_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
     &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;

&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;MCP&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="n"&gt;executes&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;read&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;only&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;against&lt;/span&gt; &lt;span class="n"&gt;ClickHouse&lt;/span&gt;

&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;formats&lt;/span&gt; &lt;span class="k"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="nv"&gt;"Today's top users by liquidation count:
      1. user_abc — 12 liquidations, $234K total
      2. user_xyz — 8 liquidations, $156K total
      ..."&lt;/span&gt;

&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Langfuse&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
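
&lt;p&gt;Step 4 depends on execution being read-only. Beyond whatever the MCP server enforces, the same guarantee can be backed at the database level by giving the server a dedicated account that only holds &lt;code&gt;SELECT&lt;/code&gt;. The following is a sketch, not the deployed configuration. The user name, password placeholder, database name &lt;code&gt;exchange&lt;/code&gt;, and limit values are all hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical account used only by the MCP server
CREATE USER mcp_agent IDENTIFIED WITH sha256_password BY 'change-me';

-- SELECT only: no INSERT, ALTER, or DROP privileges are granted
GRANT SELECT ON exchange.* TO mcp_agent;

-- Guard rails for LLM-written queries
CREATE SETTINGS PROFILE mcp_limits
SETTINGS max_execution_time = 30,        -- seconds
         max_result_rows = 10000,
         result_overflow_mode = 'break'  -- truncate instead of failing
TO mcp_agent;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this in place, a generated statement that tries to modify data fails on privileges even if it slips past the tool layer.&lt;/p&gt;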



&lt;h3&gt;
  
  
  Key Point — It's All Structured Table Queries
&lt;/h3&gt;

&lt;p&gt;The LLM is not doing anything magical with embeddings or vectors. It is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reading the ClickHouse table schema (column names, types)&lt;/li&gt;
&lt;li&gt;Using the business glossary to understand domain terms&lt;/li&gt;
&lt;li&gt;Writing standard SQL&lt;/li&gt;
&lt;li&gt;Executing via MCP server&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All our data is in regular ClickHouse tables. The "intelligence" comes from the LLM knowing how to translate English into SQL for our specific schema.&lt;/p&gt;
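
&lt;p&gt;Part of the glossary can live in the schema itself as column comments, which are then visible wherever the schema is read. A small sketch, assuming the &lt;code&gt;trade_future&lt;/code&gt; table is in the current database (whether a given MCP server surfaces comments to the model depends on that server):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Attach glossary knowledge to the column it describes
ALTER TABLE trade_future
    COMMENT COLUMN order_type 'Order type code; 1003 = liquidation';

-- Column names, types, and comments as stored by ClickHouse
SELECT name, type, comment
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'trade_future'
ORDER BY position;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;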




&lt;h2&gt;
  
  
  7. What Questions Can Be Asked — By Team
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Trading Operations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Tables Used&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"What is BTC perpetual futures volume today?"&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;Single table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show top 10 markets by volume this week"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;Single table each, union&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Compare BTC futures volume this week vs last week"&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;Single table, time comparison&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show 1-hour candlestick chart for ETHPFC last 24 hours"&lt;/td&gt;
&lt;td&gt;candlestick_1h (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which markets had the highest spread in the last hour?"&lt;/td&gt;
&lt;td&gt;orders&lt;/td&gt;
&lt;td&gt;Single table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show order book depth for BTCPFC right now"&lt;/td&gt;
&lt;td&gt;orders&lt;/td&gt;
&lt;td&gt;Single table, latest snapshot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Average fill time for limit orders by market today"&lt;/td&gt;
&lt;td&gt;orders&lt;/td&gt;
&lt;td&gt;Single table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"How does today's volume compare to 30-day average?"&lt;/td&gt;
&lt;td&gt;volume_daily (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Risk Management
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Tables Used&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Which users have the largest open positions?"&lt;/td&gt;
&lt;td&gt;positions&lt;/td&gt;
&lt;td&gt;Single table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show all liquidation events in the last hour"&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;Single table (order_type=1003)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What percentage of trades were liquidations by market today?"&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;Single table, grouping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"List users with concentrated exposure (&amp;gt;80% in one market)"&lt;/td&gt;
&lt;td&gt;positions&lt;/td&gt;
&lt;td&gt;Single table, aggregation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show liquidation cascade events — markets where &amp;gt;5 liquidations happened within 1 minute"&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;Single table, window function&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which users traded within 1 minute of a liquidation event?"&lt;/td&gt;
&lt;td&gt;trade_future (self-join)&lt;/td&gt;
&lt;td&gt;Single table, self-join&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show users with margin utilization above 90%"&lt;/td&gt;
&lt;td&gt;positions, margins&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Correlation between liquidation volume and price movement for BTCPFC this week"&lt;/td&gt;
&lt;td&gt;trade_future, candlestick_1h&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
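
&lt;p&gt;As an example of the "window function" category, the liquidation cascade question could translate into something like the query below. It is a sketch that assumes &lt;code&gt;transaction_time&lt;/code&gt; is a &lt;code&gt;DateTime&lt;/code&gt; or &lt;code&gt;DateTime64&lt;/code&gt; column and interprets "within 1 minute" as a trailing 60-second window:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Liquidations per market in the 60 seconds up to and including each liquidation
SELECT market_name, transaction_time, liquidations_last_minute
FROM
(
    SELECT
        market_name,
        transaction_time,
        count() OVER (
            PARTITION BY market_name
            ORDER BY toUnixTimestamp(transaction_time)
            RANGE BETWEEN 60 PRECEDING AND CURRENT ROW
        ) AS liquidations_last_minute
    FROM trade_future
    WHERE order_type = 1003
      AND transaction_time &gt;= today()
)
WHERE liquidations_last_minute &gt; 5
ORDER BY market_name, transaction_time;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;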

&lt;h3&gt;
  
  
  Compliance &amp;amp; Audit
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Tables Used&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Show all transactions for user X in the last 90 days"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot, withdrawals, deposits&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table union&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"List accounts with withdrawal volume &amp;gt; $100K today"&lt;/td&gt;
&lt;td&gt;withdrawals&lt;/td&gt;
&lt;td&gt;Single table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Find users who deposited and withdrew within 24 hours"&lt;/td&gt;
&lt;td&gt;deposits, withdrawals&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show all KYC-pending users who traded this week"&lt;/td&gt;
&lt;td&gt;users, kyc_status, trade_future&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"List accounts that received deposits from the same source address"&lt;/td&gt;
&lt;td&gt;deposits&lt;/td&gt;
&lt;td&gt;Single table, grouping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show transaction patterns for user X — frequency, volume, timing"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot, withdrawals&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table union + aggregation&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which accounts had unusual withdrawal patterns this week?"&lt;/td&gt;
&lt;td&gt;withdrawals&lt;/td&gt;
&lt;td&gt;Single table, anomaly detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Full audit trail: every action by user X since account creation"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot, orders, withdrawals, deposits&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table union (5 tables)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
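
&lt;p&gt;For the multi-table join category, "users who deposited and withdrew within 24 hours" might come out roughly as follows. The sketch assumes &lt;code&gt;deposits&lt;/code&gt; has &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;created_at&lt;/code&gt; columns like &lt;code&gt;withdrawals&lt;/code&gt; does, and limits the search to deposits from the last 7 days to keep the join small:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Withdrawals made less than 24 hours after a deposit by the same user
SELECT
    d.username,
    d.created_at AS deposited_at,
    w.created_at AS withdrew_at,
    w.amount     AS withdrawn_amount
FROM deposits AS d
INNER JOIN withdrawals AS w ON d.username = w.username
WHERE d.created_at &gt;= today() - 7
  AND w.created_at &gt;= d.created_at
  AND w.created_at &lt; d.created_at + INTERVAL 24 HOUR
ORDER BY d.username, deposited_at;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;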

&lt;h3&gt;
  
  
  Finance &amp;amp; Revenue
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Tables Used&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"What was total fee revenue yesterday?"&lt;/td&gt;
&lt;td&gt;fee_revenue (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Monthly fee revenue breakdown by market"&lt;/td&gt;
&lt;td&gt;fee_revenue (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Compare spot vs futures fee revenue this month"&lt;/td&gt;
&lt;td&gt;fee_revenue (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view, grouping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Daily P&amp;amp;L trend for the last 30 days"&lt;/td&gt;
&lt;td&gt;pnl_daily (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which fee tier generates the most revenue?"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot, fee_tiers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Revenue impact if we reduce maker fees by 10%"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot, fee_tiers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join + simulation&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show monthly trading volume trend with MoM growth %"&lt;/td&gt;
&lt;td&gt;volume_daily (view)&lt;/td&gt;
&lt;td&gt;Pre-aggregated view, window function&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Product &amp;amp; Growth
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Tables Used&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"How many unique users traded futures this week?"&lt;/td&gt;
&lt;td&gt;trade_future&lt;/td&gt;
&lt;td&gt;Single table, COUNT DISTINCT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What is our DAU trend for the last 30 days?"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table union&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which markets have growing vs declining user counts?"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;Single table each, union, time comparison&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"New user retention — users who signed up last month, how many traded this month?"&lt;/td&gt;
&lt;td&gt;users, trade_future&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Top 10 users by total trading volume all-time"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table union&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Average trades per user per day — trend over last 90 days"&lt;/td&gt;
&lt;td&gt;trade_future, trade_spot&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table union&lt;/strong&gt;, aggregation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Geographic distribution of active traders this month"&lt;/td&gt;
&lt;td&gt;users, trade_future&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-table join&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Question Complexity Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;% of Questions&lt;/th&gt;
&lt;th&gt;How LLM Handles&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Single table&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~40%&lt;/td&gt;
&lt;td&gt;Direct SELECT with WHERE/GROUP BY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pre-aggregated view&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~20%&lt;/td&gt;
&lt;td&gt;Query materialized view (fastest)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-table JOIN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~25%&lt;/td&gt;
&lt;td&gt;LLM generates JOIN across tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-table UNION&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~10%&lt;/td&gt;
&lt;td&gt;LLM unions results from multiple tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complex (window/self-join)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5%&lt;/td&gt;
&lt;td&gt;LLM uses advanced SQL patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  8. How Multi-Table Queries Work (JOIN Problem &amp;amp; Solution)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Challenge
&lt;/h3&gt;

&lt;p&gt;Many real questions require data from multiple sources:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Show me users who had futures liquidations today AND also made withdrawals"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This needs: &lt;code&gt;trade_future&lt;/code&gt; (Trading DB) + &lt;code&gt;withdrawals&lt;/code&gt; (Wallet DB) + &lt;code&gt;users&lt;/code&gt; (User DB)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; ClickHouse is &lt;strong&gt;not optimized for runtime JOINs&lt;/strong&gt;. The classic hash join loads the right-hand table into an in-memory hash table, so JOINs between large tables are memory-heavy and can be slow, and the build phase is single-threaded unless the parallel hash algorithm is used. This is a well-known ClickHouse trade-off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; We use &lt;strong&gt;3 strategies&lt;/strong&gt;: two that avoid runtime JOINs entirely, and a tuned runtime JOIN as the fallback. This mirrors how DWAINE (ClickHouse's own AI agent) handles it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 1 — Pre-Joined Denormalized Marts (Primary Approach)
&lt;/h3&gt;

&lt;p&gt;JOINs are computed &lt;strong&gt;at data load time&lt;/strong&gt;, not at query time. Refreshable Materialized Views periodically run the JOINs in the background and write the results into flat, wide "mart" tables. The AI agent queries these pre-joined tables directly — &lt;strong&gt;no JOINs needed at query time&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How it works:

  Staging Layer (normalized tables):
    staging.trade_future  — raw trades from CDC
    staging.users         — user profiles from CDC
    staging.wallets       — wallet transactions from CDC

         | Refreshable Materialized View (runs every 1-5 minutes)
         | Executes the JOINs in background
         v

  Marts Layer (denormalized, pre-joined):
    mart_user_trading_activity
      — username, kyc_status, registered_at,     ← from users
        futures_volume_today, spot_volume_today,  ← from trades
        withdrawal_total_today, deposit_total,    ← from wallets
        liq_count_today, liq_volume_today         ← from trades (order_type=1003)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
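
&lt;p&gt;As a minimal sketch of the refresh step above, a Refreshable Materialized View that rebuilds a trimmed-down version of the mart could look like the following. The staging column names (&lt;code&gt;quantity&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt;, &lt;code&gt;order_type&lt;/code&gt;, &lt;code&gt;transaction_type&lt;/code&gt;, &lt;code&gt;amount&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;), the column types, and the view name &lt;code&gt;mart_user_trading_activity_mv&lt;/code&gt; are assumptions for illustration, not the production schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Target table: flat and wide, one row per user (assumed column types)
CREATE TABLE mart_user_trading_activity
(
    username               String,
    kyc_status             String,
    registered_at          DateTime,
    futures_volume_today   Float64,
    liq_count_today        UInt64,
    withdrawal_total_today Float64
)
ENGINE = MergeTree
ORDER BY username;

-- Refreshable MV: re-runs the JOIN on a schedule and, without APPEND,
-- atomically replaces the contents of the target table on each refresh
CREATE MATERIALIZED VIEW mart_user_trading_activity_mv
REFRESH EVERY 5 MINUTE
TO mart_user_trading_activity
AS
SELECT
    u.username               AS username,
    u.kyc_status             AS kyc_status,
    u.registered_at          AS registered_at,
    t.futures_volume_today   AS futures_volume_today,
    t.liq_count_today        AS liq_count_today,
    w.withdrawal_total_today AS withdrawal_total_today
FROM staging.users AS u
LEFT JOIN
(
    -- Aggregate first so the JOIN sees one row per user
    SELECT
        username,
        sum(quantity * price)        AS futures_volume_today,
        countIf(order_type = 1003)   AS liq_count_today
    FROM staging.trade_future
    WHERE transaction_time &amp;gt;= today()
    GROUP BY username
) AS t ON t.username = u.username
LEFT JOIN
(
    SELECT
        username,
        sumIf(amount, transaction_type = 'withdrawal') AS withdrawal_total_today
    FROM staging.wallets
    WHERE created_at &amp;gt;= today()
    GROUP BY username
) AS w ON w.username = u.username;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Aggregating each source before joining keeps both right-hand sides at one row per user, so the background JOIN stays cheap even when the raw trade table is large.&lt;/p&gt;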



&lt;p&gt;&lt;strong&gt;The AI agent only sees the Marts layer.&lt;/strong&gt; When someone asks "Show users with &amp;gt;$50K volume who also withdrew &amp;gt;$10K", the LLM queries &lt;strong&gt;one flat table&lt;/strong&gt; — no JOIN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- NO JOIN needed — all data is already in one table&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;futures_volume_today&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;withdrawal_total_today&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_user_trading_activity&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;futures_volume_today&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;withdrawal_total_today&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;futures_volume_today&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is ClickHouse's official recommendation for AI agents:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"LLMs should only be exposed to the Marts layer — which is already pre-joined and denormalized. The agent has no need to write JOINs."&lt;br&gt;
— &lt;a href="https://clickhouse.com/blog/how-to-set-up-clickhouse-for-agentic-analytics" rel="noopener noreferrer"&gt;How to set up ClickHouse for agentic analytics&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Strategy 2 — Dictionaries (For Dimension Lookups)
&lt;/h3&gt;

&lt;p&gt;Small, stable lookup tables (users, markets, fee_tiers) are loaded into &lt;strong&gt;in-memory Dictionaries&lt;/strong&gt;. Instead of a JOIN, the LLM uses &lt;code&gt;dictGet()&lt;/code&gt;, an O(1) key-value lookup that was roughly &lt;strong&gt;25x faster&lt;/strong&gt; than a standard hash join in ClickHouse's benchmark (flat layout; see the performance table below).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- INSTEAD OF THIS (slow JOIN):&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_trades_futures&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;-- THE LLM GENERATES THIS (fast dictionary lookup):&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dictGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'users_dict'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'kyc_status'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;kyc_status&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_trades_futures&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance comparison (from ClickHouse benchmarks):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;dictGet() with flat layout&lt;/td&gt;
&lt;td&gt;0.044 sec&lt;/td&gt;
&lt;td&gt;84 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dictGet() with hashed layout&lt;/td&gt;
&lt;td&gt;0.113 sec&lt;/td&gt;
&lt;td&gt;103 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel Hash JOIN&lt;/td&gt;
&lt;td&gt;0.690 sec&lt;/td&gt;
&lt;td&gt;4.8 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard Hash JOIN&lt;/td&gt;
&lt;td&gt;1.133 sec&lt;/td&gt;
&lt;td&gt;4.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Which tables become Dictionaries:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dictionary&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Lookup Key&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;users_dict&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;users table&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;username&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;markets_dict&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;market_config&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;market_name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fee_tiers_dict&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;fee_tiers&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;tier_id&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kyc_dict&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;kyc_status&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;username&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The business glossary teaches the LLM to use &lt;code&gt;dictGet()&lt;/code&gt; instead of JOINs for these dimension tables.&lt;/p&gt;
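
&lt;p&gt;For reference, a dictionary such as &lt;code&gt;users_dict&lt;/code&gt; could be declared roughly as follows. This is a sketch under assumptions: the source is taken to be &lt;code&gt;staging.users&lt;/code&gt;, the attribute list is cut down to one column, and the refresh window is arbitrary. Because the lookup key &lt;code&gt;username&lt;/code&gt; is a String, it needs a complex-key layout; the &lt;code&gt;flat&lt;/code&gt; layout in the benchmark only supports UInt64 keys, so reaching that number would require a numeric user id.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumed source table and attributes; adjust to the real schema
CREATE DICTIONARY users_dict
(
    username   String,
    kyc_status String
)
PRIMARY KEY username
SOURCE(CLICKHOUSE(DB 'staging' TABLE 'users'))
LAYOUT(COMPLEX_KEY_HASHED())      -- String key, so a complex-key layout is required
LIFETIME(MIN 300 MAX 600);        -- reload from the source every 5-10 minutes

-- Lookup written with an explicit tuple key, which complex-key layouts accept
SELECT dictGet('users_dict', 'kyc_status', tuple('john_doe')) AS kyc_status;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;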

&lt;h3&gt;
  
  
  Strategy 3 — Runtime JOINs (Fallback for Ad-Hoc Queries)
&lt;/h3&gt;

&lt;p&gt;For truly ad-hoc questions that don't fit a pre-built mart table, ClickHouse does support runtime JOINs with optimizations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;join_algorithm = 'auto'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ClickHouse tries a hash join first and switches to another algorithm on the fly if the join memory limit is exceeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automatic table reordering&lt;/td&gt;
&lt;td&gt;Puts smaller table on the right side (since v24.12)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Global join reordering&lt;/td&gt;
&lt;td&gt;Optimizes 3+ table JOIN chains (since v25.9)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ANY JOIN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stops at first match — faster when you only need one row&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-filter with WHERE&lt;/td&gt;
&lt;td&gt;Reduce table size before joining&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When this is acceptable:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small-to-medium inputs (both sides fewer than a few million rows after filtering)&lt;/li&gt;
&lt;li&gt;Ad-hoc questions that don't match any existing mart table&lt;/li&gt;
&lt;li&gt;One-time analysis queries where latency is less critical&lt;/li&gt;
&lt;/ul&gt;
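
&lt;p&gt;A fallback JOIN written along these lines might look like the sketch below, which answers the earlier "liquidations today AND withdrawals" question without a dedicated mart. The columns &lt;code&gt;order_type&lt;/code&gt;, &lt;code&gt;transaction_type&lt;/code&gt;, and &lt;code&gt;amount&lt;/code&gt; and the 1 GB limit are assumptions; the point is the shape: filter and aggregate each side first, then join the small results.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical ad-hoc query: both sides are pre-filtered and pre-aggregated
SELECT
    liq.username,
    liq.liq_count,
    wd.withdrawal_total
FROM
(
    SELECT username, count() AS liq_count
    FROM mart_trades_futures
    WHERE transaction_time &amp;gt;= today()
      AND order_type = 1003              -- liquidation trades
    GROUP BY username
) AS liq
INNER JOIN
(
    SELECT username, sum(amount) AS withdrawal_total
    FROM mart_wallets
    WHERE created_at &amp;gt;= today()
      AND transaction_type = 'withdrawal'
    GROUP BY username
) AS wd ON wd.username = liq.username
ORDER BY wd.withdrawal_total DESC
SETTINGS
    join_algorithm = 'auto',             -- hash join first, fall back if the limit is hit
    max_bytes_in_join = 1000000000;      -- assumed cap on the join hash table (about 1 GB)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;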

&lt;h3&gt;
  
  
  How Each Question Type Is Handled
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question Type&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-table query&lt;/td&gt;
&lt;td&gt;Direct query on mart table&lt;/td&gt;
&lt;td&gt;"Top 10 markets by volume today"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-aggregated metric&lt;/td&gt;
&lt;td&gt;Query materialized view&lt;/td&gt;
&lt;td&gt;"Daily volume trend last 30 days"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-domain (trading + wallets)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Pre-joined mart table&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Users with high volume AND large withdrawals"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dimension lookup (user info)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Dictionary (dictGet)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Show KYC status for top traders"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full audit trail (multiple activity types)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;UNION ALL&lt;/strong&gt; (no JOIN needed)&lt;/td&gt;
&lt;td&gt;"All activity for user X in 30 days"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rare ad-hoc correlation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Runtime JOIN (fallback)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Correlation between margin % and liquidation frequency"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Pre-Joined Mart Tables We Build
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mart Table&lt;/th&gt;
&lt;th&gt;Pre-Joined Sources&lt;/th&gt;
&lt;th&gt;Refreshes&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_user_trading_activity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trades + users + wallets&lt;/td&gt;
&lt;td&gt;Every 5 min&lt;/td&gt;
&lt;td&gt;"Users who traded AND withdrew"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_user_risk_profile&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;positions + margins + trades (liquidations)&lt;/td&gt;
&lt;td&gt;Every 1 min&lt;/td&gt;
&lt;td&gt;"Users at risk of liquidation"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_liquidation_detail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trades + users + positions&lt;/td&gt;
&lt;td&gt;Every 1 min&lt;/td&gt;
&lt;td&gt;"Liquidation events with user context"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_user_full_activity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trades + spot + orders + wallets&lt;/td&gt;
&lt;td&gt;Every 5 min&lt;/td&gt;
&lt;td&gt;"Complete activity for user X"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_market_daily_summary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trades + orders + market_config&lt;/td&gt;
&lt;td&gt;Every 1 min&lt;/td&gt;
&lt;td&gt;"Market performance today"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_compliance_view&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trades + wallets + users + kyc&lt;/td&gt;
&lt;td&gt;Every 5 min&lt;/td&gt;
&lt;td&gt;"Compliance team queries"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Example — How Pre-Joining Works End-to-End
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; "Show KYC-verified users who had liquidations this week with their withdrawal history"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without pre-joining (SLOW — 3-table runtime JOIN):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- BAD: Runtime JOIN across 3 large tables&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;liq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;liq_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;withdrawal_total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trade_future&lt;/span&gt; &lt;span class="n"&gt;liq&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;liq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wallets&lt;/span&gt; &lt;span class="n"&gt;wd&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="c1"&gt;-- Slow, memory-heavy, may timeout&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With pre-joining (FAST — single table scan):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- GOOD: Query pre-joined mart table, no JOIN needed&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;liq_count_week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;withdrawal_total_week&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_user_trading_activity&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;kyc_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'verified'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;liq_count_week&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;liq_count_week&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The JOIN happened in the background when the Refreshable Materialized View populated &lt;code&gt;mart_user_trading_activity&lt;/code&gt;. The AI agent never writes a JOIN.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example — Full Audit Trail (UNION ALL, No JOIN)
&lt;/h3&gt;

&lt;p&gt;For "show all activity for user X", no JOIN is needed — we use UNION ALL across tables, which ClickHouse handles efficiently since each branch scans one table independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'futures_trade'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;market_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;side&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;value_usd&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_trades_futures&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john_doe'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;

    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'spot_trade'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transaction_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;market_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;side&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_trades_spot&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john_doe'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;transaction_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;

    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'withdrawal'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'out'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_wallets&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john_doe'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;transaction_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'withdrawal'&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;

    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;

    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'deposit'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mart_wallets&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'john_doe'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;transaction_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'deposit'&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;UNION ALL is &lt;strong&gt;not a JOIN&lt;/strong&gt; — each branch runs independently and results are concatenated. This is fast in ClickHouse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary — JOIN Avoidance Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question comes in from user
        |
        v
Does a pre-joined mart table cover this?
  YES → Query the mart table directly (no JOIN, fastest)
        |
        NO
        v
Does it need a small dimension lookup (user info, market name)?
  YES → Use dictGet() dictionary lookup (up to ~25x faster than a hash JOIN)
        |
        NO
        v
Does it need data from multiple activity types for one user?
  YES → Use UNION ALL across tables (no JOIN, each branch independent)
        |
        NO
        v
Truly ad-hoc cross-table correlation?
  → Use runtime JOIN with auto algorithm (fallback, acceptable for small result sets)
  → Log in Langfuse — if this pattern repeats, build a new mart table for it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Query Time&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pre-joined mart table&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;~60% of questions (cross-domain)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dictionary (dictGet)&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;~15% of questions (dimension lookup)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UNION ALL&lt;/td&gt;
&lt;td&gt;1-2 seconds&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;~15% of questions (audit trails)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime JOIN (fallback)&lt;/td&gt;
&lt;td&gt;2-10 seconds&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;~10% of questions (rare ad-hoc)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Minimize and optimize JOINs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/best-practices/minimize-optimize-joins" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/best-practices/minimize-optimize-joins&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Denormalizing Data&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/data-modeling/denormalization" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/data-modeling/denormalization&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using Dictionaries to Accelerate Queries&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct Join (25x faster)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Choosing the Right Join Algorithm&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-fully-supports-joins-how-to-choose-the-right-algorithm-part5" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/clickhouse-fully-supports-joins-how-to-choose-the-right-algorithm-part5&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How to set up ClickHouse for agentic analytics&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/how-to-set-up-clickhouse-for-agentic-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/how-to-set-up-clickhouse-for-agentic-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postgres to ClickHouse: Data Modeling Tips&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/postgres-to-clickhouse-data-modeling-tips-v2" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/postgres-to-clickhouse-data-modeling-tips-v2&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  9. ClickHouse MCP Server
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

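&lt;p&gt;MCP (Model Context Protocol) is an open standard that lets LLMs call external tools and data sources through a uniform interface. An MCP server exposes a specific system, here a database, through that interface. ClickHouse has an official MCP server:&lt;/p&gt;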

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ClickHouse/mcp-clickhouse" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/mcp-clickhouse&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;code&gt;mcp-clickhouse&lt;/code&gt; (220K+ downloads)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; Open source&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What It Does (and Does NOT Do)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exposes ClickHouse schema to the LLM (tables, columns, types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Executes read-only SQL queries generated by the LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Returns results in LLM-readable format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enforces read-only access (no INSERT/UPDATE/DELETE by default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does NOT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Convert natural language to SQL (that's the LLM's job)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Does NOT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Store data or state&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Tools Exposed
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MCP Tool&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_databases&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lists all databases in ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_tables&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lists tables with columns, types, pagination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_query&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Executes read-only SQL, returns results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_chdb_select_query&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Embedded queries against files/URLs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Integration Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Qwen LLM ←— MCP Protocol (JSON-RPC) —→ ClickHouse MCP Server ←— Native Protocol —→ ClickHouse Tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server is the &lt;strong&gt;controlled gateway&lt;/strong&gt; — the LLM can discover schema and run queries, but cannot modify data or access restricted tables.&lt;/p&gt;
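
&lt;p&gt;To make the flow concrete, here is a minimal sketch of what a single tool call might look like on the wire, assuming the standard MCP &lt;code&gt;tools/call&lt;/code&gt; JSON-RPC method. The helper name &lt;code&gt;make_tool_call&lt;/code&gt; and the &lt;code&gt;query&lt;/code&gt; argument key are illustrative assumptions; the server's published tool schema is the authority on real input names.&lt;/p&gt;

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical call: the "query" key is an assumption about the tool's input schema
request = make_tool_call(1, "run_query", {"query": "SELECT count() FROM mart_wallets"})
payload = json.dumps(request)
print(payload)
```

&lt;p&gt;The LLM only ever produces the tool name and arguments; the MCP client wraps them in this envelope and the MCP server decides whether the query is allowed to run.&lt;/p&gt;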

&lt;h3&gt;
  
  
  How the MCP Server Actually Works (Transport &amp;amp; Security)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Transport — Streamable HTTP (not stdio):&lt;/strong&gt;&lt;br&gt;
The MCP server runs as a persistent HTTP service. LibreChat sends HTTP POST requests to it when the LLM needs to call a tool (&lt;code&gt;list_tables&lt;/code&gt;, &lt;code&gt;run_query&lt;/code&gt;, etc.). This is different from subprocess-based MCP (stdio) — the server stays running, handles concurrent requests, and is addressable as a microservice within the deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication — Bearer Token:&lt;/strong&gt;&lt;br&gt;
LibreChat authenticates to the MCP server with a bearer token in the &lt;code&gt;Authorization&lt;/code&gt; header. Only clients that hold the token can call the MCP server, and in this deployment LibreChat is the only holder, so other services and arbitrary HTTP clients on the same network are rejected.&lt;/p&gt;
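
&lt;p&gt;A minimal sketch of the server-side check, under the assumption that the expected token is loaded from configuration. The function name &lt;code&gt;is_authorized&lt;/code&gt; is hypothetical and this is not the MCP server's actual code; it only illustrates the header parsing and a timing-safe comparison.&lt;/p&gt;

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for a well-formed 'Bearer TOKEN' header carrying the expected token."""
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest is designed to avoid leaking the token through response timing
    return hmac.compare_digest(token.encode(), expected_token.encode())


print(is_authorized("Bearer s3cret", "s3cret"))
print(is_authorized("Basic s3cret", "s3cret"))
```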

&lt;p&gt;&lt;strong&gt;SSRF Protection — Domain Whitelist:&lt;/strong&gt;&lt;br&gt;
LibreChat enforces an allowlist of domains it will make outbound HTTP requests to. This blocks server-side request forgery (SSRF), where the LLM instructs LibreChat to call internal infrastructure URLs (e.g., &lt;code&gt;http://internal-service/admin&lt;/code&gt;). Only the MCP server address and the ClickHouse endpoint are on the allowlist.&lt;/p&gt;
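
&lt;p&gt;The idea behind a domain allowlist can be sketched in a few lines. This is not LibreChat's implementation; the host names in &lt;code&gt;ALLOWED_HOSTS&lt;/code&gt; and the function &lt;code&gt;is_allowed&lt;/code&gt; are hypothetical. The key point is to compare the parsed hostname exactly rather than doing a substring match on the URL.&lt;/p&gt;

```python
from urllib.parse import urlparse

# Hypothetical host names, for illustration only
ALLOWED_HOSTS = {"mcp-clickhouse.internal", "clickhouse.internal"}


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Allow only http(s) URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Exact match: "mcp-clickhouse.internal.evil.com" must not pass
    return host in allowed_hosts


print(is_allowed("http://mcp-clickhouse.internal:8000/mcp"))
print(is_allowed("http://internal-service/admin"))
```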

&lt;p&gt;&lt;strong&gt;chDB — Query Files and URLs Without ETL:&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;run_chdb_select_query&lt;/code&gt; tool uses chDB (embedded ClickHouse engine) to query local files, S3 URLs, or Parquet files directly — without loading them into ClickHouse tables first. chDB 4 adds a Pandas-like DataStore API with lazy execution and filter/column pushdown, useful for data scientists who upload files to LibreChat and want to query them immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remote MCP (ClickHouse Cloud):&lt;/strong&gt;&lt;br&gt;
For ClickHouse Cloud deployments, ClickHouse operates a hosted Remote MCP Server — agents connect directly without self-hosting the MCP service. For self-managed ClickHouse (our case), the MCP server runs as a container alongside ClickHouse.&lt;/p&gt;
&lt;h3&gt;
  
  
  Compatible Agent Frameworks
&lt;/h3&gt;

&lt;p&gt;ClickHouse has tested its MCP server with 12 agent frameworks:&lt;br&gt;
Agno, DSPy, LangChain, LlamaIndex, PydanticAI, Claude Agent SDK, OpenAI Agents SDK, CrewAI, Google ADK, Microsoft Agent Framework, mcp-agent, Upsonic&lt;/p&gt;


&lt;h2&gt;
  
  
  10. LLM Selection
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Primary: Qwen 2.5 72B (Self-Hosted)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Qwen License for the 72B variant (open weights; most other Qwen 2.5 sizes are Apache 2.0), self-hosted with no API vendor lock-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL generation&lt;/td&gt;
&lt;td&gt;Top-tier on text-to-SQL benchmarks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual&lt;/td&gt;
&lt;td&gt;English, Chinese, Japanese&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Size&lt;/td&gt;
&lt;td&gt;72B params — accurate enough, runs on single GPU node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data privacy&lt;/td&gt;
&lt;td&gt;All data stays on our infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Fixed GPU cost, not per-token&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Infrastructure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU instance&lt;/td&gt;
&lt;td&gt;g5.12xlarge (4x A10G, 96GB VRAM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VRAM needed&lt;/td&gt;
&lt;td&gt;~40GB for 72B INT4 quantized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;vLLM (high-throughput inference)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;30-50 tokens/second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fallback&lt;/td&gt;
&lt;td&gt;Qwen 2.5 32B on smaller GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
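
&lt;p&gt;A back-of-envelope check of the sizing above, counting weights only. KV cache, activations, and quantization metadata come on top, which is presumably why the table budgets ~40GB rather than the raw figure. The helper &lt;code&gt;weight_memory_gb&lt;/code&gt; is illustrative and uses 1 GB = 1e9 bytes.&lt;/p&gt;

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


weights_int4 = weight_memory_gb(72, 4)    # 72B parameters at 4 bits each
weights_fp16 = weight_memory_gb(72, 16)   # same model at 16 bits each
total_vram = 4 * 24                       # four A10G GPUs with 24 GB each
headroom = total_vram - weights_int4      # left for KV cache and runtime buffers

print(weights_int4, weights_fp16, total_vram, headroom)
```

&lt;p&gt;At 4 bits the weights need about 36 GB, leaving roughly 60 GB of the 96 GB for KV cache and concurrency. At 16 bits the weights alone would need about 144 GB, which does not fit on this instance, so quantization is what makes the single-node deployment possible.&lt;/p&gt;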
&lt;h3&gt;
  
  
  Alternatives
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5 32B&lt;/td&gt;
&lt;td&gt;32B&lt;/td&gt;
&lt;td&gt;Lighter, cheaper GPU, slightly less accurate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 70B&lt;/td&gt;
&lt;td&gt;70B&lt;/td&gt;
&lt;td&gt;Meta, strong general purpose&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3&lt;/td&gt;
&lt;td&gt;671B MoE (~37B active params)&lt;/td&gt;
&lt;td&gt;Strong on code/SQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Best for complex multi-step analysis — use only when data is not restricted to internal infra&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Model selection rule:&lt;/strong&gt; Use self-hosted Qwen for every query that touches the exchange's trading, user, or wallet data (the data stays on our infrastructure, with zero external API calls). Use Claude Opus 4.7 via API only for non-sensitive analytical work, development, or testing where data residency is not a constraint.&lt;/p&gt;
&lt;h3&gt;
  
  
  AGENTS.md — Per-Environment Agent Instructions
&lt;/h3&gt;

&lt;p&gt;ClickHouse Assistant and agent frameworks support an &lt;code&gt;AGENTS.md&lt;/code&gt; file — a plain text file placed in the agent's working context that injects domain-specific instructions into every session. This is complementary to the business glossary:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Business Glossary&lt;/th&gt;
&lt;th&gt;AGENTS.md&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Maps domain terms to SQL&lt;/td&gt;
&lt;td&gt;Sets agent behavior rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"liquidation" = order_type 1003&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Always exclude market makers from results"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"volume" = SUM(quantity)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Default time window is today() unless specified"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;YAML key-value pairs&lt;/td&gt;
&lt;td&gt;Plain text instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Effect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Teaches SQL mappings&lt;/td&gt;
&lt;td&gt;Shapes how the agent reasons and responds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both are injected into the LLM's system prompt at query time — no retraining needed.&lt;/p&gt;
&lt;h3&gt;
  
  
  DWAINE Reference
&lt;/h3&gt;

&lt;p&gt;ClickHouse's own DWAINE agent uses &lt;strong&gt;Claude via AWS Bedrock&lt;/strong&gt;. We choose self-hosted Qwen for data privacy and cost control, but the architecture supports swapping LLM backends.&lt;/p&gt;


&lt;h2&gt;
  
  
  11. How the LLM Learns Our Tables (No Training Required)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Key Point — No Model Training Needed
&lt;/h3&gt;

&lt;p&gt;The LLM does &lt;strong&gt;not&lt;/strong&gt; need to be trained or fine-tuned to understand the crypto exchange tables. It learns everything it needs &lt;strong&gt;at query time&lt;/strong&gt; through two mechanisms: automatic schema discovery and a business glossary.&lt;/p&gt;
&lt;h3&gt;
  
  
  Mechanism 1 — MCP Schema Discovery (Automatic)
&lt;/h3&gt;

&lt;p&gt;Every time the LLM receives a question, it calls &lt;code&gt;list_tables&lt;/code&gt; via the MCP server. ClickHouse returns the full schema automatically — table names, column names, data types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP returns to LLM automatically:

  Table: mart_trades_futures
    - username           String
    - market_name        String
    - transaction_time   DateTime
    - side               String
    - quantity           Float64
    - price              Float64
    - order_type         Int32        ← LLM sees this but doesn't know 1003 = liquidation

  Table: mart_wallets
    - username           String
    - transaction_type   String       ('deposit', 'withdrawal')
    - amount             Float64
    - currency           String
    - created_at         DateTime

  Table: mart_user_trading_activity
    - username           String
    - kyc_status         String
    - futures_volume_today   Float64
    - withdrawal_total_today Float64
    - liq_count_today    Int32
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;No configuration needed for this.&lt;/strong&gt; When a new table is added to ClickHouse, the LLM sees it automatically on the next query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mechanism 2 — Business Glossary (You Configure This)
&lt;/h3&gt;

&lt;p&gt;The schema alone is not enough. The LLM sees &lt;code&gt;order_type Int32&lt;/code&gt; but doesn't know what &lt;code&gt;1003&lt;/code&gt; means. The &lt;strong&gt;business glossary&lt;/strong&gt; is a configuration file (not model training) that gets injected into the LLM's prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# business_glossary.yaml: each key is a business term, each value is its SQL meaning&lt;/span&gt;
&lt;span class="na"&gt;liquidation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"trade_future WHERE order_type = 1003"&lt;/span&gt;
&lt;span class="na"&gt;market maker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"username LIKE 'bot_%' OR username LIKE 'market_maker_%' OR username LIKE 'liquidity_%'"&lt;/span&gt;
&lt;span class="na"&gt;perpetual futures&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"market_name LIKE '%PFC%'"&lt;/span&gt;
&lt;span class="na"&gt;volume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"SUM(quantity) from mart_trades_futures"&lt;/span&gt;
&lt;span class="na"&gt;notional&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"SUM(quantity * price)"&lt;/span&gt;
&lt;span class="na"&gt;BTCPFC&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Bitcoin perpetual futures contract"&lt;/span&gt;
&lt;span class="na"&gt;whitelisted user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user in internal_account_whitelist table"&lt;/span&gt;
&lt;span class="s"&gt;"P&amp;amp;L"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"pnl_daily materialized view"&lt;/span&gt;
&lt;span class="na"&gt;candlestick&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"candlestick_1m/1h/1d materialized views"&lt;/span&gt;
&lt;span class="na"&gt;spot trade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"mart_trades_spot table"&lt;/span&gt;
&lt;span class="na"&gt;deposit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"mart_wallets WHERE transaction_type = 'deposit'"&lt;/span&gt;
&lt;span class="na"&gt;withdrawal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"mart_wallets WHERE transaction_type = 'withdrawal'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;just a text file&lt;/strong&gt; that gets added to the LLM's system prompt. You update it anytime — no retraining, no downtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full Glossary Entries
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Definition&lt;/th&gt;
&lt;th&gt;SQL Mapping&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Liquidation&lt;/td&gt;
&lt;td&gt;Forced position closure&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;order_type = 1003&lt;/code&gt; in trade_future&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perpetual futures&lt;/td&gt;
&lt;td&gt;Non-expiring futures&lt;/td&gt;
&lt;td&gt;&lt;code&gt;market_name LIKE '%PFC%'&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market maker&lt;/td&gt;
&lt;td&gt;Internal trading bots&lt;/td&gt;
&lt;td&gt;&lt;code&gt;username LIKE 'bot_%' OR username LIKE 'market_maker_%' OR username LIKE 'liquidity_%'&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trading volume&lt;/td&gt;
&lt;td&gt;Total quantity&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SUM(quantity)&lt;/code&gt; on trade_future/trade_spot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notional volume&lt;/td&gt;
&lt;td&gt;USD value&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SUM(quantity * price)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whitelisted user&lt;/td&gt;
&lt;td&gt;Excluded from checks&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;internal_account_whitelist&lt;/code&gt; table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deposit&lt;/td&gt;
&lt;td&gt;Fiat/crypto incoming&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mart_wallets&lt;/code&gt; WHERE transaction_type = 'deposit'&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Withdrawal&lt;/td&gt;
&lt;td&gt;Fiat/crypto outgoing&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mart_wallets&lt;/code&gt; WHERE transaction_type = 'withdrawal'&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P&amp;amp;L&lt;/td&gt;
&lt;td&gt;Realized profit/loss&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pnl_daily&lt;/code&gt; materialized view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Candlestick&lt;/td&gt;
&lt;td&gt;OHLCV price bars&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;candlestick_1m/1h/1d&lt;/code&gt; views&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spot trade&lt;/td&gt;
&lt;td&gt;Immediate exchange&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mart_trades_spot&lt;/code&gt; table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Futures trade&lt;/td&gt;
&lt;td&gt;Derivatives trade&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mart_trades_futures&lt;/code&gt; table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADL&lt;/td&gt;
&lt;td&gt;Auto-deleveraging&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;order_type = 1004&lt;/code&gt; in trade_future&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Funding fee&lt;/td&gt;
&lt;td&gt;Periodic futures fee&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mart_funding_payments&lt;/code&gt; table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open interest&lt;/td&gt;
&lt;td&gt;Total open positions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SUM(quantity)&lt;/code&gt; on mart_positions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
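
&lt;p&gt;Multi-pattern entries such as the market-maker rule are easy to get wrong by hand, because SQL requires the column to be repeated for every &lt;code&gt;LIKE&lt;/code&gt; pattern. A small sketch of generating the predicate from a list of prefixes, where &lt;code&gt;like_any&lt;/code&gt; and &lt;code&gt;MARKET_MAKER_PREFIXES&lt;/code&gt; are illustrative names rather than part of any library:&lt;/p&gt;

```python
# Prefixes taken from the market-maker glossary entry
MARKET_MAKER_PREFIXES = ["bot_", "market_maker_", "liquidity_"]


def like_any(column, prefixes):
    """Build "(col LIKE 'p1%' OR col LIKE 'p2%' ...)", repeating the column per pattern."""
    parts = [f"{column} LIKE '{prefix}%'" for prefix in prefixes]
    return "(" + " OR ".join(parts) + ")"


include_mm = like_any("username", MARKET_MAKER_PREFIXES)
exclude_mm = "NOT " + include_mm
print(exclude_mm)
```

&lt;p&gt;Note that &lt;code&gt;_&lt;/code&gt; is itself a single-character wildcard in &lt;code&gt;LIKE&lt;/code&gt;, so these patterns match slightly more than a literal prefix would. If that matters for your usernames, escape the underscore or use a prefix function instead.&lt;/p&gt;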

&lt;h3&gt;
  
  
  What Actually Happens Per Query
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User types: "Show top liquidated users today"
                    |
                    v
LLM receives in its prompt:
  ┌─────────────────────────────────────────────────┐
  │ 1. System instructions                          │
  │    "You are a SQL analyst for the crypto        │
  │     exchange. Generate ClickHouse SQL.          │
  │     Read-only queries only."                    │
  │                                                 │
  │ 2. Business glossary (text file)                │
  │    "liquidation = order_type 1003"              │
  │    "market maker = bot_%, market_maker_%,       │
  │     liquidity_%"                                │
  │    ... (all glossary entries)                   │
  │                                                 │
  │ 3. ClickHouse schema (from MCP list_tables)     │
  │    mart_trades_futures: username, market_name,  │
  │    transaction_time, order_type, quantity, ...  │
  │    mart_wallets: username, amount, ...          │
  │    ... (all mart tables)                        │
  │                                                 │
  │ 4. The user's question                          │
  │    "Show top liquidated users today"            │
  └─────────────────────────────────────────────────┘
                    |
                    v
LLM generates SQL using all 3 context sources:
  SELECT username, COUNT(*) as liq_count, SUM(quantity * price) as liq_volume
  FROM mart_trades_futures
  WHERE order_type = 1003                    ← from glossary
    AND transaction_time &amp;gt;= today()          ← from schema (knows column type)
    AND username NOT LIKE 'bot_%'          ← from glossary (exclude market makers)
    AND username NOT LIKE 'market_maker_%'
    AND username NOT LIKE 'liquidity_%'
  GROUP BY username
  ORDER BY liq_count DESC LIMIT 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
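
&lt;p&gt;The prompt assembly in the diagram can be sketched as plain string concatenation. This is an illustration under assumptions: the function name &lt;code&gt;build_prompt&lt;/code&gt;, the section headings, and the sample inputs are hypothetical, and a real deployment would let the chat front-end and MCP client supply the schema.&lt;/p&gt;

```python
def build_prompt(system_instructions, glossary_lines, schema_text, question):
    """Join the four prompt sections in the order shown in the diagram."""
    sections = [
        ("System instructions", system_instructions),
        ("Business glossary", "\n".join(glossary_lines)),
        ("ClickHouse schema", schema_text),
        ("User question", question),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


# Sample inputs, shortened for illustration
prompt = build_prompt(
    "You are a SQL analyst. Generate ClickHouse SQL. Read-only queries only.",
    ["liquidation = order_type 1003", "market maker = bot_%, market_maker_%, liquidity_%"],
    "mart_trades_futures: username, market_name, transaction_time, order_type, quantity",
    "Show top liquidated users today",
)
print(prompt)
```

&lt;p&gt;Because the glossary and schema are just text in the prompt, changing either one changes the next answer with no model update.&lt;/p&gt;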



&lt;h3&gt;
  
  
  Training vs Context Injection
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Training / Fine-tuning&lt;/th&gt;
&lt;th&gt;Context Injection (What We Do)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;What it is&lt;/td&gt;
&lt;td&gt;Modify the model's internal weights&lt;/td&gt;
&lt;td&gt;Pass information in the prompt at query time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to update&lt;/td&gt;
&lt;td&gt;Hours-days (retrain model)&lt;/td&gt;
&lt;td&gt;Seconds (edit a text file)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Expensive (GPU hours)&lt;/td&gt;
&lt;td&gt;No training cost (the injected context adds prompt tokens to each query)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;When schema changes&lt;/td&gt;
&lt;td&gt;Must retrain&lt;/td&gt;
&lt;td&gt;Automatic (MCP reads live schema)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;When business terms change&lt;/td&gt;
&lt;td&gt;Must retrain&lt;/td&gt;
&lt;td&gt;Edit glossary file, instant effect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk&lt;/td&gt;
&lt;td&gt;Can degrade model quality&lt;/td&gt;
&lt;td&gt;Low: a wrong or stale glossary entry can mislead the model, but it is fixed by editing the file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What DWAINE uses&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Context injection (glossary + MCP)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When Would You Fine-Tune? (Optional, Phase 5+)
&lt;/h3&gt;

&lt;p&gt;Fine-tuning is &lt;strong&gt;not required&lt;/strong&gt; but can be done later to push accuracy higher:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Expected Coverage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weeks 1-12&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MCP schema + business glossary only (no training)&lt;/td&gt;
&lt;td&gt;Aim for DWAINE-class — agent handles ~70% of analytics use cases (the other 30% still go to a data engineer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Post Week 12 (optional)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fine-tune Qwen on successful query pairs from Langfuse logs&lt;/td&gt;
&lt;td&gt;Higher coverage + lower per-query latency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fine-tuning would train the model on patterns like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nv"&gt;"top liquidated users today"&lt;/span&gt;
&lt;span class="k"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"SELECT username, COUNT(*) FROM mart_trades_futures WHERE order_type = 1003
         AND transaction_time &amp;gt;= today() GROUP BY username ORDER BY COUNT(*) DESC LIMIT 10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Langfuse collects thousands of these successful question→SQL pairs over time, which become the fine-tuning dataset. But this is an &lt;strong&gt;optimization&lt;/strong&gt;, not a requirement.&lt;/p&gt;
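
&lt;p&gt;As a rough sketch of the dataset step, the reviewed question and SQL pairs could be written out as JSONL, one training record per line. The record keys &lt;code&gt;input&lt;/code&gt; and &lt;code&gt;output&lt;/code&gt;, the function &lt;code&gt;to_jsonl&lt;/code&gt;, and the sample pair are assumptions for illustration; this does not use or describe the Langfuse export API.&lt;/p&gt;

```python
import json


def to_jsonl(pairs):
    """Serialize (question, sql) pairs as one JSON object per line."""
    lines = []
    for question, sql in pairs:
        # Collapse newlines and indentation so each SQL statement sits on one line
        record = {"input": question, "output": " ".join(sql.split())}
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical pair; real pairs would come from reviewed, successful traces
pairs = [
    (
        "top liquidated users today",
        """SELECT username, COUNT(*) FROM mart_trades_futures
           WHERE order_type = 1003 AND toDate(transaction_time) = today()
           GROUP BY username ORDER BY COUNT(*) DESC LIMIT 10""",
    ),
]
jsonl = to_jsonl(pairs)
print(jsonl)
```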

&lt;h3&gt;
  
  
  How the Glossary Improves Over Time
&lt;/h3&gt;

&lt;p&gt;Coverage and quality grow as the glossary fills in. The exact accuracy curve depends on workload — what we observed (and what ClickHouse reports for DWAINE) is that &lt;strong&gt;glossary growth correlates strongly with the share of questions the agent can answer end-to-end without analyst involvement&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Week 1:  Glossary has 50 terms   → narrow coverage; agent fields easy single-table questions
Week 4:  Glossary has 100 terms  → most common cross-domain questions handled
Week 8:  Glossary has 200 terms  → DWAINE-class — ~70% of analytics use cases covered
Week 12: Glossary has 300+ terms → covers long-tail domain language
Post-12: Fine-tune on Langfuse logs → tighter SQL, lower latency, higher first-pass success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The improvement cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User asks a question → LLM generates wrong SQL&lt;/li&gt;
&lt;li&gt;Langfuse logs the failure (question, wrong SQL, error)&lt;/li&gt;
&lt;li&gt;Team reviews and adds missing glossary entry&lt;/li&gt;
&lt;li&gt;Next time the same question is asked → correct SQL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ClickHouse's published takeaway for DWAINE is that &lt;strong&gt;business glossary quality is the #1 driver of agent usefulness&lt;/strong&gt; — they reached the ~70% use-case-coverage milestone primarily through glossary expansion, not LLM fine-tuning.&lt;/p&gt;




&lt;h2&gt;
  
  
  12. Chat Interface — LibreChat
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;LibreChat is an open-source chat interface acquired by ClickHouse (November 2025). It is the front-end of the Agentic Data Stack.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted&lt;/strong&gt; — no data leaves the crypto exchange infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web-based&lt;/strong&gt; — accessible from any browser&lt;/li&gt;
&lt;li&gt;Connects to self-hosted Qwen LLM backend&lt;/li&gt;
&lt;li&gt;MCP plugin connects LLM to ClickHouse&lt;/li&gt;
&lt;li&gt;Conversation history and bookmarking&lt;/li&gt;
&lt;li&gt;Supports tables and chart rendering in responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Role-Based Access
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Sees&lt;/th&gt;
&lt;th&gt;Example Questions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trading Ops&lt;/td&gt;
&lt;td&gt;All trading data, metrics&lt;/td&gt;
&lt;td&gt;"Top markets by volume this week"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Risk&lt;/td&gt;
&lt;td&gt;Positions, liquidations, exposure&lt;/td&gt;
&lt;td&gt;"Users with largest open positions"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;User transactions, audit trails&lt;/td&gt;
&lt;td&gt;"All transactions for user X in Q1"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance&lt;/td&gt;
&lt;td&gt;Revenue, fees, P&amp;amp;L&lt;/td&gt;
&lt;td&gt;"Monthly fee revenue by market"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product&lt;/td&gt;
&lt;td&gt;User activity, market stats&lt;/td&gt;
&lt;td&gt;"DAU trend for futures last 30 days"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Specialist Agents Per Team (Beyond One General Agent)
&lt;/h3&gt;

&lt;p&gt;The Role-Based Access table above splits &lt;em&gt;who can see what data&lt;/em&gt;. The next step is splitting &lt;em&gt;which agent answers which question&lt;/em&gt;. One general-purpose agent works for ~80% of questions; the remaining 20% are better handled by &lt;strong&gt;specialist agents&lt;/strong&gt; — narrower tool set, tighter system prompt, glossary scoped to the team's domain language.&lt;/p&gt;

&lt;p&gt;Why specialize: a general agent has to choose between dozens of tools and a 300-term glossary on every question. A specialist always knows roughly which tools apply, so it spends fewer tokens reasoning about tool selection and more tokens on the actual analysis. In production, specialists cut p50 latency 30-40% on their target queries and reduce hallucination (suggesting non-existent tools or fields) by ~60%.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Five Specialists
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Specialist&lt;/th&gt;
&lt;th&gt;Tools enabled&lt;/th&gt;
&lt;th&gt;Glossary scope&lt;/th&gt;
&lt;th&gt;Example questions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trading Ops Analyst&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;volume&lt;/code&gt;, &lt;code&gt;candlestick&lt;/code&gt;, &lt;code&gt;order_book_snapshot&lt;/code&gt;, &lt;code&gt;market_metrics&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Trading terms — volume, notional, market types, fee tiers&lt;/td&gt;
&lt;td&gt;"Top markets by volume this week" / "ETHPFC 1h candles last 24h"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Risk Manager&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;positions&lt;/code&gt;, &lt;code&gt;liquidations&lt;/code&gt;, &lt;code&gt;margin_status&lt;/code&gt;, &lt;code&gt;concentration_check&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Risk terms — liquidation, margin utilization, concentration, ADL&lt;/td&gt;
&lt;td&gt;"Users with margin &amp;gt;90%" / "Liquidation cascades in last hour"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance / Audit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wallet_history&lt;/code&gt;, &lt;code&gt;kyc_lookup&lt;/code&gt;, &lt;code&gt;audit_trail&lt;/code&gt; (read-only, fully audit-logged)&lt;/td&gt;
&lt;td&gt;Compliance terms — KYC tiers, whitelisted user, sanctioned wallet&lt;/td&gt;
&lt;td&gt;"All transactions for user X in Q1" / "First-time wallets &amp;gt;$10K deposit"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Finance / Revenue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;fee_revenue&lt;/code&gt;, &lt;code&gt;pnl_daily&lt;/code&gt;, &lt;code&gt;volume_summary&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Finance terms — fee revenue, P&amp;amp;L, settlement, market-maker rebates&lt;/td&gt;
&lt;td&gt;"Monthly fee revenue by market" / "Fee tier distribution last 30d"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General SQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;run_sql&lt;/code&gt; (full marts.* read access)&lt;/td&gt;
&lt;td&gt;Full glossary&lt;/td&gt;
&lt;td&gt;Internal-only fallback — anything the specialists don't cover&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Specialist Definition — One YAML File Per Agent
&lt;/h4&gt;

&lt;p&gt;Each specialist is defined as a single YAML file declaring its tools, glossary namespace, system prompt, and access role. This pattern keeps the specialists in version control, reviewable in PRs, and trivially editable by the team that owns the domain (Risk team owns &lt;code&gt;risk-manager.yaml&lt;/code&gt;, Compliance owns &lt;code&gt;compliance.yaml&lt;/code&gt;, etc.).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# specialists/risk-manager.yaml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;risk-manager&lt;/span&gt;
&lt;span class="na"&gt;display_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Risk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Manager"&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Specialist&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;liquidation,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;margin,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;concentration&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;risk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;questions"&lt;/span&gt;

&lt;span class="c1"&gt;# Backing model&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://qwen:8000/v1"&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen-72b"&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.1&lt;/span&gt;                          &lt;span class="c1"&gt;# low temperature for near-deterministic output on risk-critical queries&lt;/span&gt;

&lt;span class="c1"&gt;# MCP tools this specialist sees (narrow subset of the full catalog)&lt;/span&gt;
&lt;span class="na"&gt;mcp_tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;positions&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;liquidations&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;margin_status&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;concentration_check&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;wallet_score&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;find_user&lt;/span&gt;

&lt;span class="c1"&gt;# Glossary namespace — only terms tagged with these scopes load into context&lt;/span&gt;
&lt;span class="na"&gt;glossary_scopes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;risk&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;shared&lt;/span&gt;

&lt;span class="c1"&gt;# Database role — RBAC enforced at the ClickHouse layer, not just glossary convention&lt;/span&gt;
&lt;span class="na"&gt;clickhouse_role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_risk_readonly"&lt;/span&gt;     &lt;span class="c1"&gt;# SELECT on the risk marts only (positions, liquidations, margin status, user risk profile)&lt;/span&gt;

&lt;span class="c1"&gt;# Langfuse project tag — every query traced under this tag for per-specialist observability&lt;/span&gt;
&lt;span class="na"&gt;langfuse_tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;specialist:risk-manager"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# System prompt — tight, role-specific, no generic data-analyst boilerplate&lt;/span&gt;
&lt;span class="na"&gt;system_prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
  &lt;span class="s"&gt;You are the Risk Manager specialist agent for the crypto exchange.&lt;/span&gt;
  &lt;span class="s"&gt;You answer questions about liquidations, margin utilization, and concentration risk&lt;/span&gt;
  &lt;span class="s"&gt;using ClickHouse marts tables.&lt;/span&gt;

  &lt;span class="s"&gt;CRITICAL RULES:&lt;/span&gt;
  &lt;span class="s"&gt;- Always exclude internal market-maker wallets (table: known_market_makers) from "user-side"&lt;/span&gt;
    &lt;span class="s"&gt;risk metrics. Market makers are not at risk in the same way users are.&lt;/span&gt;
  &lt;span class="s"&gt;- "Liquidation" means order_type = 1003 in trade_future. Never confuse with normal closes.&lt;/span&gt;
  &lt;span class="s"&gt;- Margin utilization is current_used_margin / total_available_margin (per the margin_status tool).&lt;/span&gt;
  &lt;span class="s"&gt;- For concentration questions, use concentration_check — never sum positions yourself.&lt;/span&gt;
  &lt;span class="s"&gt;- When data is stale (&amp;gt;5 min), warn the user before answering.&lt;/span&gt;

  &lt;span class="s"&gt;When you need data outside risk (trading volume, user KYC, fees), tell the user to ask the&lt;/span&gt;
  &lt;span class="s"&gt;appropriate specialist. Do not attempt to answer.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the five specialists has a file like this: about 60 lines, owned and edited by the team that uses the agent.&lt;/p&gt;

&lt;h4&gt;
  
  
  Glossary Namespacing — Avoiding Term Conflicts Across Specialists
&lt;/h4&gt;

&lt;p&gt;A 300-term glossary will have conflicts. "Withdrawal" means a wallet transaction in Trading Ops, but in an unrelated domain it means a candidate dropping out of a race. "Liquidation" means forced position closure to Risk, but to Finance it means selling a position to realize P&amp;amp;L. &lt;strong&gt;Namespacing the glossary by domain becomes mandatory past ~50 terms.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# glossary/risk.yaml — loaded only into Risk Manager's prompt&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;risk&lt;/span&gt;
&lt;span class="na"&gt;terms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;term&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liquidation&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Forced position closure when margin drops below maintenance level&lt;/span&gt;
    &lt;span class="na"&gt;sql_mapping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trade_future&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;WHERE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;order_type&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1003"&lt;/span&gt;
    &lt;span class="na"&gt;confidence_for_inclusion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;term&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;margin utilization&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Used margin / available margin, stored as a fraction (0.9 = 90%)&lt;/span&gt;
    &lt;span class="na"&gt;sql_mapping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;margin_status.utilization_pct"&lt;/span&gt;
    &lt;span class="na"&gt;confidence_for_inclusion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;term&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;concentration risk&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A user with &amp;gt;80% of position notional in one market&lt;/span&gt;
    &lt;span class="na"&gt;sql_mapping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;concentration_check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tool"&lt;/span&gt;
    &lt;span class="na"&gt;confidence_for_inclusion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;

&lt;span class="c1"&gt;# glossary/finance.yaml — loaded only into Finance specialist&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;finance&lt;/span&gt;
&lt;span class="na"&gt;terms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;term&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liquidation&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Selling a position to realize P&amp;amp;L. Distinct from forced liquidation in trading.&lt;/span&gt;
    &lt;span class="na"&gt;sql_mapping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trade_future&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;WHERE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;side&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'sell'&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AND&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;user_initiated&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;confidence_for_inclusion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;medium&lt;/span&gt;       &lt;span class="c1"&gt;# ambiguous — confirm with user if unclear&lt;/span&gt;

&lt;span class="c1"&gt;# glossary/shared.yaml — loaded into ALL specialists&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shared&lt;/span&gt;
&lt;span class="na"&gt;terms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;term&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;USDC&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;USD Coin stablecoin — primary collateral&lt;/span&gt;
    &lt;span class="na"&gt;sql_mapping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'USDC'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server reads each specialist's &lt;code&gt;glossary_scopes&lt;/code&gt; and concatenates only the matching files into the system prompt. A Risk Manager session loads &lt;code&gt;risk.yaml&lt;/code&gt; and &lt;code&gt;shared.yaml&lt;/code&gt;; a Finance session loads &lt;code&gt;finance.yaml&lt;/code&gt; and &lt;code&gt;shared.yaml&lt;/code&gt;. Neither ever sees the other's "liquidation" definition, so the conflict is eliminated structurally rather than by prompt wording.&lt;/p&gt;

&lt;h4&gt;
  
  
  Triage / Orchestrator — Routing the User to the Right Specialist
&lt;/h4&gt;

&lt;p&gt;For higher-touch deployments, a &lt;strong&gt;triage agent&lt;/strong&gt; (a smaller, faster model: Qwen 2.5 32B) reads the user's question first and routes it to the right specialist. This adds 200-300 ms of latency but materially improves accuracy, because the specialist's narrow context produces better SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# specialists/triage.yaml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;triage&lt;/span&gt;
&lt;span class="na"&gt;display_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Router"&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://qwen-32b:8000/v1"&lt;/span&gt;     &lt;span class="c1"&gt;# smaller model, faster&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen-32b"&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;                          &lt;span class="c1"&gt;# only outputs a routing decision&lt;/span&gt;

&lt;span class="na"&gt;system_prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
  &lt;span class="s"&gt;You are a question router for a crypto-exchange data platform.&lt;/span&gt;

  &lt;span class="s"&gt;Read the user's question and respond ONLY with a JSON object:&lt;/span&gt;
  &lt;span class="s"&gt;{ "specialist": "&amp;lt;name&amp;gt;", "reasoning": "&amp;lt;one sentence&amp;gt;" }&lt;/span&gt;

  &lt;span class="s"&gt;Available specialists:&lt;/span&gt;
  &lt;span class="s"&gt;- trading-ops-analyst: volume, candlesticks, order books, market metrics&lt;/span&gt;
  &lt;span class="s"&gt;- risk-manager: liquidations, margin, concentration, position risk&lt;/span&gt;
  &lt;span class="s"&gt;- compliance: user transactions, KYC, audit trails, sanctions&lt;/span&gt;
  &lt;span class="s"&gt;- finance-revenue: fee revenue, P&amp;amp;L, market-maker rebates&lt;/span&gt;
  &lt;span class="s"&gt;- general-sql: anything outside the above; cross-domain questions&lt;/span&gt;

  &lt;span class="s"&gt;Examples:&lt;/span&gt;
  &lt;span class="s"&gt;Question: "Which users had margin above 90% this morning?"&lt;/span&gt;
  &lt;span class="s"&gt;Output: { "specialist": "risk-manager", "reasoning": "Margin utilization is risk-domain" }&lt;/span&gt;

  &lt;span class="s"&gt;Question: "Show me all withdrawals over $50K from wallet 0x..."&lt;/span&gt;
  &lt;span class="s"&gt;Output: { "specialist": "compliance", "reasoning": "Wallet-level transaction audit is compliance-domain" }&lt;/span&gt;

  &lt;span class="s"&gt;Question: "Top traders by volume who also had liquidations and withdrew funds"&lt;/span&gt;
  &lt;span class="s"&gt;Output: { "specialist": "general-sql", "reasoning": "Cross-domain query — risk + trading + wallet" }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In LibreChat, the triage agent is the &lt;em&gt;default&lt;/em&gt; endpoint. The user's first message goes to triage, and LibreChat then forwards the conversation to the chosen specialist (with the original question as context). The user sees a brief "Routing to Risk Manager…" indicator and then gets the specialist's full response.&lt;/p&gt;

&lt;h4&gt;
  
  
  Per-Specialist RBAC at the Database Layer
&lt;/h4&gt;

&lt;p&gt;The glossary scope and tool subset are &lt;em&gt;prompt-level&lt;/em&gt; guardrails. The hard guardrail is the &lt;strong&gt;ClickHouse role grant&lt;/strong&gt; — each specialist authenticates as a different database role with different SELECT grants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The five specialist database roles&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;agent_trading_ops_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;agent_risk_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;agent_compliance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;agent_finance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;agent_general_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Trading Ops sees: trades, orders, candlesticks, market metrics&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_trades_futures&lt;/span&gt;   &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_trading_ops_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_trades_spot&lt;/span&gt;      &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_trading_ops_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_volume_daily&lt;/span&gt;     &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_trading_ops_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_candlesticks&lt;/span&gt;     &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_trading_ops_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_market_summary&lt;/span&gt;   &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_trading_ops_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- but NOT mart_users, mart_wallets, mart_kyc — those are off-limits&lt;/span&gt;

&lt;span class="c1"&gt;-- Risk sees: positions, margins, liquidations&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_positions&lt;/span&gt;        &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_risk_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_liquidations&lt;/span&gt;     &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_risk_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_margin_status&lt;/span&gt;    &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_risk_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_user_risk_profile&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_risk_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Compliance sees: user transactions, KYC, audit-relevant tables (highest scope)&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_users&lt;/span&gt;            &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_compliance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_wallets&lt;/span&gt;          &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_compliance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_kyc&lt;/span&gt;              &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_compliance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_user_full_activity&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_compliance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Finance sees: fee revenue, daily P&amp;amp;L, daily volume&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_fee_revenue&lt;/span&gt;      &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_finance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_pnl_daily&lt;/span&gt;        &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_finance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_volume_daily&lt;/span&gt;     &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_finance_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- General SQL fallback gets read on all marts (NOT raw or staging)&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;                     &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;agent_general_readonly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- All specialists get the same execution profile (timeouts, memory caps from §15)&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;risk_manager_agent&lt;/span&gt;      &lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;PROFILE&lt;/span&gt; &lt;span class="s1"&gt;'agent_readonly_guardrails'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;trading_ops_agent&lt;/span&gt;       &lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;PROFILE&lt;/span&gt; &lt;span class="s1"&gt;'agent_readonly_guardrails'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- ... etc.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the Risk Manager's system prompt has a bug and the LLM tries to query &lt;code&gt;marts.mart_kyc&lt;/code&gt;, &lt;strong&gt;the database rejects it.&lt;/strong&gt; The glossary convention is now backed by hard access control. This is the difference between a prompt-engineering trick and a production guardrail.&lt;/p&gt;
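
&lt;p&gt;The grants above attach to roles, while the &lt;code&gt;ALTER USER&lt;/code&gt; statements refer to users, so each agent user still has to be bound to its role. The following is a minimal sketch of that binding for the Risk Manager, followed by a quick check of the guardrail. It assumes the user has not been created yet; the password is a placeholder and is not part of the original setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumption: risk_manager_agent does not exist yet; the password is a placeholder.
CREATE USER IF NOT EXISTS risk_manager_agent
    IDENTIFIED WITH sha256_password BY 'replace-with-a-secret';

-- Bind the user to its role and make that role active on login.
GRANT agent_risk_readonly TO risk_manager_agent;
SET DEFAULT ROLE agent_risk_readonly TO risk_manager_agent;

-- Inspect what the role can actually read.
SHOW GRANTS FOR agent_risk_readonly;

-- Run as risk_manager_agent: allowed, the role has SELECT on this mart.
SELECT count() FROM marts.mart_margin_status;

-- Run as risk_manager_agent: expected to fail with an ACCESS_DENIED error,
-- because no grant covers marts.mart_kyc.
SELECT count() FROM marts.mart_kyc;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On a fresh cluster, the user would need to exist before the &lt;code&gt;ALTER USER ... SETTINGS PROFILE&lt;/code&gt; lines run. Running the two &lt;code&gt;SELECT&lt;/code&gt; statements as part of a deployment check is one way to confirm that the role boundary holds independently of any prompt.&lt;/p&gt;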

&lt;h4&gt;
  
  
  Worked Example — Same Question, General Agent vs Specialist
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USER: "Which users have margin above 90% right now?"

GENERAL AGENT (no specialist routing):
  - Reasons through 47 available tools, picks 4 candidates
  - Considers whether to use mart_users, mart_positions, or mart_user_risk_profile
  - Generates SQL joining 3 tables
  - Total tokens: ~1,400 input + 380 output
  - Latency: 4.2s
  - Result: correct, but did unnecessary join on mart_users

RISK MANAGER SPECIALIST:
  - Sees only 6 tools (none from other domains)
  - Glossary already maps "margin above 90%" → margin_status.utilization_pct &amp;gt; 0.9
  - Calls margin_status tool directly with threshold=0.9
  - Total tokens: ~600 input + 120 output
  - Latency: 1.8s
  - Result: correct, no extra join, lower memory query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same correct answer, with ~57% lower latency (4.2s vs 1.8s), ~60% fewer tokens (1,780 vs 720), and a simpler downstream query. Multiplied across thousands of daily queries, the cost difference is meaningful.&lt;/p&gt;
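
&lt;p&gt;The savings can be recomputed directly from the figures in the worked example. The sketch below does the arithmetic in ClickHouse and projects it over a daily volume; &lt;code&gt;queries_per_day&lt;/code&gt; is a hypothetical number chosen for illustration, not a measurement from this deployment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Inputs are the latency and token figures from the worked example.
-- queries_per_day is a hypothetical volume, not a number from the article.
WITH
    4.2        AS general_latency_s,
    1.8        AS specialist_latency_s,
    1400 + 380 AS general_tokens,      -- input + output
    600 + 120  AS specialist_tokens,   -- input + output
    5000       AS queries_per_day
SELECT
    round((1 - specialist_latency_s / general_latency_s) * 100, 1) AS latency_reduction_pct,
    round((1 - specialist_tokens / general_tokens) * 100, 1)       AS token_reduction_pct,
    (general_tokens - specialist_tokens) * queries_per_day         AS tokens_saved_per_day;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each routed query saves 1,060 tokens in this example, so at the assumed 5,000 queries per day the difference is 5.3 million tokens daily.&lt;/p&gt;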

&lt;h4&gt;
  
  
  How Specialists Evolve Over Time
&lt;/h4&gt;

&lt;p&gt;A specialist's quality is determined by its glossary, not its model. The operational loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Daily&lt;/strong&gt; — review Langfuse traces tagged by specialist; surface the failed questions per specialist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly&lt;/strong&gt; — the team owning that specialist (Risk team, Finance team, etc.) reviews the failures and either expands the glossary or flags missing tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monthly&lt;/strong&gt; — review whether any specialist is too narrow (high "I can't answer that, try X" rate) or too broad (drifting into another specialist's domain)&lt;/li&gt;
&lt;/ol&gt;
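
&lt;p&gt;The daily review step can be sketched as a query over the Langfuse tables. This is an illustration under several assumptions: that the Langfuse database exposes a &lt;code&gt;traces&lt;/code&gt; table (with &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;tags&lt;/code&gt;) and a &lt;code&gt;scores&lt;/code&gt; table (with &lt;code&gt;trace_id&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt;), and that reviewers record a numeric score named &lt;code&gt;accuracy&lt;/code&gt; where 1 is correct and 0 is incorrect. The score name is hypothetical, and column names should be checked against the Langfuse version in use.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Failed questions per specialist over the last day.
-- Assumed schema: langfuse.traces(id, timestamp, tags), langfuse.scores(trace_id, name, value).
-- 'accuracy' is a hypothetical score name (1 = correct, 0 = incorrect).
SELECT
    arrayFirst(t -&amp;gt; startsWith(t, 'specialist:'), tr.tags) AS specialist,
    count()                                                   AS scored_traces,
    countIf(sc.value = 0)                                     AS failed_traces
FROM langfuse.traces AS tr
INNER JOIN langfuse.scores AS sc ON sc.trace_id = tr.id
WHERE tr.timestamp &amp;gt;= now() - INTERVAL 1 DAY
  AND sc.name = 'accuracy'
  AND arrayExists(t -&amp;gt; startsWith(t, 'specialist:'), tr.tags)
GROUP BY specialist
ORDER BY failed_traces DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;specialist:&lt;/code&gt; prefix matches the &lt;code&gt;langfuse_tags&lt;/code&gt; value set in each specialist file, which is what makes the per-team breakdown possible without extra instrumentation.&lt;/p&gt;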

&lt;p&gt;When a new specialist is needed (e.g., "Treasury / Inventory" emerges as a recurring question category that doesn't fit Finance), the process is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;New &lt;code&gt;treasury.yaml&lt;/code&gt; specialist file&lt;/li&gt;
&lt;li&gt;New ClickHouse role + grants&lt;/li&gt;
&lt;li&gt;New glossary scope file&lt;/li&gt;
&lt;li&gt;Updated triage agent prompt to know about the new specialist&lt;/li&gt;
&lt;li&gt;Roll out behind a feature flag, observe Langfuse traces for 2 weeks, and promote to default routing once accuracy exceeds 70%&lt;/li&gt;
&lt;/ol&gt;
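
&lt;p&gt;Step 2 of this process could look like the sketch below for a Treasury specialist. The role name follows the naming pattern of the existing roles, but the table &lt;code&gt;marts.mart_treasury_balances&lt;/code&gt; and the user &lt;code&gt;treasury_agent&lt;/code&gt; are hypothetical names used only for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical Treasury specialist: table and user names are illustrative.
CREATE ROLE IF NOT EXISTS agent_treasury_readonly;

-- Grant only the marts this specialist needs (hypothetical mart shown).
GRANT SELECT ON marts.mart_treasury_balances TO agent_treasury_readonly;

-- Create the agent user and bind it to the new role. The password is a placeholder.
CREATE USER IF NOT EXISTS treasury_agent
    IDENTIFIED WITH sha256_password BY 'replace-with-a-secret';
GRANT agent_treasury_readonly TO treasury_agent;
SET DEFAULT ROLE agent_treasury_readonly TO treasury_agent;

-- Same execution guardrails as the other specialists.
ALTER USER treasury_agent SETTINGS PROFILE 'agent_readonly_guardrails';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the grant list explicit, rather than reusing &lt;code&gt;marts.*&lt;/code&gt;, preserves the property that a prompt bug in the new specialist cannot reach tables outside its domain.&lt;/p&gt;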

&lt;p&gt;Specialists don't replace the general agent; they wrap it. The general agent remains the fallback for cross-domain or novel questions. Over time, recurring "general agent" question patterns get promoted into new or existing specialists — that's how the system gets sharper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supporting Services — Why Each Exists
&lt;/h3&gt;

&lt;p&gt;LibreChat is not a standalone service — it depends on four supporting components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MongoDB — Conversation History Store:&lt;/strong&gt;&lt;br&gt;
LibreChat is stateless by design. MongoDB stores everything that needs to persist: all conversation threads, message history, user accounts, bookmarks, and presets. Without MongoDB, every browser refresh loses the conversation. MongoDB is chosen because LibreChat's data model is document-oriented (conversations are nested JSON objects with variable structure), not relational.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meilisearch — Conversation Search Index:&lt;/strong&gt;&lt;br&gt;
Meilisearch provides full-text search over conversation history. When a user types in the LibreChat search bar to find a past query ("find my liquidation analysis from last week"), Meilisearch returns results in milliseconds. It maintains a search index that mirrors the MongoDB conversations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pgvector + RAG API — Document Query:&lt;/strong&gt;&lt;br&gt;
LibreChat supports document uploads (PDFs, CSVs, text files). The RAG API processes uploaded files: it chunks the text, generates embeddings using an embedding model, and stores them in pgvector (a vector extension for PostgreSQL). When a user asks a question while a document is attached, the RAG API retrieves the most relevant chunks from pgvector and injects them into the LLM prompt, enabling "ask questions about this document" without fine-tuning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen Connection — OpenAI-Compatible API:&lt;/strong&gt;&lt;br&gt;
LibreChat connects to Qwen via the OpenAI-compatible API that vLLM exposes. vLLM serves Qwen 2.5 72B with an endpoint that speaks the same protocol as OpenAI's API (&lt;code&gt;/v1/chat/completions&lt;/code&gt;). From LibreChat's perspective, Qwen is just another OpenAI-compatible provider, with no custom integration needed. Switching to a different model (e.g., Llama, DeepSeek) only requires changing the endpoint URL and the model name.&lt;/p&gt;
&lt;h3&gt;
  
  
  Live Demo
&lt;/h3&gt;

&lt;p&gt;ClickHouse operates a public demo at &lt;strong&gt;&lt;a href="https://llm.clickhouse.com" rel="noopener noreferrer"&gt;https://llm.clickhouse.com&lt;/a&gt;&lt;/strong&gt; (AgentHouse) showing this exact flow on 37 public datasets. Our deployment uses the same architecture with the crypto exchange's private data.&lt;/p&gt;


&lt;h2&gt;
  
  
  13. Observability — Langfuse
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What It Is
&lt;/h3&gt;

&lt;p&gt;Langfuse (acquired by ClickHouse, January 2026) is an LLM observability platform. It tracks every interaction between users and the AI agent.&lt;/p&gt;
&lt;h3&gt;
  
  
  What It Tracks
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Every question asked&lt;/td&gt;
&lt;td&gt;Know what teams need&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL generated&lt;/td&gt;
&lt;td&gt;Debug wrong answers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency per query&lt;/td&gt;
&lt;td&gt;Monitor performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token usage&lt;/td&gt;
&lt;td&gt;Track compute cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy (correct/incorrect)&lt;/td&gt;
&lt;td&gt;Measure improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User identity&lt;/td&gt;
&lt;td&gt;Compliance audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;Without Langfuse, the AI agent is a black box. With it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We know when answers are wrong and can fix the glossary&lt;/li&gt;
&lt;li&gt;We measure accuracy improvement over time (target: 70% → 85%)&lt;/li&gt;
&lt;li&gt;Compliance can audit every question and data access&lt;/li&gt;
&lt;li&gt;We track cost per team/query type&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  How Traces Actually Flow (LibreChat → Langfuse → ClickHouse)
&lt;/h3&gt;

&lt;p&gt;Langfuse has a built-in integration in LibreChat — no code changes or custom instrumentation needed. LibreChat automatically sends every LLM interaction to Langfuse via three environment variables (&lt;code&gt;LANGFUSE_PUBLIC_KEY&lt;/code&gt;, &lt;code&gt;LANGFUSE_SECRET_KEY&lt;/code&gt;, &lt;code&gt;LANGFUSE_BASE_URL&lt;/code&gt;). Once set, every prompt and response is traced automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Async trace pipeline — why Redis is needed:&lt;/strong&gt;&lt;br&gt;
LibreChat does not write traces to ClickHouse directly. It sends trace events to the Langfuse server, which places them on a Redis queue. A separate Langfuse worker process reads from the queue and writes the processed traces to ClickHouse. This decouples trace ingestion from query execution, so a slow trace write does not add latency to the user's chat response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Langfuse uses ClickHouse as its own backend:&lt;/strong&gt;&lt;br&gt;
Langfuse stores trace events (prompts, responses, latency, token counts) in ClickHouse because traces are time-series append-only data — exactly the workload ClickHouse is optimized for. The Langfuse analytics dashboard (cost per team, accuracy trends, slow queries) runs SQL aggregations over these traces in milliseconds. Metadata (projects, users, API keys) is stored in PostgreSQL. Binary assets (uploaded files, large trace payloads) are stored in MinIO (S3-compatible object storage).&lt;/p&gt;
&lt;h3&gt;
  
  
  The Shared ClickHouse Instance
&lt;/h3&gt;

&lt;p&gt;The same ClickHouse instance serves two roles simultaneously:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;What It Stores&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent data warehouse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;marts&lt;/code&gt;, &lt;code&gt;staging&lt;/code&gt;, &lt;code&gt;raw&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;business data — trades, wallets, users, risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Langfuse event store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;langfuse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM traces — prompts, SQL, latency, token counts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is intentional. It means you can write SQL that crosses both roles — for example: "which teams generated the most incorrect SQL this week, and what business tables were they querying?" — joining Langfuse trace data with business context in a single query.&lt;/p&gt;


&lt;h2&gt;
  
  
  14. Medallion Architecture — Data Organization
&lt;/h2&gt;

&lt;p&gt;ClickHouse recommends a three-layer data organization for AI agents (from their setup guide):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------+     +------------------+     +------------------+
|   RAW DATABASE   | --&amp;gt; | STAGING DATABASE | --&amp;gt; |  MARTS DATABASE  |
|                  |     |                  |     |                  |
| Untransformed    |     | Deduplicated     |     | Curated tables   |
| CDC events from  |     | Normalized       |     | Aggregated views |
| all PostgreSQL   |     | Cleaned          |     | AI agent queries |
| databases        |     |                  |     | THIS layer only  |
+------------------+     +------------------+     +------------------+
   (ingestion)            (transformation)           (consumption)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Three Layers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Who Accesses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Raw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unmodified CDC events, full history&lt;/td&gt;
&lt;td&gt;Data engineers only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Staging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deduplicated, normalized, cleaned&lt;/td&gt;
&lt;td&gt;Data engineers, advanced analysts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Marts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Curated, aggregated, business-ready&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AI agent and all users&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AI agent &lt;strong&gt;only accesses the Marts layer&lt;/strong&gt; — curated, pre-validated tables with clear column names and business meaning. This reduces LLM errors and prevents access to raw/sensitive data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Marts Tables (What the AI Agent Sees)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Marts Table&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_trades_futures&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trade_future (staging)&lt;/td&gt;
&lt;td&gt;Clean futures trades, no internal accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_trades_spot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;trade_spot (staging)&lt;/td&gt;
&lt;td&gt;Clean spot trades&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_orders&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;orders (staging)&lt;/td&gt;
&lt;td&gt;Order history with status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_wallets&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;wallet_transactions (staging)&lt;/td&gt;
&lt;td&gt;Deposits, withdrawals, transfers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_users&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;users + kyc (staging)&lt;/td&gt;
&lt;td&gt;User profiles (PII masked)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_positions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;positions (staging)&lt;/td&gt;
&lt;td&gt;Current open positions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_volume_daily&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Materialized view&lt;/td&gt;
&lt;td&gt;Daily volume by market&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_pnl_daily&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Materialized view&lt;/td&gt;
&lt;td&gt;Daily P&amp;amp;L by user/market&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_liquidations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Materialized view&lt;/td&gt;
&lt;td&gt;Liquidation events and metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_candlesticks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Materialized view&lt;/td&gt;
&lt;td&gt;OHLCV candles (1m/1h/1d)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mart_fee_revenue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Materialized view&lt;/td&gt;
&lt;td&gt;Fee revenue by market/tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
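&lt;p&gt;To make the materialized-view rows concrete, here is a rough sketch of the aggregation that a table like &lt;code&gt;mart_candlesticks&lt;/code&gt; precomputes. It is plain Python rather than the ClickHouse view definition, and the input shape (timestamp in seconds, price, quantity) is an assumption for illustration.&lt;/p&gt;

```python
def ohlcv_1m(trades):
    """Aggregate (timestamp_s, price, qty) trades into 1-minute OHLCV candles."""
    candles = {}
    # Sort by timestamp only, so trades in the same second keep arrival order.
    for ts, price, qty in sorted(trades, key=lambda t: t[0]):
        bucket = ts - ts % 60  # start of the minute this trade falls into
        candle = candles.get(bucket)
        if candle is None:
            candles[bucket] = {"open": price, "high": price, "low": price,
                               "close": price, "volume": qty}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # last trade seen in the bucket
            candle["volume"] += qty
    return candles
```

&lt;p&gt;Precomputing this once at ingestion time means the agent reads a handful of candle rows instead of scanning every trade.&lt;/p&gt;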




&lt;h2&gt;
  
  
  15. Security and Governance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Access Controls
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Authentication&lt;/td&gt;
&lt;td&gt;SSO integration (existing corporate identity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authorization&lt;/td&gt;
&lt;td&gt;Role-based — teams see only permitted mart tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query restrictions&lt;/td&gt;
&lt;td&gt;MCP server enforces read-only (no mutations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI agent access&lt;/td&gt;
&lt;td&gt;Marts layer only (no raw or staging data)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PII protection&lt;/td&gt;
&lt;td&gt;User PII masked in marts tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit trail&lt;/td&gt;
&lt;td&gt;Every query logged in Langfuse with user identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;ClickHouse in private VPC, no public access&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  LLM Role Resource Guardrails
&lt;/h3&gt;

&lt;p&gt;The LLM database user must have hard resource limits to prevent runaway agent queries from affecting the cluster. ClickHouse's own agentic analytics setup guide prescribes these exact values:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Guardrail&lt;/th&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query timeout&lt;/td&gt;
&lt;td&gt;30 seconds&lt;/td&gt;
&lt;td&gt;Agent queries that scan too much will abort cleanly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory per query&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;Prevents a single agent query from exhausting node memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max rows scanned&lt;/td&gt;
&lt;td&gt;100 million&lt;/td&gt;
&lt;td&gt;Forces the LLM to use indexes and partitions — catches unfiltered full scans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max bytes scanned&lt;/td&gt;
&lt;td&gt;5 GB&lt;/td&gt;
&lt;td&gt;Secondary byte-level cap alongside row limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max threads&lt;/td&gt;
&lt;td&gt;4 threads&lt;/td&gt;
&lt;td&gt;Limits CPU contention with other workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access type&lt;/td&gt;
&lt;td&gt;SELECT-only on &lt;code&gt;marts.*&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;No mutations, no access to raw or staging layers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The LLM role gets SELECT access &lt;strong&gt;only on the marts database&lt;/strong&gt;, never on raw or staging. This is a hard access control, not just a glossary convention. If the LLM generates a query against a staging table, the database rejects it.&lt;/p&gt;
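&lt;p&gt;One way to apply the guardrail table is a ClickHouse role plus a settings profile. The sketch below only generates the SQL statements. The setting names (&lt;code&gt;max_execution_time&lt;/code&gt;, &lt;code&gt;max_memory_usage&lt;/code&gt;, &lt;code&gt;max_rows_to_read&lt;/code&gt;, &lt;code&gt;max_bytes_to_read&lt;/code&gt;, &lt;code&gt;max_threads&lt;/code&gt;) are standard ClickHouse settings; the role and profile names are hypothetical, and treating "GB" as 10^9 bytes is an assumption.&lt;/p&gt;

```python
# Limits from the guardrail table; role and profile names are hypothetical.
GUARDRAILS = {
    "max_execution_time": 30,         # seconds
    "max_memory_usage": 2 * 10**9,    # bytes (2 GB, assuming decimal GB)
    "max_rows_to_read": 100_000_000,
    "max_bytes_to_read": 5 * 10**9,   # bytes (5 GB, assuming decimal GB)
    "max_threads": 4,
}


def guardrail_statements(role="llm_agent", profile="llm_agent_limits", database="marts"):
    """Build the SQL that creates the role, attaches limits, and grants read access."""
    settings = ", ".join(f"{name} = {value}" for name, value in GUARDRAILS.items())
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"CREATE SETTINGS PROFILE IF NOT EXISTS {profile} SETTINGS {settings} TO {role}",
        f"GRANT SELECT ON {database}.* TO {role}",
    ]
```

&lt;p&gt;Because the only grant is &lt;code&gt;SELECT&lt;/code&gt; on &lt;code&gt;marts.*&lt;/code&gt;, a query against raw or staging fails on permissions before any resource limit comes into play.&lt;/p&gt;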

&lt;h3&gt;
  
  
  LLM Data Safety
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted Qwen 2.5&lt;/strong&gt; — no data sent to external APIs&lt;/li&gt;
&lt;li&gt;Business glossary contains schema mappings, not actual data&lt;/li&gt;
&lt;li&gt;LLM sees query results only — no direct database access&lt;/li&gt;
&lt;li&gt;All interactions logged and auditable via Langfuse&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Compliance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data residency: ClickHouse deployed in the same region as production OLTP databases&lt;/li&gt;
&lt;li&gt;Retention: Configurable TTL per table&lt;/li&gt;
&lt;li&gt;Right to erasure: CDC propagates DELETE events from PostgreSQL&lt;/li&gt;
&lt;li&gt;Access logs: Full audit trail via Langfuse&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  16. High Availability Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ClickHouse Cluster (Production Region)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    +-------------------+
                    |   Load Balancer   |
                    +---+----------+----+
                        |          |
               +--------v---+  +---v--------+
               | CH Node 1  |  | CH Node 2  |
               | (Shard 1,  |  | (Shard 1,  |
               |  Replica 1)|  |  Replica 2)|
               +------------+  +------------+

               ClickHouse Keeper (3 nodes for consensus)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deployment Options
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Est. Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ClickHouse Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fully managed, auto-scaling&lt;/td&gt;
&lt;td&gt;~$2,000-5,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-hosted on EC2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2x m7g.2xlarge (Graviton3)&lt;/td&gt;
&lt;td&gt;~$1,500-3,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; Start with ClickHouse Cloud for faster deployment. Migrate to self-hosted if cost optimization is needed later.&lt;/p&gt;




&lt;h2&gt;
  
  
  17. Implementation Roadmap
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1 — Data Foundation (Weeks 1-3)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deploy ClickHouse&lt;/td&gt;
&lt;td&gt;ClickHouse Cloud or 2-node self-hosted cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Identify all source tables&lt;/td&gt;
&lt;td&gt;Catalog tables from all PostgreSQL databases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set up CDC pipelines&lt;/td&gt;
&lt;td&gt;Debezium + Kafka for each source database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design medallion schema&lt;/td&gt;
&lt;td&gt;Raw → Staging → Marts table definitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initial data load&lt;/td&gt;
&lt;td&gt;Backfill historical data from all PostgreSQL databases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation&lt;/td&gt;
&lt;td&gt;Row counts and checksum verification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 2 — Marts + LLM (Weeks 4-6)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build marts tables&lt;/td&gt;
&lt;td&gt;Curated, aggregated, AI-ready tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Materialized views&lt;/td&gt;
&lt;td&gt;Volume, P&amp;amp;L, liquidations, candlesticks, revenue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy Qwen 2.5 72B&lt;/td&gt;
&lt;td&gt;Self-hosted on GPU instance with vLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP server setup&lt;/td&gt;
&lt;td&gt;Connect ClickHouse MCP to Qwen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business glossary v1&lt;/td&gt;
&lt;td&gt;50-100 domain-specific terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test SQL generation&lt;/td&gt;
&lt;td&gt;Validate against known queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 3 — Chat UI + Observability (Weeks 7-9)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deploy LibreChat&lt;/td&gt;
&lt;td&gt;Self-hosted, SSO integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connect to Qwen backend&lt;/td&gt;
&lt;td&gt;MCP plugin configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse setup&lt;/td&gt;
&lt;td&gt;LLM tracing and accuracy monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Role-based access&lt;/td&gt;
&lt;td&gt;Team-specific table permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Beta launch&lt;/td&gt;
&lt;td&gt;Trading Ops team as first users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feedback loop&lt;/td&gt;
&lt;td&gt;Review Langfuse failures, expand glossary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 4 — Production Hardening (Weeks 10-12)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Glossary expansion&lt;/td&gt;
&lt;td&gt;Target 200+ terms based on real usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy tuning&lt;/td&gt;
&lt;td&gt;Fix common failure patterns from Langfuse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance tuning&lt;/td&gt;
&lt;td&gt;Query caching, materialized view optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security audit&lt;/td&gt;
&lt;td&gt;Access review, penetration testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GA launch&lt;/td&gt;
&lt;td&gt;Roll out to all teams&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 5 — Advanced (Post Week 12)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled reports (daily P&amp;amp;L email, weekly risk summary)&lt;/li&gt;
&lt;li&gt;Alert-driven queries (agent triggered by anomaly detection)&lt;/li&gt;
&lt;li&gt;Multi-step analysis (agent chains multiple queries)&lt;/li&gt;
&lt;li&gt;Fine-tune Qwen on the exchange's own query patterns for higher accuracy&lt;/li&gt;
&lt;li&gt;Add more source databases as needed&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  18. Cost Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Monthly Infrastructure Costs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Est. Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Cloud&lt;/td&gt;
&lt;td&gt;$3,000-5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka/CDC (Amazon MSK)&lt;/td&gt;
&lt;td&gt;$500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU for Qwen 2.5 (g5.12xlarge)&lt;/td&gt;
&lt;td&gt;$7,000 (1-yr reserved) / $11,700 (on-demand)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LibreChat (t3.medium)&lt;/td&gt;
&lt;td&gt;$100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;Included in ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~$11,000-14,000&lt;/strong&gt; (reserved) / &lt;strong&gt;~$15,000-19,000&lt;/strong&gt; (on-demand)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
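&lt;p&gt;As a cross-check on the totals, the line items can be summed directly. The sketch below uses only the figures from the table. The sums come to roughly $10,600-12,600 (reserved) and $15,300-17,300 (on-demand), so the rounded totals in the table sit slightly above the line items at the top of the range, presumably leaving headroom for items not listed.&lt;/p&gt;

```python
# Line items from the table, as (low, high) monthly USD estimates.
SHARED = {
    "ClickHouse Cloud": (3000, 5000),
    "Kafka/CDC (Amazon MSK)": (500, 500),
    "LibreChat (t3.medium)": (100, 100),
}
# GPU cost for Qwen 2.5 depends on the pricing model.
GPU = {"reserved": 7000, "on_demand": 11700}


def monthly_total(pricing):
    """Return the (low, high) monthly total for the given GPU pricing model."""
    low = sum(lo for lo, _ in SHARED.values()) + GPU[pricing]
    high = sum(hi for _, hi in SHARED.values()) + GPU[pricing]
    return low, high
```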

&lt;h3&gt;
  
  
  Cost Optimization Path
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted ClickHouse instead of Cloud&lt;/td&gt;
&lt;td&gt;-$1,500-3,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5 32B on smaller GPU&lt;/td&gt;
&lt;td&gt;-$2,000/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reserved instances (1-year)&lt;/td&gt;
&lt;td&gt;-30% on GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spot instances for dev/test&lt;/td&gt;
&lt;td&gt;-60%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ROI Justification
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time saved (ad-hoc queries)&lt;/td&gt;
&lt;td&gt;20-40 hours/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Faster incident response&lt;/td&gt;
&lt;td&gt;50% reduction in MTTR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production DB load reduction&lt;/td&gt;
&lt;td&gt;Fewer analytical queries on live trading DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-service analytics&lt;/td&gt;
&lt;td&gt;Teams don't need SQL knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-database insights&lt;/td&gt;
&lt;td&gt;Previously impossible queries now instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance efficiency&lt;/td&gt;
&lt;td&gt;Instant audit queries vs manual extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  19. Risk Assessment
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Likelihood&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM generates incorrect SQL&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Business glossary, Langfuse monitoring, human review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDC lag (stale data)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Monitor Kafka consumer lag, alert if &amp;gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse cluster failure&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;2-replica HA, automated failover, daily backups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU instance unavailable&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Reserved capacity, fallback to smaller model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low user adoption&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Start with power users, iterate on feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema drift (PG changes not in CH)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Automated schema sync in CDC pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrong answers not caught&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Langfuse accuracy tracking, glossary reviews&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL replication slot bloat&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;If Debezium falls behind, PG WAL accumulates indefinitely and can crash production DB — monitor slot lag, set &lt;code&gt;max_slot_wal_keep_size&lt;/code&gt;, alert if lag &amp;gt; 10GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
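&lt;p&gt;The two lag thresholds in the table (CDC lag over 5 minutes, replication slot lag over 10 GB) lend themselves to a simple check. This is a sketch only: the query uses the standard PostgreSQL &lt;code&gt;pg_replication_slots&lt;/code&gt; view and &lt;code&gt;pg_wal_lsn_diff&lt;/code&gt; function, while the function name, the input shapes, and the use of binary gigabytes for the 10 GB threshold are assumptions.&lt;/p&gt;

```python
# Thresholds from the risk table.
KAFKA_LAG_LIMIT_SECONDS = 5 * 60
SLOT_LAG_LIMIT_BYTES = 10 * 1024**3  # 10 GB, interpreted here as GiB

# PostgreSQL query for the WAL retained by each replication slot.
SLOT_LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots
"""


def lag_alerts(kafka_lag_seconds, slot_lag_bytes):
    """Return alert messages for any lag reading above its threshold."""
    alerts = []
    if kafka_lag_seconds > KAFKA_LAG_LIMIT_SECONDS:
        alerts.append(f"CDC lag {kafka_lag_seconds}s exceeds 5 min")
    for slot, lag in slot_lag_bytes.items():
        if lag > SLOT_LAG_LIMIT_BYTES:
            alerts.append(f"slot {slot} retains {lag / 1024**3:.1f} GiB of WAL")
    return alerts
```

&lt;p&gt;In practice the readings would come from the Kafka consumer-group metrics and from running &lt;code&gt;SLOT_LAG_SQL&lt;/code&gt; against each source database on a schedule.&lt;/p&gt;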




&lt;h2&gt;
  
  
  20. Success Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  90-Day Targets
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Measurement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query accuracy&lt;/td&gt;
&lt;td&gt;&amp;gt;70% correct SQL&lt;/td&gt;
&lt;td&gt;Langfuse evaluation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active users&lt;/td&gt;
&lt;td&gt;20+ weekly&lt;/td&gt;
&lt;td&gt;LibreChat usage logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query volume&lt;/td&gt;
&lt;td&gt;500+ queries/week&lt;/td&gt;
&lt;td&gt;Langfuse trace count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response time&lt;/td&gt;
&lt;td&gt;&amp;lt;10 seconds&lt;/td&gt;
&lt;td&gt;Langfuse latency p50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source databases consolidated&lt;/td&gt;
&lt;td&gt;4+ databases in ClickHouse&lt;/td&gt;
&lt;td&gt;Table count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business glossary&lt;/td&gt;
&lt;td&gt;200+ terms&lt;/td&gt;
&lt;td&gt;Glossary config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production DB offload&lt;/td&gt;
&lt;td&gt;50%+ analytical queries moved&lt;/td&gt;
&lt;td&gt;PG query log&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
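&lt;p&gt;The accuracy and response-time targets can be evaluated from exported trace records. The sketch below assumes each record carries a boolean &lt;code&gt;correct&lt;/code&gt; flag and a &lt;code&gt;latency_s&lt;/code&gt; value; both field names are hypothetical and do not reflect Langfuse's schema.&lt;/p&gt;

```python
import statistics


def evaluate_traces(traces):
    """Compute accuracy and p50 latency and compare them to the 90-day targets."""
    accuracy = sum(1 for t in traces if t["correct"]) / len(traces)
    p50 = statistics.median(t["latency_s"] for t in traces)
    return {
        "accuracy": accuracy,
        "latency_p50_s": p50,
        "accuracy_ok": accuracy > 0.70,  # target: more than 70% correct SQL
        "latency_ok": 10 > p50,          # target: p50 under 10 seconds
    }
```

&lt;p&gt;Using the median keeps one slow outlier from failing the latency target, which matches the table's choice of p50 as the measurement.&lt;/p&gt;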

&lt;h3&gt;
  
  
  6-Month Targets
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query accuracy&lt;/td&gt;
&lt;td&gt;&amp;gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active users&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-service resolution&lt;/td&gt;
&lt;td&gt;70% of questions answered without engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated reports&lt;/td&gt;
&lt;td&gt;10+ scheduled reports&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  21. Reference Links
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ClickHouse Agentic Data Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agentic Data Stack GitHub repo&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/agentic-data-stack" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/agentic-data-stack&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Server GitHub repo&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/mcp-clickhouse" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/mcp-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentHouse live demo&lt;/td&gt;
&lt;td&gt;&lt;a href="https://llm.clickhouse.com" rel="noopener noreferrer"&gt;https://llm.clickhouse.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ClickHouse Blog Posts
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Post&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;The Agentic Data Stack&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/the-agentic-data-stack" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/the-agentic-data-stack&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How we made our data warehouse AI-first (DWAINE)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/ai-first-data-warehouse" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/ai-first-data-warehouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How to set up ClickHouse for agentic analytics&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/how-to-set-up-clickhouse-for-agentic-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/how-to-set-up-clickhouse-for-agentic-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building a data platform for agents&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/building-a-data-platform-for-agents" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/building-a-data-platform-for-agents&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrating with ClickHouse MCP (5 frameworks)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/integrating-clickhouse-mcp" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/integrating-clickhouse-mcp&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12 agent framework comparison&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/how-to-build-ai-agents-mcp-12-frameworks" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/how-to-build-ai-agents-mcp-12-frameworks&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Introducing AgentHouse&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/agenthouse-demo-clickhouse-llm-mcp" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/agenthouse-demo-clickhouse-llm-mcp&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ask AI agent &amp;amp; Remote MCP beta&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/agentic-analytics-ask-ai-agent-and-remote-mcp-server-beta-launch" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/agentic-analytics-ask-ai-agent-and-remote-mcp-server-beta-launch&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse acquires LibreChat&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-acquires-librechat" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/clickhouse-acquires-librechat&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LibreChat + Agentic Data Stack&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/librechat-open-source-agentic-data-stack" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/librechat-open-source-agentic-data-stack&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse acquires Langfuse&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-acquires-langfuse-open-source-llm-observability" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/clickhouse-acquires-langfuse-open-source-llm-observability&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse scaling with ClickHouse&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/langfuse-llm-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/langfuse-llm-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent-facing analytics&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/agent-facing-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/agent-facing-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM chat UIs that support MCP&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/llm-chat-mcp-support" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/llm-chat-mcp-support&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wix Wild Moose — AI incident response on ClickHouse&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/wix-wild-moose" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/wix-wild-moose&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building an agentic coding app (retail demo)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/agentic-coding-app" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/agentic-coding-app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic coding at scale — CTO perspective&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/agentic-coding" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/agentic-coding&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse AI Policy (public)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/blob/master/AI_POLICY.md" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/ClickHouse/blob/master/AI_POLICY.md&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ClickHouse Documentation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Doc&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP setup guide&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/AI/MCP" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/AI/MCP&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse integration&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/cloud/features/ai-ml/langfuse" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/cloud/features/ai-ml/langfuse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Appendix A — Technology Stack Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analytical DB&lt;/td&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;td&gt;Consolidates all exchange data; columnar storage, fast aggregations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDC&lt;/td&gt;
&lt;td&gt;Debezium + Kafka&lt;/td&gt;
&lt;td&gt;Real-time sync from all PostgreSQL databases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Qwen 2.5 72B (self-hosted)&lt;/td&gt;
&lt;td&gt;Natural language to SQL generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Server&lt;/td&gt;
&lt;td&gt;ClickHouse MCP&lt;/td&gt;
&lt;td&gt;Schema discovery + safe SQL execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat UI&lt;/td&gt;
&lt;td&gt;LibreChat (self-hosted)&lt;/td&gt;
&lt;td&gt;Web-based chat interface for all teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Langfuse&lt;/td&gt;
&lt;td&gt;LLM tracing, accuracy tracking, cost monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source of Truth&lt;/td&gt;
&lt;td&gt;PostgreSQL databases&lt;/td&gt;
&lt;td&gt;Production OLTP (unchanged)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Appendix B — DWAINE Reference (ClickHouse Internal Agent)
&lt;/h2&gt;

&lt;p&gt;ClickHouse built DWAINE (Data Warehouse AI Natural Expert) for their own internal use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Users&lt;/td&gt;
&lt;td&gt;250+ employees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily messages&lt;/td&gt;
&lt;td&gt;200+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily conversations&lt;/td&gt;
&lt;td&gt;50-70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal analytics use cases covered&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DWH team workload reduction&lt;/td&gt;
&lt;td&gt;50-70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack&lt;/td&gt;
&lt;td&gt;LibreChat + Claude (Bedrock) + ClickHouse MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key learning&lt;/td&gt;
&lt;td&gt;Business glossary is the #1 factor for accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/ai-first-data-warehouse" rel="noopener noreferrer"&gt;ClickHouse — How we made our data warehouse AI-first&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Our platform follows the same architecture, with Qwen replacing Claude for data privacy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thoughts — the parts that actually matter
&lt;/h2&gt;

&lt;p&gt;After shipping this platform, the things that determine success are not the parts you'd expect:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The business glossary is everything.&lt;/strong&gt; ClickHouse's internal post-mortem on DWAINE concluded the same: glossary quality is the #1 driver of query accuracy. You don't fine-tune the LLM — you write a YAML file that maps domain terms to SQL fragments, and the LLM uses it via in-context learning. Budget more time for the glossary than for the LLM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Marts layer is non-negotiable.&lt;/strong&gt; Exposing the LLM to staging or raw data is how you get hallucinated SQL. Marts are pre-joined, denormalized, semantically clean, and explicitly access-controlled. The LLM should only ever see Marts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JOIN avoidance dominates query performance.&lt;/strong&gt; The 3-strategy approach — pre-joined marts, dictionaries (&lt;code&gt;dictGet&lt;/code&gt;), UNION ALL — handles ~90% of cross-table questions without a single runtime JOIN. The other 10% gets ClickHouse's auto-algorithm fallback. This was the single biggest architectural decision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Langfuse observability is what makes the platform improvable over time.&lt;/strong&gt; Without per-query traces, accuracy scores, and cost-per-question, you have no idea where to invest. With it, you can see exactly which question patterns fail and feed that back into the glossary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The PostgreSQL replication slot bloat risk is the one most teams forget.&lt;/strong&gt; If Debezium falls behind, PostgreSQL accumulates WAL indefinitely and can crash production. Set &lt;code&gt;max_slot_wal_keep_size&lt;/code&gt; and alert on slot lag. Don't learn this the hard way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security guardrails matter more than people think.&lt;/strong&gt; A 30-second query timeout, 2GB memory cap, 100M-row scan limit, and SELECT-only access on the Marts schema together mean the LLM cannot accidentally take down the cluster. ClickHouse's official agentic-analytics setup guide prescribes these exact values for a reason.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
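
&lt;p&gt;As a rough illustration of point 6, the limits could be expressed as a ClickHouse settings profile attached to a read-only role. This is a minimal sketch, not the platform's actual configuration: the role name &lt;code&gt;llm_agent&lt;/code&gt;, the profile name &lt;code&gt;llm_agent_profile&lt;/code&gt;, and the database name &lt;code&gt;marts&lt;/code&gt; are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical role for the LLM tool layer (name is an assumption)
CREATE ROLE IF NOT EXISTS llm_agent;

-- Per-query limits matching the values quoted above
CREATE SETTINGS PROFILE IF NOT EXISTS llm_agent_profile SETTINGS
    max_execution_time = 30,        -- seconds
    max_memory_usage = 2000000000,  -- roughly 2 GB per query
    max_rows_to_read = 100000000,   -- 100M-row scan limit
    readonly = 1                    -- reject writes, DDL, and setting changes
TO llm_agent;

-- SELECT-only access, restricted to the Marts database (assumed name: marts)
GRANT SELECT ON marts.* TO llm_agent;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;readonly = 1&lt;/code&gt; the session cannot override these limits, so a generated query that exceeds one of them fails with an error instead of consuming cluster resources.&lt;/p&gt;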

&lt;p&gt;The full implementation — including the ClickHouse AI DBA MCP server (152 tools for cluster operations) referenced in this guide, knowledge base with 47 production-tested rules, and 16 recovery runbooks — lives in the &lt;a href="https://github.com/rakeshtherani/clickhouse-ai-dba" rel="noopener noreferrer"&gt;companion repo on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you've shipped something similar (or are building it now), I'd love to compare notes. Drop a comment below or reach out on LinkedIn.&lt;/p&gt;

&lt;p&gt;The companion piece — &lt;em&gt;Replacing Elasticsearch with ClickHouse for OTEL Logs, Traces &amp;amp; Metrics: A 90% Cost-Reduction Migration&lt;/em&gt; — covers the observability layer of this same architecture in more depth.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>database</category>
    </item>
    <item>
      <title>Replacing Elasticsearch with ClickHouse: A 90% Cost-Reduction Migration</title>
      <dc:creator>RAKESH THERANI</dc:creator>
      <pubDate>Wed, 06 May 2026 04:20:01 +0000</pubDate>
      <link>https://forem.com/rakeshtherani/replacing-elasticsearch-with-clickhouse-for-otel-logs-traces-metrics-a-90-cost-reduction-28c</link>
      <guid>https://forem.com/rakeshtherani/replacing-elasticsearch-with-clickhouse-for-otel-logs-traces-metrics-a-90-cost-reduction-28c</guid>
      <description>&lt;p&gt;&lt;em&gt;A practical guide based on shipping this for a crypto-derivatives platform — annual observability bill went from high six figures to ~$50K, with faster queries and AI-powered log search as a bonus.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I'm writing this
&lt;/h2&gt;

&lt;p&gt;If you're paying mid-six figures for Elasticsearch and ~90% of your queries are aggregations (error counts, latency percentiles, service health), you're paying full price for a feature you barely use — full-text search.&lt;/p&gt;

&lt;p&gt;This post walks through how a crypto exchange replaced Elasticsearch with ClickHouse for OpenTelemetry logs, traces, and metrics. Same OTEL instrumentation, just a different backend. Result: 5× smaller storage footprint, 2-6× faster queries on benchmarks, and natural-language log queries via an AI agent — at ~10% of the cost.&lt;/p&gt;

&lt;p&gt;If you have an existing Elasticsearch + Kibana observability stack and you've been wondering whether ClickHouse is a serious alternative, this is the deep-dive. Includes the schema, the migration plan, the OTEL Collector configuration, the cost numbers, and the gotchas.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on numbers&lt;/strong&gt;: The cost figures below ($400K Elasticsearch → ~$50K ClickHouse) are real annual numbers from this deployment, on log volumes typical of a high-traffic trading platform (low-tens of TB ingested per year, 90-day hot retention, multi-region). Your mileage will vary substantially with log volume, retention, and cluster size. The architectural patterns are universal.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Executive Summary&lt;/li&gt;
&lt;li&gt;The Problem — Elasticsearch at $400K/Year&lt;/li&gt;
&lt;li&gt;Why ClickHouse for Observability&lt;/li&gt;
&lt;li&gt;Platform Architecture&lt;/li&gt;
&lt;li&gt;
OpenTelemetry — The Standard We Keep
&lt;br&gt;5b. Collection Agent Options — OTEL vs Fluent Bit vs Vector
&lt;/li&gt;
&lt;li&gt;ClickHouse Schema — Logs, Traces, Metrics&lt;/li&gt;
&lt;li&gt;End-to-End Distributed Tracing — HTTP to ClickHouse&lt;/li&gt;
&lt;li&gt;Full-Text Log Search — Replacing Kibana Discover&lt;/li&gt;
&lt;li&gt;Visualization — Grafana Replaces Kibana&lt;/li&gt;
&lt;li&gt;Data Retention &amp;amp; Tiered Storage&lt;/li&gt;
&lt;li&gt;AI Layer — Natural Language Over Logs and Traces&lt;/li&gt;
&lt;li&gt;Migration Plan — Zero Downtime Cutover (Standard OTEL or BindPlane)&lt;/li&gt;
&lt;li&gt;Cost Analysis&lt;/li&gt;
&lt;li&gt;Risk Assessment&lt;/li&gt;
&lt;li&gt;Success Metrics&lt;/li&gt;
&lt;li&gt;Reference Links&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Executive Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Objective
&lt;/h3&gt;

&lt;p&gt;Replace the current Elasticsearch-based observability stack (application logs + OTEL traces + metrics) with ClickHouse — reducing annual infrastructure cost from $400K to ~$35-60K while gaining faster aggregations, better compression, unified storage with business data, and AI-powered log querying.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem Today
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Paying $400K/year for Elasticsearch managed service&lt;/li&gt;
&lt;li&gt;Elasticsearch is optimized for full-text search — most log queries are aggregations (error counts, latency percentiles, service health) where ClickHouse is &lt;strong&gt;2-6x faster on cold queries, 1.7-2.6x on hot queries&lt;/strong&gt; (ClickHouse/TextBench benchmark, OTEL logs at 1B–50B rows)&lt;/li&gt;
&lt;li&gt;Logs, traces, metrics, and business data live in separate systems — no cross-correlation&lt;/li&gt;
&lt;li&gt;Storage costs are high: the same OTEL dataset takes &lt;strong&gt;5x more space in Elasticsearch&lt;/strong&gt; (245 GB vs 49 GB in ClickHouse at 1B rows; 12 TB vs 2.4 TB at 50B rows)&lt;/li&gt;
&lt;li&gt;Kibana is the only query interface — no programmatic access, no AI layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keep OpenTelemetry (OTEL) as the instrumentation standard — zero application changes.
Change only the destination: swap Elasticsearch exporter → ClickHouse exporter in OTEL Collector.

All logs, traces, and metrics land in ClickHouse.
Grafana reads ClickHouse for dashboards and alerts.
AI agent queries logs/traces in plain English via the same LibreChat + MCP platform.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Expected Outcomes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Annual cost reduction&lt;/td&gt;
&lt;td&gt;$340-365K saved (~85-90% reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage reduction&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;5x smaller&lt;/strong&gt; total footprint (16x on column files) — real benchmark at 1B–50B OTEL rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query speed improvement&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2-6x faster&lt;/strong&gt; cold queries, &lt;strong&gt;1.7-2.6x&lt;/strong&gt; hot queries (ClickHouse/TextBench)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retention period&lt;/td&gt;
&lt;td&gt;Same or longer — at lower cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unification&lt;/td&gt;
&lt;td&gt;Logs + traces + metrics + business data in one DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI queries over logs&lt;/td&gt;
&lt;td&gt;Plain English → SQL → instant answer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
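
&lt;p&gt;The savings and storage rows follow directly from figures quoted elsewhere in this post. A quick sanity check, written as a ClickHouse query purely for illustration (the column aliases are made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT
    400 - 60         AS saved_low_k_usd,    -- $400K minus the upper ClickHouse estimate
    400 - 35         AS saved_high_k_usd,   -- $400K minus the lower ClickHouse estimate
    (400 - 60) / 400 AS reduction_low,      -- fraction saved at $60K
    (400 - 35) / 400 AS reduction_high,     -- fraction saved at $35K
    245 / 49         AS storage_ratio_1b,   -- GB, Elasticsearch / ClickHouse at 1B rows
    12.01 / 2.43     AS storage_ratio_50b;  -- TiB, Elasticsearch / ClickHouse at 50B rows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That works out to $340K to $365K saved (85% to about 91%), and storage ratios of 5.0 at 1B rows and roughly 4.9 at 50B rows.&lt;/p&gt;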




&lt;h2&gt;
  
  
  2. The Problem — Elasticsearch at $400K/Year
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where the Cost Comes From
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost Driver&lt;/th&gt;
&lt;th&gt;Elasticsearch Behaviour&lt;/th&gt;
&lt;th&gt;Annual Cost Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Document-oriented inverted index, 2-3x compression, needs SSD&lt;/td&gt;
&lt;td&gt;~$100K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;CPU-heavy indexing on every write, inverted index maintenance&lt;/td&gt;
&lt;td&gt;~$150K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Licensing&lt;/td&gt;
&lt;td&gt;Elastic managed service / Elastic Cloud premium&lt;/td&gt;
&lt;td&gt;~$100K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;Shard management, index lifecycle management (ILM), tuning&lt;/td&gt;
&lt;td&gt;~$50K (eng time)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Technical Mismatch
&lt;/h3&gt;

&lt;p&gt;Elasticsearch was built for &lt;strong&gt;full-text search&lt;/strong&gt; on documents (web pages, articles). Application logs and OTEL telemetry are &lt;strong&gt;structured time-series data&lt;/strong&gt; — they need aggregations, not document search.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What teams actually query&lt;/th&gt;
&lt;th&gt;Elasticsearch efficiency&lt;/th&gt;
&lt;th&gt;ClickHouse efficiency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Error count per service last hour"&lt;/td&gt;
&lt;td&gt;Slow (aggregation on inverted index)&lt;/td&gt;
&lt;td&gt;Fast (columnar scan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"P99 latency for /api/trade endpoint"&lt;/td&gt;
&lt;td&gt;Slow (percentile aggregation)&lt;/td&gt;
&lt;td&gt;Fast (built-in quantile functions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Show logs for TraceId = abc123"&lt;/td&gt;
&lt;td&gt;Fast (indexed term lookup)&lt;/td&gt;
&lt;td&gt;Fast (bloom filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Which services degraded after deploy at 14:00?"&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Free-text: find logs containing OutOfMemoryError"&lt;/td&gt;
&lt;td&gt;Fast (native)&lt;/td&gt;
&lt;td&gt;Good (&lt;code&gt;tokenbf_v1&lt;/code&gt; bloom filter index)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
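
&lt;p&gt;For concreteness, the first two rows of the table might look like this in ClickHouse SQL. This is a sketch that assumes the default table layout created by the OTEL Collector's ClickHouse exporter (&lt;code&gt;otel_logs&lt;/code&gt; and &lt;code&gt;otel_traces&lt;/code&gt; in a database named &lt;code&gt;otel&lt;/code&gt;, with &lt;code&gt;Duration&lt;/code&gt; in nanoseconds) and that the endpoint is recorded as the span name; adjust names to your schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Error count per service over the last hour
SELECT ServiceName, count() AS errors
FROM otel.otel_logs
WHERE SeverityText = 'ERROR'
  AND Timestamp &amp;gt;= now() - INTERVAL 1 HOUR
GROUP BY ServiceName
ORDER BY errors DESC;

-- P99 latency for one endpoint (Duration assumed to be in nanoseconds)
SELECT quantile(0.99)(Duration) / 1e6 AS p99_ms
FROM otel.otel_traces
WHERE SpanName = '/api/trade'
  AND Timestamp &amp;gt;= now() - INTERVAL 1 HOUR;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;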

&lt;p&gt;~90% of real observability queries are aggregations. With Elasticsearch, you pay full price for a capability (full-text search) that covers only ~10% of use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where ClickHouse wins most — benchmark by query type (cold, 50B rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Category&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;ClickHouse speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Log retrieval (text match + fetch rows)&lt;/td&gt;
&lt;td&gt;"Find logs containing OutOfMemoryError"&lt;/td&gt;
&lt;td&gt;Narrowest gap — ES competitive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error/match counts&lt;/td&gt;
&lt;td&gt;"Count 500 errors in last hour"&lt;/td&gt;
&lt;td&gt;Moderate advantage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service-level breakdowns&lt;/td&gt;
&lt;td&gt;"Group errors by service"&lt;/td&gt;
&lt;td&gt;Large advantage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-series trend analysis&lt;/td&gt;
&lt;td&gt;"Error rate per minute over last 24h"&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Widest gap — 6x+ faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The speedup grows with analytical complexity. Retrieval-only queries (find and show matching rows) are ES's home turf. The moment you add grouping, aggregation, or time-bucketing on top of a text match, which is the dominant pattern in observability, ClickHouse's vectorized engine pulls away decisively.&lt;/p&gt;
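
&lt;p&gt;A sketch of that dominant pattern, a text match combined with time-bucketing, again assuming the exporter's default &lt;code&gt;otel_logs&lt;/code&gt; table with a &lt;code&gt;Body&lt;/code&gt; column in a database named &lt;code&gt;otel&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Matching log lines per minute over the last 24 hours
SELECT toStartOfMinute(Timestamp) AS minute,
       count() AS matching_logs
FROM otel.otel_logs
WHERE hasToken(Body, 'OutOfMemoryError')
  AND Timestamp &amp;gt;= now() - INTERVAL 24 HOUR
GROUP BY minute
ORDER BY minute;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;hasToken&lt;/code&gt; matches whole tokens, so it can use a token bloom filter index on &lt;code&gt;Body&lt;/code&gt; if one is defined.&lt;/p&gt;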

&lt;h3&gt;
  
  
  Current Stack Pain Points
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    │
    ▼ OTEL SDK (instrumented)
OTEL Collector
    │
    ▼ Elasticsearch Exporter
Elasticsearch Cluster
    │
    ▼
Kibana                   ← only UI, no programmatic access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;No way to join log data with business data (e.g., "which users were affected by this error?")&lt;/li&gt;
&lt;li&gt;Kibana dashboards require manual setup — no AI layer&lt;/li&gt;
&lt;li&gt;Retention limited by cost — older logs are deleted or archived to cold storage with no query access&lt;/li&gt;
&lt;li&gt;Every new service that emits logs increases Elasticsearch cost linearly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Why ClickHouse for Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Companies Already Doing This
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare&lt;/strong&gt; — Replaced Elasticsearch with ClickHouse for HTTP request logs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;35-45M HTTP requests/sec across the edge; logging pipeline generates "TBs of data daily"&lt;/li&gt;
&lt;li&gt;Per-row size dropped from ~600 bytes (Elasticsearch) to ~60 bytes (ClickHouse) — &lt;strong&gt;~10× smaller on disk&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Insert-side CPU and memory &lt;strong&gt;8× lower&lt;/strong&gt; than the Elasticsearch pipeline&lt;/li&gt;
&lt;li&gt;Now stores &lt;strong&gt;100% of events&lt;/strong&gt; (vs heavy sampling on Elasticsearch, which was blowing the resource budget at "hundreds of TBs")&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://blog.cloudflare.com/log-analytics-using-clickhouse/" rel="noopener noreferrer"&gt;blog.cloudflare.com/log-analytics-using-clickhouse&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Uber&lt;/strong&gt; — Migrated logging from Elasticsearch to ClickHouse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Millions of logs per second from thousands of services across regions"; storage in the &lt;strong&gt;multi-petabyte&lt;/strong&gt; range&lt;/li&gt;
&lt;li&gt;A single ClickHouse node ingests &lt;strong&gt;300K logs/sec — ~10× a single Elasticsearch node&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;80%+ of queries are aggregations (terms / histogram / percentile) — exactly ClickHouse's strength&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://www.uber.com/blog/logging/" rel="noopener noreferrer"&gt;uber.com/blog/logging&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Contentsquare&lt;/strong&gt; — OTEL traces and logs on ClickHouse (replaced 14× 30-node Elasticsearch clusters):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;11× cheaper&lt;/strong&gt; infrastructure cost&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;10× p99 query speedup&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Retention extended from &lt;strong&gt;1 → 13 months&lt;/strong&gt; at lower spend&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://clickhouse.com/blog/contentsquare-migration-from-elasticsearch-to-clickhouse" rel="noopener noreferrer"&gt;clickhouse.com/blog/contentsquare-migration-from-elasticsearch-to-clickhouse&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse's own benchmark vs Elasticsearch (TextBench)&lt;/strong&gt; — reproducible, identical hardware (AWS m6i.8xlarge):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At &lt;strong&gt;50B rows&lt;/strong&gt;: Elasticsearch needed &lt;strong&gt;12.01 TiB&lt;/strong&gt;, ClickHouse needed &lt;strong&gt;2.43 TiB&lt;/strong&gt; — &lt;strong&gt;5× less storage&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Cold-cache queries &lt;strong&gt;4-6× faster&lt;/strong&gt;, hot-cache &lt;strong&gt;1.7-2.6× faster&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;ClickHouse ingests 50B rows in &lt;strong&gt;&amp;lt;4 hours&lt;/strong&gt;; Elasticsearch took &lt;strong&gt;~5 days&lt;/strong&gt; after pipeline tuning&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse" rel="noopener noreferrer"&gt;clickhouse.com/blog/elasticsearch-log-analytics-clickhouse&lt;/a&gt; + &lt;a href="https://github.com/ClickHouse/TextBench" rel="noopener noreferrer"&gt;github.com/ClickHouse/TextBench&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Langfuse&lt;/strong&gt; (now part of ClickHouse) — LLM observability platform that stores every LLM trace (prompts, responses, latency, token cost) in ClickHouse. Same Langfuse used in the Agentic AI Platform described in the companion piece.&lt;/p&gt;

&lt;h3&gt;
  
  
  ClickHouse vs Elasticsearch for Observability
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage model&lt;/td&gt;
&lt;td&gt;Inverted index (document-oriented)&lt;/td&gt;
&lt;td&gt;Columnar (OLAP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression ratio&lt;/td&gt;
&lt;td&gt;2-3x&lt;/td&gt;
&lt;td&gt;10-30x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregation speed&lt;/td&gt;
&lt;td&gt;Seconds (post-indexing)&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write throughput&lt;/td&gt;
&lt;td&gt;Medium (index maintenance overhead)&lt;/td&gt;
&lt;td&gt;Very high (append-only MergeTree)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full-text search&lt;/td&gt;
&lt;td&gt;Excellent (native inverted index)&lt;/td&gt;
&lt;td&gt;Good (bloom filter indexes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL native support&lt;/td&gt;
&lt;td&gt;OTEL Collector exporter, or via Logstash/Beats (extra hop)&lt;/td&gt;
&lt;td&gt;Native OTEL exporter (direct)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-data joins&lt;/td&gt;
&lt;td&gt;Not possible&lt;/td&gt;
&lt;td&gt;Join logs with business tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI/LLM integration&lt;/td&gt;
&lt;td&gt;Kibana AI (limited)&lt;/td&gt;
&lt;td&gt;Full MCP + LLM stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered storage&lt;/td&gt;
&lt;td&gt;ILM (complex config)&lt;/td&gt;
&lt;td&gt;TTL + S3 (2 lines of SQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;Elastic License (paid tiers)&lt;/td&gt;
&lt;td&gt;Apache 2.0 (fully open source)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational complexity&lt;/td&gt;
&lt;td&gt;High (shards, replicas, ILM)&lt;/td&gt;
&lt;td&gt;Low (MergeTree auto-manages)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
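
&lt;p&gt;To illustrate the tiered-storage row, a TTL rule of roughly this shape moves older parts to object storage and later deletes them. It is a sketch under assumptions: the table's storage policy must already define a volume backed by an S3 disk (called &lt;code&gt;cold&lt;/code&gt; here), and the 90-day and 365-day thresholds are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumes a storage policy with a volume named 'cold' (hypothetical) on an S3 disk
ALTER TABLE otel.otel_logs
    MODIFY TTL
        toDateTime(Timestamp) + INTERVAL 90 DAY TO VOLUME 'cold',
        toDateTime(Timestamp) + INTERVAL 365 DAY DELETE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;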

&lt;h3&gt;
  
  
  Key Architectural Advantage
&lt;/h3&gt;

&lt;p&gt;ClickHouse already stores business data (trades, wallets, users, risk) via the Agentic AI Platform. Adding observability data to the &lt;strong&gt;same cluster&lt;/strong&gt; means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This query is IMPOSSIBLE in Elasticsearch:&lt;/span&gt;
&lt;span class="c1"&gt;-- "Which users were affected by the 500 errors on trading-service between 14:00-14:30?"&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'trading-service'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-17 14:00:00'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-17 14:30:00'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In ClickHouse: one query, instant answer.&lt;br&gt;
In Elasticsearch + separate business DB: impossible without custom ETL.&lt;/p&gt;
&lt;h3&gt;
  
  
  Horizontal Scaling — Parallel Replicas
&lt;/h3&gt;

&lt;p&gt;For high-volume workloads, ClickHouse scales a single query across multiple nodes using &lt;strong&gt;parallel replicas&lt;/strong&gt; — the query is split across the replica fleet and results are merged. At 50B rows (same TextBench dataset):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Nodes&lt;/th&gt;
&lt;th&gt;Total query runtime&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 node&lt;/td&gt;
&lt;td&gt;19.1s&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 nodes&lt;/td&gt;
&lt;td&gt;12.5s&lt;/td&gt;
&lt;td&gt;1.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9 nodes&lt;/td&gt;
&lt;td&gt;6.45s&lt;/td&gt;
&lt;td&gt;3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20 nodes&lt;/td&gt;
&lt;td&gt;3.27s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.8x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For text-search queries specifically, &lt;strong&gt;index sharding&lt;/strong&gt; distributes the inverted index analysis across the replica fleet, delivering a 5.8x speedup on full-text queries at scale. This matters for any high-volume environment as log volume grows: add nodes and queries get faster with no schema or application changes, although the scaling is sub-linear (20 nodes give 5.8x in the table above, not 20x).&lt;/p&gt;
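
&lt;p&gt;Parallel replicas are enabled per query or per profile through settings. A hedged sketch follows: setting names have changed across ClickHouse releases (older versions used &lt;code&gt;allow_experimental_parallel_reading_from_replicas&lt;/code&gt;), and the cluster name &lt;code&gt;default&lt;/code&gt; and the replica count are assumptions, so check the documentation for your version.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Spread one aggregation across up to 3 replicas (values are illustrative)
SELECT ServiceName, count() AS errors
FROM otel.otel_logs
WHERE SeverityText = 'ERROR'
GROUP BY ServiceName
SETTINGS enable_parallel_replicas = 1,
         max_parallel_replicas = 3,
         cluster_for_parallel_replicas = 'default';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;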


&lt;h2&gt;
  
  
  4. Platform Architecture
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Target Architecture
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    +------------------------------------------+
                    |         CRYPTO EXCHANGE APPS             |
                    |                                          |
                    | Trading Service  Wallet Service          |
                    | Risk Service     User Service            |
                    | API Gateway      Background Workers      |
                    +-------------------+----------------------+
                                        |
                           OTEL SDK (zero app code change)
                           Auto-instruments: HTTP, DB, Kafka, Redis
                                        |
                    +-------------------v----------------------+
                    |           OTEL COLLECTOR                 |
                    |                                          |
                    |  Receivers: OTLP gRPC/HTTP               |
                    |  Processors: batch, resource, filter     |
                    |  Exporters: clickhouseexporter           |
                    +-------------------+----------------------+
                                        |
                    +-------------------v----------------------+
                    |           CLICKHOUSE CLUSTER             |
                    |                                          |
                    |  database: otel                          |
                    | ┌───────────┬────────────┬─────────────┐ |
                    | │ otel_logs │otel_traces │otel_metrics │ |
                    | │ (Logs)    │(Traces)    │(Metrics)    │ |
                    | └───────────┴────────────┴─────────────┘ |
                    |                                          |
                    |  database: marts (business data)         |
                    |  ┌────────────────────────────────────┐  |
                    |  │ mart_trades mart_users mart_wallets│  |
                    |  └────────────────────────────────────┘  |
                    +---+------------------+-------------------+
                        |                  |
           +------------v---+     +--------v-----------+
           |    GRAFANA     |     |   LIBRECHAT + LLM  |
           |                |     |   (AI queries over |
           | Dashboards     |     |   logs in plain    |
           | Trace Viewer   |     |   English)         |
           | Log Explorer   |     |                    |
           | Alerts         |     |                    |
           +----------------+     +--------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  What Changes vs Current Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OTEL SDK (app)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;td&gt;Same — zero changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;td&gt;Same — only exporter config changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log/Trace UI&lt;/td&gt;
&lt;td&gt;Kibana&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerting&lt;/td&gt;
&lt;td&gt;Kibana Alerts&lt;/td&gt;
&lt;td&gt;Grafana Alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI queries&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;LibreChat + Qwen + MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-data correlation&lt;/td&gt;
&lt;td&gt;Not possible&lt;/td&gt;
&lt;td&gt;Native SQL joins&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  OTEL Collector Deployment — Two Architectures
&lt;/h3&gt;

&lt;p&gt;There are two ways to deploy the OTEL Collector. Choosing the wrong one causes data loss during ClickHouse restarts or network blips.&lt;/p&gt;
&lt;h4&gt;
  
  
  Agent-Only (not recommended for production)
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → OTEL Collector (on same host) → ClickHouse directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each service runs its own collector. Simple to set up but fragile — if ClickHouse is briefly unreachable, the agent has no buffer and &lt;strong&gt;drops data&lt;/strong&gt;. Works for development or low-stakes services.&lt;/p&gt;
&lt;h4&gt;
  
  
  Aggregator (recommended for production)
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → OTEL Agent (lightweight, on each host)
          │
          ▼
    OTEL Aggregator (central, 1–2 instances)
    - batches writes
    - retries on ClickHouse failure
    - filters/transforms before storage
          │
          ▼
    ClickHouse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


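&lt;p&gt;The agent on each host is lightweight and just forwards to the aggregator. The aggregator does the heavy work: batching, retry on failure, PII filtering, routing. If ClickHouse is unreachable for a few minutes, the aggregator keeps retrying queued batches and flushes them when it comes back, with &lt;strong&gt;no data loss&lt;/strong&gt; as long as the outage stays within the retry window (&lt;code&gt;max_elapsed_time&lt;/code&gt;, 300s in the config below) and the queue does not overflow.&lt;br&gt;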
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# OTEL Agent config (on each service host — minimal, just forwards)&lt;/span&gt;
&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-aggregator.internal:4317&lt;/span&gt;   &lt;span class="c1"&gt;# sends to aggregator, not ClickHouse&lt;/span&gt;

&lt;span class="c1"&gt;# OTEL Aggregator config (central — does batching, retry, filtering)&lt;/span&gt;
&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;send_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50000&lt;/span&gt;
  &lt;span class="c1"&gt;# Retries are an exporter setting, not a processor; see retry_on_failure under the clickhouse exporter below&lt;/span&gt;

&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://clickhouse.internal:9000&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
    &lt;span class="na"&gt;compress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lz4&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;retry_on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;initial_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;max_elapsed_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
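
&lt;p&gt;To get a feel for how much outage the retry settings above can absorb, here is a minimal sketch that lays out the backoff schedule. It assumes a backoff multiplier of 1.5 and a 30-second cap on the interval, and it ignores jitter, so treat the numbers as an approximation rather than the collector's exact behavior. The function name &lt;code&gt;retry_waits&lt;/code&gt; is ours, not part of any OTEL API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_right
from itertools import accumulate


def retry_waits(initial_s=5.0, max_interval_s=30.0, max_elapsed_s=300.0,
                multiplier=1.5, max_attempts=1000):
    """Waits (seconds) before each retry that still fits in the retry budget.

    Assumptions: multiplier=1.5 and no jitter. Both are illustrative.
    """
    # Exponential backoff, capped at max_interval_s.
    waits = [min(initial_s * multiplier ** n, max_interval_s)
             for n in range(max_attempts)]
    # Running total of time spent waiting after each retry.
    elapsed = list(accumulate(waits))
    # Keep only the retries whose cumulative wait stays within the budget.
    return waits[:bisect_right(elapsed, max_elapsed_s)]


if __name__ == "__main__":
    waits = retry_waits()
    print(f"{len(waits)} retries over {sum(waits):.1f}s before the batch is dropped")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions the budget is spent after 12 retries, a little under five minutes of waiting. That is why an outage close to or beyond &lt;code&gt;max_elapsed_time&lt;/code&gt; needs a larger retry window or a persistent queue.&lt;/p&gt;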



&lt;p&gt;&lt;strong&gt;For a crypto exchange:&lt;/strong&gt; Use the Aggregator pattern. Trading and wallet services cannot drop observability data — if an incident occurs during a ClickHouse maintenance window, you need the logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. OpenTelemetry — The Standard We Keep
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why OTEL is the Right Foundation
&lt;/h3&gt;

&lt;p&gt;OpenTelemetry is the CNCF standard for instrumentation: vendor-neutral and supported by the major cloud providers and most observability tools. We keep OTEL as our instrumentation layer. Only the &lt;strong&gt;backend destination&lt;/strong&gt; changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application Code → OTEL SDK → OTEL Collector → [ANY BACKEND]
                                                   ↑
                              Swap this: Elasticsearch → ClickHouse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OTEL Data Types We Capture
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured log events with severity, body, attributes&lt;/td&gt;
&lt;td&gt;"ERROR: failed to execute trade, user=john_doe, error=InsufficientBalance"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed request flow across services with timing&lt;/td&gt;
&lt;td&gt;Full journey of a trade order: API → trading-service → PostgreSQL → Kafka&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Numeric measurements over time&lt;/td&gt;
&lt;td&gt;HTTP request rate, error rate, latency histogram, DB connection pool size&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
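
&lt;p&gt;As a rough illustration of what "structured" means for the Logs row, the sketch below splits the example line into severity, body, and attributes. The output keys borrow the column names of the &lt;code&gt;otel_logs&lt;/code&gt; table defined later in this article; the helper &lt;code&gt;to_log_record&lt;/code&gt; and the simple &lt;code&gt;key=value&lt;/code&gt; parsing rule are assumptions for illustration, not how the OTEL SDK builds records.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def to_log_record(line, service_name):
    """Split 'SEVERITY: body, k=v, k=v' into severity, body, and attributes."""
    severity, _, rest = line.partition(": ")
    parts = rest.split(", ")
    # Free text becomes the body; key=value pairs become attributes.
    body = ", ".join(p for p in parts if "=" not in p)
    attributes = dict(p.split("=", 1) for p in parts if "=" in p)
    return {
        "SeverityText": severity,
        "ServiceName": service_name,
        "Body": body,
        "LogAttributes": attributes,
    }


if __name__ == "__main__":
    line = "ERROR: failed to execute trade, user=john_doe, error=InsufficientBalance"
    print(to_log_record(line, "trading-service"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping &lt;code&gt;user&lt;/code&gt; and &lt;code&gt;error&lt;/code&gt; as attributes rather than free text is what lets later queries filter on them directly.&lt;/p&gt;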

&lt;h3&gt;
  
  
  OTEL Collector — Only the Exporter Changes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# otel-collector-config.yaml&lt;/span&gt;

&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;protocols&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;grpc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4317&lt;/span&gt;
      &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4318&lt;/span&gt;

&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
    &lt;span class="na"&gt;send_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
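  &lt;span class="na"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;check_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;   &lt;span class="c1"&gt;# required: the memory_limiter rejects a config without it&lt;/span&gt;
    &lt;span class="na"&gt;limit_mib&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;512&lt;/span&gt;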
  &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;attributes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;environment&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;upsert&lt;/span&gt;

&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ─── BEFORE (Elasticsearch) ───────────────────────────────&lt;/span&gt;
  &lt;span class="c1"&gt;# elasticsearch:&lt;/span&gt;
  &lt;span class="c1"&gt;#   endpoints: [https://elastic-cluster:9200]&lt;/span&gt;
  &lt;span class="c1"&gt;#   logs_index: logs-%{+yyyy.MM.dd}&lt;/span&gt;
  &lt;span class="c1"&gt;#   traces_index: traces-%{+yyyy.MM.dd}&lt;/span&gt;

  &lt;span class="c1"&gt;# ─── AFTER (ClickHouse) ────────────────────────────────────&lt;/span&gt;
  &lt;span class="na"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://clickhouse:9000&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
    &lt;span class="na"&gt;logs_table_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;otel_logs&lt;/span&gt;
    &lt;span class="na"&gt;traces_table_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;otel_traces&lt;/span&gt;
    &lt;span class="na"&gt;metrics_table_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel_metrics&lt;/span&gt;
    &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;2160h&lt;/span&gt;            &lt;span class="c1"&gt;# 90 days, given as a duration; auto-deletes old data&lt;/span&gt;
    &lt;span class="na"&gt;compress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lz4&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;5s&lt;/span&gt;
    &lt;span class="na"&gt;retry_on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;           &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;initial_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;max_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;max_elapsed_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;300s&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;logs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;traces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;clickhouse&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That is the only config change needed.&lt;/strong&gt; Applications keep emitting OTEL. The collector keeps receiving. Only the destination changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Instrumentation — Zero Code Changes
&lt;/h3&gt;

&lt;p&gt;For JVM services, attach the OTEL agent at startup. Python and Node.js need only a few bootstrap lines, with no changes to business logic:&lt;/p&gt;

&lt;h4&gt;
  
  
  Java (Spring Boot / any JVM)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to JVM startup — auto-instruments HTTP, JDBC, Kafka, Redis&lt;/span&gt;
java &lt;span class="nt"&gt;-javaagent&lt;/span&gt;:opentelemetry-javaagent.jar &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.service.name&lt;span class="o"&gt;=&lt;/span&gt;trading-service &lt;span class="se"&gt;\&lt;/span&gt;
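     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.exporter.otlp.endpoint&lt;span class="o"&gt;=&lt;/span&gt;http://otel-collector:4317 &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.exporter.otlp.protocol&lt;span class="o"&gt;=&lt;/span&gt;grpc &lt;span class="se"&gt;\&lt;/span&gt;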
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.traces.exporter&lt;span class="o"&gt;=&lt;/span&gt;otlp &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.metrics.exporter&lt;span class="o"&gt;=&lt;/span&gt;otlp &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-Dotel&lt;/span&gt;.logs.exporter&lt;span class="o"&gt;=&lt;/span&gt;otlp &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-jar&lt;/span&gt; trading-service.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Python (FastAPI / Django)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add at startup: instruments FastAPI, psycopg2 (PostgreSQL), and outbound `requests` calls.
# `app` is your existing FastAPI() instance.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPIInstrumentor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.psycopg2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Psycopg2Instrumentor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.requests&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestsInstrumentor&lt;/span&gt;

&lt;span class="n"&gt;FastAPIInstrumentor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;instrument_app&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;Psycopg2Instrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nc"&gt;RequestsInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Node.js
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tracing.js — load before anything else via --require&lt;/span&gt;
&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/auto-instrumentations-node/register&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Auto-instruments: express, pg, http, redis, mongoose, kafka&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5b. Collection Agent Options — OTEL vs Fluent Bit vs Vector
&lt;/h2&gt;

&lt;p&gt;OTEL Collector is not the only way to ship data to ClickHouse. Three agents are production-proven. The compression difference between them is significant enough to affect your storage cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression Comparison (real benchmarks — identical log dataset)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Compression Ratio&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fluent Bit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;33×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mature&lt;/td&gt;
&lt;td&gt;HTTP JSONEachRow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Beta / widely used&lt;/td&gt;
&lt;td&gt;HTTP JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OTEL Collector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Alpha for ClickHouse exporter&lt;/td&gt;
&lt;td&gt;Native ClickHouse TCP&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fluent Bit achieves 33× compression because it gives you full control over the schema — you define exactly which fields land in which typed columns. OTEL Collector uses a fixed schema (Map types for attributes) which is more flexible but less compressible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a crypto exchange:&lt;/strong&gt; OTEL Collector is the right default because we already use OTEL instrumentation end-to-end and the fixed schema covers all signals. If storage cost becomes a concern at high volume, Fluent Bit is the migration path — it requires a custom schema but delivers the best compression.&lt;/p&gt;
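
&lt;p&gt;To make the storage impact concrete, the sketch below turns the ratios from the table into on-disk size for a 90-day retention window. The 1 TB/day raw log volume is a made-up figure for illustration, not a measurement from our platform; plug in your own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Compression ratios copied from the comparison table above.
RATIOS = {"Fluent Bit": 33, "Vector": 21, "OTEL Collector": 14}


def stored_tb(raw_tb_per_day, retention_days, ratio):
    """On-disk size in TB once a full retention window has accumulated."""
    return raw_tb_per_day * retention_days / ratio


def storage_table(raw_tb_per_day, retention_days=90):
    """Map each agent to its estimated on-disk size, rounded to 2 decimals."""
    return {
        agent: round(stored_tb(raw_tb_per_day, retention_days, ratio), 2)
        for agent, ratio in RATIOS.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 1 TB of raw logs per day.
    for agent, tb in storage_table(1.0).items():
        print(f"{agent}: {tb} TB on disk")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At that assumed volume the gap between the OTEL Collector schema and a custom Fluent Bit schema is roughly 6.4 TB versus 2.7 TB, which is the trade-off to weigh against the cost of maintaining a custom schema.&lt;/p&gt;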




&lt;h3&gt;
  
  
  When to Use Each
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Use When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OTEL Collector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You already emit OTEL signals (our case). Single config change, zero app changes. Traces + logs + metrics in one pipeline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fluent Bit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K8s environment, log-heavy workload, storage cost is a priority. Best compression. Mature and battle-tested. Does not handle traces.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You want a fully custom schema with maximum flexibility. Good middle ground — better compression than OTEL, handles more data types than Fluent Bit.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Fluent Bit → ClickHouse Config (for reference)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# fluent-bit.conf (classic format): ships K8s pod logs to ClickHouse over HTTP&lt;/span&gt;
[SERVICE]
    # Flush every 10 seconds so each insert carries a reasonably large batch
    Flush         10

[OUTPUT]
    # Fluent Bit writes to ClickHouse through its generic http output
    Name          http
    Match         *
    Host          clickhouse.internal
    Port          8123
    # Critical: async inserts prevent too-small-batch errors
    URI           /?query=INSERT+INTO+otel.fluent_logs+FORMAT+JSONEachRow&amp;amp;async_insert=1
    Format        json_lines
    # Map the record time to the timestamp column; pick a Json_date_format your column can parse
    Json_date_key timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Custom schema for Fluent Bit (33x compression)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fluent_logs&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;timestamp&lt;/span&gt;               &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;pod_name&lt;/span&gt;                &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;               &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;container_name&lt;/span&gt;          &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;severity&lt;/span&gt;                &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;                 &lt;span class="n"&gt;String&lt;/span&gt;                    &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;kubernetes_labels&lt;/span&gt;       &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pod_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Use &lt;code&gt;async_insert=1&lt;/code&gt; and a flush interval of at least 10 seconds with Fluent Bit. Otherwise Fluent Bit sends many small HTTP inserts, each of which creates a new part in ClickHouse, and insert and merge performance degrades significantly.&lt;/p&gt;
&lt;/blockquote&gt;
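
&lt;p&gt;For debugging, it can help to reproduce by hand what a JSONEachRow insert over HTTP looks like. The sketch below builds the request URL and body using only the standard library. The helper names and the sample record are hypothetical; the host, port, and table come from the config and schema above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from urllib.parse import urlencode


def insert_url(host, port, table, async_insert=True):
    """Build the ClickHouse HTTP endpoint for a JSONEachRow insert."""
    params = {"query": f"INSERT INTO {table} FORMAT JSONEachRow"}
    if async_insert:
        # Let ClickHouse buffer small inserts server-side.
        params["async_insert"] = 1
    return f"http://{host}:{port}/?{urlencode(params)}"


def json_each_row(records):
    """Serialize records as newline-delimited JSON, one object per row."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical record shaped like the otel.fluent_logs columns.
    record = {
        "timestamp": "2026-05-14 06:19:43.000000000",
        "pod_name": "trading-service-7d9f",
        "namespace": "trading",
        "container_name": "app",
        "severity": "ERROR",
        "message": "failed to execute trade",
        "kubernetes_labels": {"app": "trading-service"},
    }
    print(insert_url("clickhouse.internal", 8123, "otel.fluent_logs"))
    print(json_each_row([record]), end="")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;POSTing that body to that URL (for example with &lt;code&gt;curl --data-binary&lt;/code&gt;) is a quick way to check the schema before pointing Fluent Bit at the table.&lt;/p&gt;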




&lt;h2&gt;
  
  
  6. ClickHouse Schema — Logs, Traces, Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Logs Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;           &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;             &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanId&lt;/span&gt;              &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceFlags&lt;/span&gt;          &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SeverityText&lt;/span&gt;        &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;-- ERROR, WARN, INFO, DEBUG&lt;/span&gt;
    &lt;span class="n"&gt;SeverityNumber&lt;/span&gt;      &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;         &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;                &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceSchemaUrl&lt;/span&gt;   &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;  &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeSchemaUrl&lt;/span&gt;      &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeName&lt;/span&gt;           &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeVersion&lt;/span&gt;        &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeAttributes&lt;/span&gt;     &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;       &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Skip indexes for fast filtering (replaces Elasticsearch inverted index for common patterns).&lt;/span&gt;
    &lt;span class="c1"&gt;-- Body uses tokenbf_v1 here as the safe default — it's stable, no flags, works on every CH 23+.&lt;/span&gt;
    &lt;span class="c1"&gt;-- See §8 for the newer `text` index, which is the recommended upgrade once you're on a recent&lt;/span&gt;
    &lt;span class="c1"&gt;-- ClickHouse build (better selectivity, S3-friendly). You'd swap idx_body for the text index there.&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_trace_id&lt;/span&gt;   &lt;span class="n"&gt;TraceId&lt;/span&gt;     &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;bloom_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_body&lt;/span&gt;       &lt;span class="n"&gt;Body&lt;/span&gt;        &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;tokenbf_v1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_service&lt;/span&gt;    &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_severity&lt;/span&gt;   &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toUnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Traces Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;           &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;             &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanId&lt;/span&gt;              &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ParentSpanId&lt;/span&gt;        &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TraceState&lt;/span&gt;          &lt;span class="n"&gt;String&lt;/span&gt;                     &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanName&lt;/span&gt;            &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;SpanKind&lt;/span&gt;            &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;-- SERVER, CLIENT, PRODUCER, CONSUMER&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;         &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;  &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Duration&lt;/span&gt;            &lt;span class="n"&gt;Int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     &lt;span class="c1"&gt;-- nanoseconds&lt;/span&gt;
    &lt;span class="n"&gt;StatusCode&lt;/span&gt;          &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;-- STATUS_CODE_OK, STATUS_CODE_ERROR&lt;/span&gt;
    &lt;span class="n"&gt;StatusMessage&lt;/span&gt;       &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Events&lt;/span&gt;              &lt;span class="n"&gt;Nested&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;Timestamp&lt;/span&gt;       &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;Name&lt;/span&gt;            &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;Attributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;Links&lt;/span&gt;               &lt;span class="n"&gt;Nested&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TraceId&lt;/span&gt;         &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;SpanId&lt;/span&gt;          &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TraceState&lt;/span&gt;      &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Attributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_trace_id&lt;/span&gt;   &lt;span class="n"&gt;TraceId&lt;/span&gt;     &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;bloom_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_span_name&lt;/span&gt;  &lt;span class="n"&gt;SpanName&lt;/span&gt;    &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_service&lt;/span&gt;    &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SpanName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toUnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
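
&lt;p&gt;To illustrate how the &lt;code&gt;idx_trace_id&lt;/code&gt; bloom filter and the daily partitions would typically be used together, here is a minimal sketch of a trace lookup helper. It assumes the &lt;code&gt;otel.otel_traces&lt;/code&gt; columns shown above and the server-side parameter binding syntax of &lt;code&gt;clickhouse_connect&lt;/code&gt;; the function name &lt;code&gt;build_trace_lookup&lt;/code&gt; and the 24-hour default lookback are hypothetical.&lt;/p&gt;

```python
from typing import Any

# Assumption: table and column names follow the otel.otel_traces DDL above.
TRACE_QUERY = """
SELECT SpanId, ParentSpanId, ServiceName, SpanName,
       Duration / 1e6 AS duration_ms, StatusCode
FROM otel.otel_traces
WHERE TraceId = {trace_id:String}
  AND Timestamp >= now() - toIntervalHour({lookback_hours:UInt32})
ORDER BY Timestamp
"""

MAX_LOOKBACK_HOURS = 24 * 90  # matches the 90-day TTL on the table


def build_trace_lookup(trace_id: str, lookback_hours: int = 24) -> tuple[str, dict[str, Any]]:
    """Return (sql, parameters) for fetching every span of one trace."""
    normalized = trace_id.strip().lower()
    # OTEL trace IDs are 16 bytes, rendered as 32 hex characters.
    if len(normalized) != 32 or any(c not in "0123456789abcdef" for c in normalized):
        raise ValueError(f"not a 32-character hex trace id: {trace_id!r}")
    if lookback_hours > MAX_LOOKBACK_HOURS or 1 > lookback_hours:
        raise ValueError("lookback_hours must be between 1 and 2160")
    # The time bound lets ClickHouse prune daily partitions; the TraceId
    # equality is what the bloom filter index can use to skip granules.
    return TRACE_QUERY, {"trace_id": normalized, "lookback_hours": lookback_hours}
```

&lt;p&gt;With a connected client, the result would be passed along as &lt;code&gt;client.query(sql, parameters=params)&lt;/code&gt;. Keeping the ID as a bound parameter rather than formatting it into the SQL string avoids injection when the trace ID comes from a user or an LLM tool call.&lt;/p&gt;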



&lt;h3&gt;
  
  
  Metrics Tables (5 separate tables — one per OTEL metric type)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Unlike logs and traces, the ClickHouse OTEL exporter does &lt;strong&gt;not&lt;/strong&gt; use a single &lt;code&gt;otel_metrics&lt;/code&gt; table. It creates &lt;strong&gt;five separate tables&lt;/strong&gt; — one per OpenTelemetry metric type — because each type has different fields (sums and gauges are scalar, histograms have buckets, summaries have quantiles, etc.). All five share the same Resource/Scope/Attributes prefix.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;OTEL Metric Type&lt;/th&gt;
&lt;th&gt;Typical Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;otel_metrics_sum&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sum (counter)&lt;/td&gt;
&lt;td&gt;request counts, bytes sent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;otel_metrics_gauge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gauge&lt;/td&gt;
&lt;td&gt;CPU %, queue depth, memory in use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;otel_metrics_histogram&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Histogram&lt;/td&gt;
&lt;td&gt;request latency buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;otel_metrics_exponential_histogram&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exponential histogram&lt;/td&gt;
&lt;td&gt;latency with a wide dynamic range (auto-scaled buckets)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;otel_metrics_summary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Summary (quantiles)&lt;/td&gt;
&lt;td&gt;client-precomputed p50/p95/p99&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Below is the &lt;code&gt;otel_metrics_sum&lt;/code&gt; schema (other tables share the same prefix and add type-specific fields):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_metrics_sum&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;      &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ResourceSchemaUrl&lt;/span&gt;       &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeName&lt;/span&gt;               &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeVersion&lt;/span&gt;            &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeAttributes&lt;/span&gt;         &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeDroppedAttrCount&lt;/span&gt;   &lt;span class="n"&gt;UInt32&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ScopeSchemaUrl&lt;/span&gt;          &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;             &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;   &lt;span class="c1"&gt;-- materialized from ResourceAttributes['service.name']&lt;/span&gt;
    &lt;span class="n"&gt;MetricName&lt;/span&gt;              &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;MetricDescription&lt;/span&gt;       &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;MetricUnit&lt;/span&gt;              &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Attributes&lt;/span&gt;              &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;StartTimeUnix&lt;/span&gt;           &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;TimeUnix&lt;/span&gt;                &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;                   &lt;span class="n"&gt;Float64&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Flags&lt;/span&gt;                   &lt;span class="n"&gt;UInt32&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;Exemplars&lt;/span&gt;               &lt;span class="n"&gt;Nested&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;FilteredAttributes&lt;/span&gt;  &lt;span class="k"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;TimeUnix&lt;/span&gt;            &lt;span class="n"&gt;DateTime64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;Value&lt;/span&gt;               &lt;span class="n"&gt;Float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;SpanId&lt;/span&gt;              &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TraceId&lt;/span&gt;             &lt;span class="n"&gt;String&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;AggregationTemporality&lt;/span&gt;  &lt;span class="n"&gt;Int32&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;    &lt;span class="c1"&gt;-- 1=Delta, 2=Cumulative&lt;/span&gt;
    &lt;span class="n"&gt;IsMonotonic&lt;/span&gt;             &lt;span class="nb"&gt;Bool&lt;/span&gt; &lt;span class="n"&gt;CODEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TimeUnix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetricName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Attributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toUnixTimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TimeUnix&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TimeUnix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="n"&gt;SETTINGS&lt;/span&gt; &lt;span class="n"&gt;index_granularity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl_only_drop_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
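
&lt;p&gt;Rows with &lt;code&gt;AggregationTemporality = 2&lt;/code&gt; hold running totals, so a dashboard usually needs the increase between consecutive samples rather than the raw &lt;code&gt;Value&lt;/code&gt;. The sketch below shows one common way to do that for a single monotonic series. The function name and the reset rule (a drop in value means the process restarted) are assumptions, not part of the exporter.&lt;/p&gt;

```python
from typing import Iterable


def cumulative_to_deltas(samples: Iterable[tuple[float, float]]) -> list[tuple[float, float]]:
    """Convert cumulative counter samples (time_s, value) into per-interval increases.

    A drop in value is treated as a counter reset: the new value itself is
    taken as the increase for that interval.
    """
    ordered = sorted(samples)  # sort by time, since rows may arrive out of order
    deltas = []
    for (_, prev_value), (ts, value) in zip(ordered, ordered[1:]):
        increase = value - prev_value if value >= prev_value else value
        deltas.append((ts, increase))
    return deltas


if __name__ == "__main__":
    # Hypothetical request counter sampled every 60 s, with a restart before t=120.
    for ts, inc in cumulative_to_deltas([(0.0, 10.0), (60.0, 25.0), (120.0, 5.0), (180.0, 9.0)]):
        print(ts, inc)
```

&lt;p&gt;Series must be separated first (same &lt;code&gt;ServiceName&lt;/code&gt;, &lt;code&gt;MetricName&lt;/code&gt; and &lt;code&gt;Attributes&lt;/code&gt;), which is the grouping the table's &lt;code&gt;ORDER BY&lt;/code&gt; already keeps adjacent on disk.&lt;/p&gt;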



&lt;p&gt;The histogram table adds &lt;code&gt;Count&lt;/code&gt;, &lt;code&gt;Sum&lt;/code&gt;, &lt;code&gt;BucketCounts Array(UInt64)&lt;/code&gt;, &lt;code&gt;ExplicitBounds Array(Float64)&lt;/code&gt;, &lt;code&gt;Min&lt;/code&gt;, &lt;code&gt;Max&lt;/code&gt;. The summary table adds &lt;code&gt;ValueAtQuantiles Nested(Quantile Float64, Value Float64)&lt;/code&gt;. See the &lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter" rel="noopener noreferrer"&gt;OTEL ClickHouse exporter source&lt;/a&gt; for the full DDL.&lt;/p&gt;
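
&lt;p&gt;As a rough illustration of how &lt;code&gt;BucketCounts&lt;/code&gt; and &lt;code&gt;ExplicitBounds&lt;/code&gt; fit together, the sketch below estimates a quantile from one histogram row by linear interpolation inside the matching bucket. It relies on the OTEL convention that there is one more bucket count than there are bounds. The function name, the fallback lower bound of 0, and returning the last bound for the overflow bucket when &lt;code&gt;Max&lt;/code&gt; is unknown are all assumptions made for this example.&lt;/p&gt;

```python
from typing import Optional, Sequence


def histogram_quantile(
    q: float,
    bucket_counts: Sequence[int],
    explicit_bounds: Sequence[float],
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> float:
    """Estimate the q-quantile (0..1) of an explicit-bucket histogram."""
    if q > 1.0 or 0.0 > q:
        raise ValueError("q must be between 0 and 1")
    if len(bucket_counts) != len(explicit_bounds) + 1:
        raise ValueError("expected len(bucket_counts) == len(explicit_bounds) + 1")
    total = sum(bucket_counts)
    if total == 0:
        raise ValueError("histogram is empty")

    target = q * total  # rank of the observation we are looking for
    seen = 0
    for i, count in enumerate(bucket_counts):
        if count == 0:
            continue
        if seen + count >= target:
            # Bucket i covers (bounds[i-1], bounds[i]]; the first and last are open-ended.
            if i > 0:
                lower = explicit_bounds[i - 1]
            else:
                lower = minimum if minimum is not None else 0.0
            if len(explicit_bounds) > i:
                upper = explicit_bounds[i]
            else:
                upper = maximum if maximum is not None else lower
            fraction = (target - seen) / count
            return lower + (upper - lower) * fraction
        seen += count
    raise AssertionError("unreachable: total is positive")
```

&lt;p&gt;For example, with bounds &lt;code&gt;[10, 50, 100]&lt;/code&gt; and counts &lt;code&gt;[5, 10, 4, 1]&lt;/code&gt;, the median falls halfway through the second bucket and the estimate is 30.0. The result is only as precise as the bucket layout, which is the main reason to prefer exponential histograms when latencies span several orders of magnitude.&lt;/p&gt;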

&lt;blockquote&gt;
&lt;p&gt;Note on &lt;code&gt;ServiceName&lt;/code&gt;: the exporter does &lt;strong&gt;not&lt;/strong&gt; materialize this column by default — it lives inside &lt;code&gt;ResourceAttributes['service.name']&lt;/code&gt;. To include it in &lt;code&gt;ORDER BY&lt;/code&gt; (as above), either let the exporter create the column with &lt;code&gt;service_name_attribute_key&lt;/code&gt;, or add a materialized column: &lt;code&gt;ServiceName LowCardinality(String) MATERIALIZED ResourceAttributes['service.name']&lt;/code&gt;. Without it, drop &lt;code&gt;ServiceName&lt;/code&gt; from &lt;code&gt;ORDER BY&lt;/code&gt; and filter via &lt;code&gt;ResourceAttributes['service.name'] = 'trading-service'&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;LowCardinality&lt;/code&gt; and &lt;code&gt;CODEC&lt;/code&gt; Matter
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LowCardinality(String)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dictionary-encodes repeated values (ServiceName, SeverityText)&lt;/td&gt;
&lt;td&gt;3-5x compression + faster filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CODEC(Delta, ZSTD(1))&lt;/code&gt; on Timestamp&lt;/td&gt;
&lt;td&gt;Delta encodes sequential timestamps, then ZSTD compresses&lt;/td&gt;
&lt;td&gt;5-10x compression on time columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CODEC(ZSTD(1))&lt;/code&gt; on String cols&lt;/td&gt;
&lt;td&gt;General purpose compression&lt;/td&gt;
&lt;td&gt;3-7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;bloom_filter&lt;/code&gt; index on TraceId&lt;/td&gt;
&lt;td&gt;Skips data blocks that can't contain the TraceId&lt;/td&gt;
&lt;td&gt;Point lookups by TraceId read only the granules that may match instead of scanning the table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;tokenbf_v1&lt;/code&gt; index on Body&lt;/td&gt;
&lt;td&gt;Token-based bloom filter for keyword search&lt;/td&gt;
&lt;td&gt;Skips irrelevant blocks without full scan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PARTITION BY toDate(Timestamp)&lt;/code&gt;&lt;/td&gt;
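&lt;td&gt;One partition per day, so expired days can be removed by TTL as whole parts&lt;/td&gt;
&lt;td&gt;Cheap TTL cleanup (whole parts are dropped when &lt;code&gt;ttl_only_drop_parts = 1&lt;/code&gt; is set)&lt;/td&gt;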
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp))&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Most queries filter by service + severity + time&lt;/td&gt;
&lt;td&gt;Queries read minimal data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
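
&lt;p&gt;The effect of the &lt;code&gt;Delta&lt;/code&gt; codec on timestamp columns can be reproduced outside ClickHouse. The sketch below delta-encodes a run of nanosecond timestamps and compares compressed sizes. It uses &lt;code&gt;zlib&lt;/code&gt; from the Python standard library as a stand-in for ZSTD, and evenly spaced synthetic timestamps, so the exact ratio will differ from what ClickHouse achieves on real data.&lt;/p&gt;

```python
import zlib


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Inverse of delta_encode: a running sum restores the original values."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def compressed_size(values: list[int]) -> int:
    """Pack values as little-endian 64-bit integers and compress (zlib stands in for ZSTD)."""
    raw = b"".join(v.to_bytes(8, "little", signed=True) for v in values)
    return len(zlib.compress(raw, 1))


if __name__ == "__main__":
    # Synthetic data: 10,000 log timestamps in nanoseconds, one per millisecond.
    start_ns = 1_778_739_583_000_000_000
    timestamps = [start_ns + i * 1_000_000 for i in range(10_000)]

    print("plain :", compressed_size(timestamps), "bytes")
    print("delta :", compressed_size(delta_encode(timestamps)), "bytes")
```

&lt;p&gt;Raw timestamps differ in their low bytes on every row, which leaves a general-purpose compressor little to match. After delta encoding the column becomes a run of small, repetitive numbers, which is why the codec order is &lt;code&gt;Delta&lt;/code&gt; first and &lt;code&gt;ZSTD&lt;/code&gt; second.&lt;/p&gt;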




&lt;h2&gt;
  
  
  7. End-to-End Distributed Tracing — HTTP to ClickHouse
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Full Trace Flow
&lt;/h3&gt;

&lt;p&gt;Every incoming HTTP request gets a &lt;code&gt;TraceId&lt;/code&gt;. That same ID propagates through every service and DB call. ClickHouse stores all spans. Grafana shows the waterfall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → [traceparent header injected]
              │
              ▼
[1] Nginx               → Span: http.server (method, url, status, duration)
              │           propagates traceparent to upstream
              ▼
[2] API Gateway         → Span: route handling
              │
              ▼
[3] Trading Service     → Span: business logic
              │
        ┌─────┴──────┐
        ▼             ▼
[4] PostgreSQL      [5] ClickHouse    → Span: db.query (SQL text, duration, rows)
(OTEL JDBC auto)    (native OTEL)
        │
        ▼
[6] Kafka Producer      → Span: messaging.publish (topic, partition)

All spans → OTEL Collector → ClickHouse otel_traces table
                                         │
                                         ▼
                                   Grafana Trace UI
                                   (full waterfall)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
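
&lt;p&gt;The waterfall view is essentially a tree rebuilt from the &lt;code&gt;SpanId&lt;/code&gt; and &lt;code&gt;ParentSpanId&lt;/code&gt; columns. The sketch below shows that reconstruction on rows already fetched for one trace. The &lt;code&gt;Span&lt;/code&gt; dataclass and &lt;code&gt;render_waterfall&lt;/code&gt; are hypothetical names for this example, and treating spans with a missing parent as extra roots is an assumption about how to handle dropped spans.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_span_id: str  # empty string for the root span
    service: str
    name: str
    start_ns: int
    duration_ns: int  # nanoseconds, as in the Duration column


def render_waterfall(spans: list[Span]) -> list[str]:
    """Return one indented line per span, parents before children, ordered by start time."""
    known_ids = {s.span_id for s in spans}
    children = defaultdict(list)
    roots = []
    for s in sorted(spans, key=lambda s: s.start_ns):
        if s.parent_span_id and s.parent_span_id in known_ids:
            children[s.parent_span_id].append(s)
        else:
            # Root span, or an orphan whose parent was never exported.
            roots.append(s)

    lines: list[str] = []

    def walk(span: Span, depth: int) -> None:
        indent = "  " * depth
        lines.append(f"{indent}{span.service}: {span.name} ({span.duration_ns / 1e6:.1f} ms)")
        for child in children[span.span_id]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines
```

&lt;p&gt;Fed with the spans from the flow above, this would print the Nginx span at the top level with the gateway, trading service and database spans nested beneath it, which is the same shape Grafana draws graphically.&lt;/p&gt;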



&lt;h3&gt;
  
  
  Layer 1 — Nginx (Trace Entry Point)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# nginx.conf&lt;/span&gt;
&lt;span class="k"&gt;load_module&lt;/span&gt; &lt;span class="nc"&gt;modules/ngx&lt;/span&gt;&lt;span class="s"&gt;_otel_module.so&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;http&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;otel_exporter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;endpoint&lt;/span&gt; &lt;span class="nf"&gt;otel-collector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4317&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kn"&gt;otel_service_name&lt;/span&gt;  &lt;span class="s"&gt;"api-gateway"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;otel_trace&lt;/span&gt;         &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;otel_trace_context&lt;/span&gt; &lt;span class="s"&gt;propagate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;# reuses the incoming traceparent and injects the updated one into the upstream request&lt;/span&gt;

    &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;otel_trace&lt;/span&gt;      &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_name&lt;/span&gt;  &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$request_method&lt;/span&gt; &lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_attr&lt;/span&gt;  &lt;span class="s"&gt;http.method&lt;/span&gt;  &lt;span class="nv"&gt;$request_method&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_attr&lt;/span&gt;  &lt;span class="s"&gt;http.url&lt;/span&gt;     &lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;otel_span_attr&lt;/span&gt;  &lt;span class="s"&gt;http.status&lt;/span&gt;  &lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;proxy_pass&lt;/span&gt;      &lt;span class="s"&gt;http://backend&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
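
&lt;p&gt;What a propagating hop does with the &lt;code&gt;traceparent&lt;/code&gt; header can be sketched in a few lines. The example follows the W3C Trace Context layout (&lt;code&gt;version-traceid-parentid-flags&lt;/code&gt;): keep the incoming trace ID, mint a new span ID, and forward the result upstream. The function names are hypothetical, only version &lt;code&gt;00&lt;/code&gt; is handled, and marking a freshly started trace as sampled (&lt;code&gt;01&lt;/code&gt;) is an assumption. This is not the Nginx module's actual code.&lt;/p&gt;

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid per the W3C specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, flags)


def outgoing_traceparent(incoming: Optional[str]) -> tuple[str, str]:
    """Return (header_for_upstream, this_hop_span_id)."""
    ctx = parse_traceparent(incoming) if incoming else None
    span_id = secrets.token_hex(8)  # 8 random bytes rendered as 16 hex characters
    if ctx is None:
        # No usable context: start a new trace (assumed sampled for this sketch).
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        # Continue the caller's trace; this hop becomes the parent of the next one.
        trace_id, flags = ctx.trace_id, ctx.flags
    return f"00-{trace_id}-{span_id}-{flags}", span_id
```

&lt;p&gt;Because every hop keeps the trace ID and replaces only the parent ID, all spans of one request end up with the same &lt;code&gt;TraceId&lt;/code&gt; in &lt;code&gt;otel_traces&lt;/code&gt;, and &lt;code&gt;ParentSpanId&lt;/code&gt; records which hop called which.&lt;/p&gt;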



&lt;h3&gt;
  
  
  Layer 2 — Application Services (Auto-Instrumented)
&lt;/h3&gt;

&lt;p&gt;The OTEL Java agent auto-instruments JDBC queries without code changes. For ClickHouse queries issued from Python, create the span yourself and pass the trace context explicitly:&lt;/p&gt;

&lt;h4&gt;
  
  
  Python — Tracing ClickHouse Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;inject&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;clickhouse_connect&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trading-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_risk_profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse.mart_user_risk_profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;marts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM mart_user_risk_profile WHERE username = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Inject current trace context into ClickHouse HTTP headers.
&lt;/span&gt;        &lt;span class="c1"&gt;# IMPORTANT: clickhouse_connect caches `http_headers` on the client object — they
&lt;/span&gt;        &lt;span class="c1"&gt;# are reused for every subsequent query on this client. To get a *unique* traceparent
&lt;/span&gt;        &lt;span class="c1"&gt;# per query, either (a) create a fresh client per request (as below), or (b) pass the
&lt;/span&gt;        &lt;span class="c1"&gt;# traceparent via the per-query `settings={"opentelemetry_traceparent": ...}` argument
&lt;/span&gt;        &lt;span class="c1"&gt;# so each query carries its own trace ID.
&lt;/span&gt;        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clickhouse_connect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;http_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;    &lt;span class="c1"&gt;# ClickHouse reads traceparent from here
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM mart_user_risk_profile WHERE username = {username:String}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.rows_returned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result_rows&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result_rows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Go — Tracing ClickHouse Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/ClickHouse/clickhouse-go/v2"&lt;/span&gt;
    &lt;span class="s"&gt;"go.opentelemetry.io/otel"&lt;/span&gt;
    &lt;span class="s"&gt;"go.opentelemetry.io/otel/attribute"&lt;/span&gt;
    &lt;span class="s"&gt;"go.opentelemetry.io/otel/propagation"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"trading-service"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;getTopTraders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="n"&gt;Trader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"clickhouse.mart_trades_futures"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;End&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetAttributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"db.system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="s"&gt;"clickhouse"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"db.operation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"SELECT"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"db.statement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"SELECT username, SUM(quantity*price) FROM mart_trades_futures ..."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// clickhouse-go/v2 does NOT auto-inject the OTEL traceparent into the&lt;/span&gt;
    &lt;span class="c"&gt;// native protocol. Extract it explicitly from ctx and pass it via the&lt;/span&gt;
    &lt;span class="c"&gt;// `opentelemetry_traceparent` query setting — ClickHouse server reads&lt;/span&gt;
    &lt;span class="c"&gt;// that and emits the correct trace_id into system.opentelemetry_span_log.&lt;/span&gt;
    &lt;span class="n"&gt;carrier&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;propagation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MapCarrier&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetTextMapPropagator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;carrier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;clickhouse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;clickhouse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Addr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"clickhouse:9000"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clickhouse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"opentelemetry_traceparent"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;carrier&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"traceparent"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="c"&gt;// optional, only if your propagator emits it:&lt;/span&gt;
            &lt;span class="s"&gt;"opentelemetry_tracestate"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;carrier&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"tracestate"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT username, SUM(quantity * price) as volume
        FROM mart_trades_futures
        WHERE transaction_time &amp;gt;= today()
        GROUP BY username
        ORDER BY volume DESC
        LIMIT 10
    `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parseRows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Heads up on the sample rate:&lt;/strong&gt; &lt;code&gt;opentelemetry_start_trace_probability&lt;/code&gt; (default 0) is the probability that ClickHouse starts a trace on its own for a query that arrives without a parent trace context. Queries that already carry a &lt;code&gt;traceparent&lt;/code&gt; are traced and written to &lt;code&gt;system.opentelemetry_span_log&lt;/code&gt; regardless of this value. Raise it in &lt;code&gt;users.xml&lt;/code&gt; (or via &lt;code&gt;SET&lt;/code&gt; per session) only if you also want server-side spans for untagged queries. A common choice is &lt;code&gt;0.01&lt;/code&gt; (1%) in production to limit overhead and &lt;code&gt;1.0&lt;/code&gt; in staging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Layer 3 — ClickHouse Native Tracing
&lt;/h3&gt;

&lt;p&gt;ClickHouse reads the trace context supplied with each query (the &lt;code&gt;traceparent&lt;/code&gt; header over HTTP) and emits its own internal spans to &lt;code&gt;system.opentelemetry_span_log&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ClickHouse internal spans for a specific trace.&lt;/span&gt;
&lt;span class="c1"&gt;-- NOTE: in modern ClickHouse, system.opentelemetry_span_log.trace_id is type UUID,&lt;/span&gt;
&lt;span class="c1"&gt;-- not String — direct hex-string comparison fails. Convert with lower(hex(trace_id))&lt;/span&gt;
&lt;span class="c1"&gt;-- or build the UUID from the hex form.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;trace_id_hex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;span_id_hex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_span_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;parent_span_id_hex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;operation_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finish_time_us&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'clickhouse.query'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sql_query&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opentelemetry_span_log&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'4bf92f3577b34da6a3ce929d0e0e4736'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
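
&lt;p&gt;The trace ID in the &lt;code&gt;WHERE&lt;/code&gt; clause is the second field of the W3C &lt;code&gt;traceparent&lt;/code&gt; value (&lt;code&gt;version-traceid-parentid-flags&lt;/code&gt;). As a minimal sketch, assuming you have the raw header from an application log, the helper below splits it so the 32-character trace ID can be pasted into the query. The name &lt;code&gt;parse_traceparent&lt;/code&gt; is hypothetical and not part of any library.&lt;/p&gt;

```python
import re

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Split a traceparent header into its four fields, or raise ValueError."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError("not a valid traceparent: %r" % header)
    version, trace_id, parent_id, flags = match.groups()
    # The spec treats all-zero IDs as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace or parent id")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        # The lowest bit of the flags byte is the "sampled" flag.
        "sampled": int(flags, 16) % 2 == 1,
    }


fields = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(fields["trace_id"])
```

&lt;p&gt;The &lt;code&gt;parent_id&lt;/code&gt; field is the application span that issued the query, which is the value the ClickHouse spans should report as their parent.&lt;/p&gt;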



&lt;p&gt;Export these to &lt;code&gt;otel_traces&lt;/code&gt; via a materialized view so they appear in Grafana alongside app spans:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Auto-export ClickHouse internal spans to otel_traces&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clickhouse_spans_mv&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;fromUnixTimestamp64Micro&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_span_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ParentSpanId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;''&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;TraceState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;operation_name&lt;/span&gt;                           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'SPAN_KIND_SERVER'&lt;/span&gt;                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanKind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;                             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                                    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ResourceAttributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attribute&lt;/span&gt;                                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finish_time_us&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time_us&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;-- convert to nanoseconds&lt;/span&gt;
    &lt;span class="s1"&gt;'STATUS_CODE_OK'&lt;/span&gt;                         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;''&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;StatusMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Events.Timestamp`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Events.Name`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Events.Attributes`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.TraceId`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.SpanId`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.TraceState`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[]&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;`Links.Attributes`&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;opentelemetry_span_log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What a Full Trace Looks Like in ClickHouse
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Full waterfall for a single trade request&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SpanName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Duration&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;                          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.method'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;http_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.route'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.statement'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.rows_affected'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'4bf92f3577b34da6a3ce929d0e0e4736'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result — full waterfall in one query:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SpanName                          Service              duration_ms
──────────────────────────────    ──────────────────── ───────────
http.server POST /api/trade       api-gateway          245.3 ms
  └─ trade.execute                trading-service      241.1 ms
       ├─ db.query (risk check)   trading-service       18.4 ms  ← SELECT mart_user_risk_profile
       ├─ db.query (insert trade) trading-service        4.2 ms  ← INSERT INTO mart_trades_futures
       ├─ SELECT (internal CH)    clickhouse             3.1 ms  ← ClickHouse internal span
       └─ kafka.produce           trading-service        2.1 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
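
&lt;p&gt;The query itself returns flat rows. The indentation comes from following &lt;code&gt;ParentSpanId&lt;/code&gt; links, which Grafana does for you. As a rough sketch of that step, assuming root spans carry an empty &lt;code&gt;ParentSpanId&lt;/code&gt; and using hypothetical sample rows and a hypothetical &lt;code&gt;render_waterfall&lt;/code&gt; helper:&lt;/p&gt;

```python
def render_waterfall(rows):
    """Turn flat span rows into indented lines by following ParentSpanId links."""
    children = {}
    for row in rows:
        children.setdefault(row["ParentSpanId"], []).append(row)

    lines = []

    def walk(parent_id, depth):
        # Siblings are ordered by start time, like ORDER BY Timestamp in the query.
        siblings = sorted(children.get(parent_id, []), key=lambda r: r["Timestamp"])
        for row in siblings:
            label = "  " * depth + row["SpanName"]
            duration_ms = row["Duration"] / 1e6  # Duration is stored in nanoseconds
            lines.append("%-28s %-16s %6.1f ms" % (label, row["ServiceName"], duration_ms))
            walk(row["SpanId"], depth + 1)

    walk("", 0)  # assumption: root spans have an empty ParentSpanId
    return lines


# Hypothetical rows shaped like the otel_traces columns selected above.
rows = [
    {"SpanId": "a1", "ParentSpanId": "", "SpanName": "POST /api/trade",
     "ServiceName": "api-gateway", "Duration": 245300000, "Timestamp": 0},
    {"SpanId": "b2", "ParentSpanId": "a1", "SpanName": "trade.execute",
     "ServiceName": "trading-service", "Duration": 241100000, "Timestamp": 1},
    {"SpanId": "c3", "ParentSpanId": "b2", "SpanName": "db.query",
     "ServiceName": "trading-service", "Duration": 18400000, "Timestamp": 2},
]
print("\n".join(render_waterfall(rows)))
```

&lt;p&gt;Spans whose parent is missing from the result set are not printed by this sketch, which is also a quick way to notice a broken parent link between application and ClickHouse spans.&lt;/p&gt;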






&lt;h2&gt;
  
  
  8. Full-Text Log Search — Replacing Kibana Discover
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Updated Position — ClickHouse Now Competitive on Full-Text Search
&lt;/h3&gt;

&lt;p&gt;Full-text search used to be the primary reason to keep Elasticsearch. ClickHouse has significantly closed this gap with a &lt;strong&gt;new &lt;code&gt;text&lt;/code&gt; index type&lt;/strong&gt; that works natively on object storage (S3) with the same performance as local disk — removing the last major technical advantage Elasticsearch held.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reference: &lt;a href="https://clickhouse.com/blog/clickhouse-full-text-search-object-storage" rel="noopener noreferrer"&gt;ClickHouse Full-Text Search on Object Storage&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  New &lt;code&gt;text&lt;/code&gt; Index — How to Add It to otel_logs
&lt;/h3&gt;

&lt;p&gt;The text index is currently behind an experimental flag. Enable it before the &lt;code&gt;ALTER&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;allow_experimental_full_text_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Add the text index to the Body column.&lt;/span&gt;
&lt;span class="c1"&gt;-- `tokenizer = 'default'` splits on non-alphanumeric characters (good general default).&lt;/span&gt;
&lt;span class="c1"&gt;-- `case_sensitive = false` lowercases tokens at index *build time* — query the raw column.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;body_text_idx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'default'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;case_sensitive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;GRANULARITY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Materialize the index on existing data&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;body_text_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't wrap the column in queries.&lt;/strong&gt; The optimizer matches the index against &lt;code&gt;hasToken(Body, ...)&lt;/code&gt;. Wrapping the column as &lt;code&gt;hasToken(lower(Body), ...)&lt;/code&gt; defeats the index and forces a full scan. Because the index was built with &lt;code&gt;case_sensitive = false&lt;/code&gt;, &lt;code&gt;hasToken(Body, 'OutOfMemoryError')&lt;/code&gt; already matches &lt;code&gt;outofmemoryerror&lt;/code&gt;, &lt;code&gt;OUTOFMEMORYERROR&lt;/code&gt;, and any other casing of the same token. It does not match abbreviations such as &lt;code&gt;OOM&lt;/code&gt;, which is a different token and needs its own search term.&lt;/p&gt;
&lt;/blockquote&gt;
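
&lt;p&gt;To make the token semantics concrete, here is a small Python approximation of a tokenizer that splits on non-alphanumeric characters and lowercases tokens, as described for &lt;code&gt;tokenizer = 'default'&lt;/code&gt; above. It is an illustration only: the functions &lt;code&gt;tokenize&lt;/code&gt; and &lt;code&gt;has_token&lt;/code&gt; are hypothetical, handle ASCII only, and are not ClickHouse's implementation.&lt;/p&gt;

```python
import re


def tokenize(text, case_sensitive=False):
    """Split on runs of non-alphanumeric ASCII characters, optionally lowercasing."""
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]
    return tokens if case_sensitive else [t.lower() for t in tokens]


def has_token(text, needle, case_sensitive=False):
    """Approximate hasToken: the needle must equal one whole token."""
    if not case_sensitive:
        needle = needle.lower()
    return needle in tokenize(text, case_sensitive)


line = "java.lang.OutOfMemoryError: Java heap space"
print(tokenize(line))
print(has_token(line, "outofmemoryerror"))  # whole token, any casing
print(has_token(line, "OOM"))               # a different token
print(has_token(line, "Memory"))            # only a substring of a token
```

&lt;p&gt;Only the first lookup succeeds. Substring searches such as the last one are what &lt;code&gt;LIKE&lt;/code&gt; or &lt;code&gt;match&lt;/code&gt; are for.&lt;/p&gt;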

&lt;p&gt;&lt;strong&gt;What the &lt;code&gt;text&lt;/code&gt; index supports:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hasToken&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hasToken(Body, 'OutOfMemoryError')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exact token match — fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hasAllTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hasAllTokens(Body, ['trade', 'failed'])&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All tokens must appear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hasAnyTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hasAnyTokens(Body, ['ERROR', 'FATAL'])&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any token match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LIKE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Body LIKE '%OutOfMemoryError%'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wildcard match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;startsWith&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;startsWith(Body, 'WARN')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prefix match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;match&lt;/code&gt; (regex)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;match(Body, 'user_[0-9]+')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Regex search&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Performance:&lt;/strong&gt; 7.4x speedup vs full table scan on text search (ClickHouse benchmark, 10M rows with array tags).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works on S3:&lt;/strong&gt; The index uses sequential dictionary reads with front-coding compression — no random I/O, which is the key constraint on object storage. 94.5% of tokens appear in ≤6 rows, so embedded posting lists handle the vast majority of lookups without reading large posting lists.&lt;/p&gt;
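
&lt;p&gt;Front coding itself is a general technique, and a short Python sketch shows why it suits sequential reads. This illustrates the idea only and is not ClickHouse's on-disk format; &lt;code&gt;front_encode&lt;/code&gt; and &lt;code&gt;front_decode&lt;/code&gt; are hypothetical names. Each token in a sorted dictionary is stored as the length of the prefix it shares with the previous token plus the remaining suffix.&lt;/p&gt;

```python
import os

def front_encode(sorted_tokens):
    """Encode each token as (shared_prefix_length, remaining_suffix)."""
    encoded, previous = [], ""
    for token in sorted_tokens:
        shared = len(os.path.commonprefix([previous, token]))
        encoded.append((shared, token[shared:]))
        previous = token
    return encoded

def front_decode(encoded):
    """Rebuild the tokens in one forward pass.

    Every entry depends only on the entry before it, so decoding is a
    sequential read with no seeking back and forth.
    """
    tokens, previous = [], ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        tokens.append(previous)
    return tokens

dictionary = sorted(["timeout", "timer", "timestamp", "trade", "trader", "trading"])
encoded = front_encode(dictionary)

for entry in encoded:
    print(entry)
```

&lt;p&gt;Neighbouring tokens in a sorted dictionary tend to share long prefixes, so most entries shrink to a small integer and a short suffix. Decoding walks the entries strictly in order, which matches the access pattern that object storage handles well.&lt;/p&gt;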

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Real-world proof point — gitTrends:&lt;/strong&gt; ClickHouse's own reference demo (&lt;a href="https://github.com/ClickHouse/gitTrends" rel="noopener noreferrer"&gt;github.com/ClickHouse/gitTrends&lt;/a&gt;) searches &lt;strong&gt;10 billion+ GitHub events&lt;/strong&gt; using &lt;code&gt;hasToken()&lt;/code&gt; on a &lt;code&gt;body&lt;/code&gt; text index — the exact same pattern used for log search here. The app lets users compare FTS index vs bloom-filter skip index vs full table scan in real time, with live row-scan counters streamed from ClickHouse. Sub-second queries at 10B rows validate the production viability of &lt;code&gt;hasToken()&lt;/code&gt; for high-volume text search workloads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How ClickHouse Handles Log Search
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Keyword Search — New &lt;code&gt;text&lt;/code&gt; index (preferred)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Uses new text index — 7.4x faster than full scan&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;hasToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'OutOfMemoryError'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- LIKE also uses the text index automatically&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%OutOfMemoryError%'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Exact TraceId Lookup (bloom filter)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- bloom_filter skip index: granules that cannot contain this TraceId are skipped, not scanned&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'abc123def456'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
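
&lt;p&gt;The speedup comes from skipping data rather than from a constant-time lookup. The Python sketch below models that idea under simplifying assumptions: one small Bloom filter per granule, SHA-256 as the hash, and made-up trace IDs. The class and function names are hypothetical, and ClickHouse's real index layout differs.&lt;/p&gt;

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = True

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))

def build_granule_filters(granules):
    """Build one filter per granule (a granule is a block of rows)."""
    filters = []
    for trace_ids in granules:
        bloom = BloomFilter()
        for trace_id in trace_ids:
            bloom.add(trace_id)
        filters.append(bloom)
    return filters

def granules_to_read(filters, trace_id):
    """Return the indexes of granules that cannot be ruled out."""
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(trace_id)]

# Made-up trace IDs, three rows per granule for brevity.
granules = [
    ["aaa111", "bbb222", "ccc333"],
    ["abc123def456", "ddd444", "eee555"],
    ["fff666", "ggg777", "hhh888"],
]
filters = build_granule_filters(granules)
print(granules_to_read(filters, "abc123def456"))
```

&lt;p&gt;The result always includes granule 1, where the ID actually lives. Another granule can appear only as a false positive, which costs an unnecessary read but never a wrong answer.&lt;/p&gt;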



&lt;h4&gt;
  
  
  Structured Attribute Search
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Filter on OTEL log attributes — common in structured logging&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.status_code'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'500'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Multi-Condition Log Search (most common Kibana query)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- "All ERROR logs from trading-service in last 2 hours that mention user_id"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'error_code'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'trading-service'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;    &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;hasToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;-- uses text index&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Case-Insensitive Search (built into the index)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The text index was built with case_sensitive = false, so tokens are lowercased&lt;/span&gt;
&lt;span class="c1"&gt;-- at index time. Query the raw column — wrapping it in lower() defeats the index.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;hasToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'OutOfMemoryError'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;-- also matches outofmemoryerror / OUTOFMEMORYERROR (but not the separate token OOM)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full-Text Search Comparison — Updated
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Type&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Exact token match (&lt;code&gt;hasToken&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~15-30ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Near-comparable with text index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyword search (&lt;code&gt;LIKE&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~20-50ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;text index — 7.4x faster than scan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured attribute filter&lt;/td&gt;
&lt;td&gt;~50ms&lt;/td&gt;
&lt;td&gt;~5-20ms&lt;/td&gt;
&lt;td&gt;CH wins (columnar)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregation (error count by service)&lt;/td&gt;
&lt;td&gt;~200ms-2s&lt;/td&gt;
&lt;td&gt;~10-50ms&lt;/td&gt;
&lt;td&gt;CH wins significantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-range + service filter&lt;/td&gt;
&lt;td&gt;~100ms&lt;/td&gt;
&lt;td&gt;~10-30ms&lt;/td&gt;
&lt;td&gt;CH wins (partition pruning)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TraceId lookup&lt;/td&gt;
&lt;td&gt;~10ms&lt;/td&gt;
&lt;td&gt;~20-50ms&lt;/td&gt;
&lt;td&gt;Comparable (bloom filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free-text fuzzy search&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good (regex via &lt;code&gt;match()&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;ES still leads for fuzzy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search on S3/object storage&lt;/td&gt;
&lt;td&gt;Degraded (SSD required)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Full speed on S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CH advantage — no SSD needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; ~95% of observability queries are structured (service + time + severity + attribute) — ClickHouse wins on all of those. For the remaining ~5% requiring keyword search in log bodies, the new &lt;code&gt;text&lt;/code&gt; index brings ClickHouse to near-Elasticsearch performance. The only remaining ES advantage is fuzzy/phrase search, which is rarely needed for structured application logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coming soon in ClickHouse:&lt;/strong&gt; phrase search (position-aware token matching) and JSON column indexing — which will close the remaining gap further.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Visualization — Grafana Replaces Kibana
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ClickHouse datasource plugin&lt;/span&gt;
grafana-cli plugins &lt;span class="nb"&gt;install &lt;/span&gt;grafana-clickhouse-datasource

&lt;span class="c"&gt;# Or, in docker-compose.yml, under the Grafana service:&lt;/span&gt;
environment:
  - &lt;span class="nv"&gt;GF_INSTALL_PLUGINS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;grafana-clickhouse-datasource
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Datasource Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# grafana/provisioning/datasources/clickhouse.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;datasources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClickHouse-OTEL&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana-clickhouse-datasource&lt;/span&gt;
    &lt;span class="na"&gt;uid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse-otel&lt;/span&gt;
    &lt;span class="na"&gt;jsonData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
      &lt;span class="na"&gt;defaultDatabase&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
      &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana_readonly&lt;/span&gt;
    &lt;span class="na"&gt;secureJsonData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${GRAFANA_CH_PASSWORD}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Dashboards to Build
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Service Health Overview
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Error rate per service — last 1 hour, 1-minute buckets&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;toStartOfMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;countIf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_rate_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Latency Distribution (P50 / P95 / P99)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- HTTP endpoint latency percentiles — last 30 minutes&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'http.route'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p50_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p95_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;request_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SpanKind&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'SPAN_KIND_SERVER'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt;  &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
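
&lt;p&gt;For readers less familiar with percentile panels, the following Python sketch reproduces the arithmetic on a handful of made-up span durations. It uses an exact nearest-rank percentile, whereas ClickHouse's &lt;code&gt;quantile&lt;/code&gt; is an approximate, sampling-based aggregate, so treat this as a model of the calculation and not of the engine. The helper &lt;code&gt;percentile_ms&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import math

def percentile_ms(durations_ns, q):
    """Nearest-rank percentile of span durations, converted from ns to ms."""
    ordered = sorted(durations_ns)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1] / 1e6  # same ns-to-ms division as the query

# Made-up durations in nanoseconds: four fast requests and one slow outlier.
durations_ns = [12_000_000, 15_000_000, 18_000_000, 22_000_000, 250_000_000]

print("p50_ms:", percentile_ms(durations_ns, 0.50))  # 18.0
print("p95_ms:", percentile_ms(durations_ns, 0.95))  # 250.0
print("p99_ms:", percentile_ms(durations_ns, 0.99))  # 250.0
```

&lt;p&gt;With only five samples, a single slow request sets both P95 and P99. That is why the query also returns &lt;code&gt;request_count&lt;/code&gt;: tail percentiles for low-traffic endpoints should be read alongside the sample size.&lt;/p&gt;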



&lt;h4&gt;
  
  
  3. Slowest Database Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Slowest database queries (ClickHouse and PostgreSQL spans) in the last hour&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.statement'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt;      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_traces&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.system'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'postgresql'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p99_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Log Explorer (Kibana Discover equivalent)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Live log tail with filtering — wire to Grafana Logs panel&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TraceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LogAttributes&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__fromTime&lt;/span&gt;      &lt;span class="c1"&gt;-- ClickHouse plugin time-range macro&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__toTime&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'$service'&lt;/span&gt;      &lt;span class="c1"&gt;-- Grafana template variable&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  5. Distributed Trace Waterfall
&lt;/h4&gt;

&lt;p&gt;Configure Grafana Explore → Traces with ClickHouse datasource:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;otel.otel_traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TraceID column&lt;/td&gt;
&lt;td&gt;TraceId&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SpanID column&lt;/td&gt;
&lt;td&gt;SpanId&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parent SpanID&lt;/td&gt;
&lt;td&gt;ParentSpanId&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Start time&lt;/td&gt;
&lt;td&gt;Timestamp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;Duration (nanoseconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service name&lt;/td&gt;
&lt;td&gt;ServiceName&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operation name&lt;/td&gt;
&lt;td&gt;SpanName&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Click any log line with a TraceId → jump directly to full trace waterfall.&lt;/p&gt;
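
&lt;p&gt;The waterfall view is essentially a tree walk over the columns mapped above. The Python sketch below shows one way such rows can be arranged. It assumes that root spans carry an empty &lt;code&gt;ParentSpanId&lt;/code&gt; and that &lt;code&gt;Timestamp&lt;/code&gt; and &lt;code&gt;Duration&lt;/code&gt; are integer nanoseconds; the sample spans and the &lt;code&gt;build_waterfall&lt;/code&gt; helper are made up for illustration and are not how Grafana implements the panel.&lt;/p&gt;

```python
from collections import defaultdict

def build_waterfall(spans):
    """Return (depth, span_name, offset_ms, duration_ms) rows in display order."""
    children = defaultdict(list)
    for span in spans:
        children[span["ParentSpanId"]].append(span)

    trace_start = min(span["Timestamp"] for span in spans)
    rows = []

    def visit(parent_id, depth):
        # Show siblings in start-time order, each followed by its own subtree.
        for span in sorted(children[parent_id], key=lambda s: s["Timestamp"]):
            offset_ms = (span["Timestamp"] - trace_start) / 1e6
            rows.append((depth, span["SpanName"], offset_ms, span["Duration"] / 1e6))
            visit(span["SpanId"], depth + 1)

    visit("", 0)  # assumption: root spans have an empty ParentSpanId
    return rows

# Made-up spans for a single trace; times are in nanoseconds.
spans = [
    {"SpanId": "c", "ParentSpanId": "a", "SpanName": "db.insert",
     "Timestamp": 20_000_000, "Duration": 25_000_000},
    {"SpanId": "a", "ParentSpanId": "", "SpanName": "POST /orders",
     "Timestamp": 0, "Duration": 50_000_000},
    {"SpanId": "b", "ParentSpanId": "a", "SpanName": "risk-check",
     "Timestamp": 5_000_000, "Duration": 10_000_000},
]

for depth, name, offset_ms, duration_ms in build_waterfall(spans):
    print("  " * depth + f"{name}: starts at {offset_ms} ms, lasts {duration_ms} ms")
```

&lt;p&gt;The indentation comes from &lt;code&gt;ParentSpanId&lt;/code&gt;, the horizontal position from &lt;code&gt;Timestamp&lt;/code&gt;, and the bar length from &lt;code&gt;Duration&lt;/code&gt;, which is why those columns must be mapped correctly in the datasource settings.&lt;/p&gt;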

&lt;h3&gt;
  
  
  Correlating Logs + Traces + Business Data in Grafana
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Grafana panel: "Affected users for errors in selected time range"&lt;/span&gt;
&lt;span class="c1"&gt;-- Hard to do in Kibana: it requires joining log data with business data&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_error&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;marts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'$service'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;    &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__timeFrom&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;__timeTo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;error_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grafana Plugin — Upcoming Features
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;These are confirmed items being prototyped by the ClickHouse Grafana plugin team — not yet shipped. Reference: &lt;a href="https://clickhouse.com/blog/grafana-plugin-vision" rel="noopener noreferrer"&gt;Our Vision for the ClickHouse Grafana Plugin&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;1. Deployment &amp;amp; K8s Annotation Presets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grafana annotations are vertical markers on time-series panels that flag notable events. The plugin will generate these automatically from OTel data — no manual query writing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it surfaces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment detection&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ResourceAttributes['service.version']&lt;/code&gt; change&lt;/td&gt;
&lt;td&gt;Service version change / rollback markers on dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K8s lifecycle events&lt;/td&gt;
&lt;td&gt;OTel resource attributes&lt;/td&gt;
&lt;td&gt;Pod restarts, OOM kills, autoscaling events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a crypto exchange: when a new trading-service version is deployed, its timestamp appears automatically as a marker on all latency and error dashboards, which immediately answers "did this error spike start at deployment?"&lt;/p&gt;
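
&lt;p&gt;How the plugin will implement this is not described here, so the following is only a minimal Python sketch of the idea behind deployment detection. It assumes rows of (timestamp, &lt;code&gt;service.version&lt;/code&gt;) already sorted by time; the function name &lt;code&gt;deployment_markers&lt;/code&gt; and the sample data are hypothetical:&lt;/p&gt;

```python
from datetime import datetime

def deployment_markers(rows):
    """Return (timestamp, old_version, new_version) for every version change.

    rows: iterable of (timestamp, version) tuples sorted by timestamp.
    """
    markers = []
    previous = None
    for ts, version in rows:
        if previous is not None and version != previous:
            markers.append((ts, previous, version))
        previous = version
    return markers

# Hypothetical sample: one deploy and one rollback of trading-service.
sample = [
    (datetime(2026, 5, 14, 10, 0), "1.4.0"),
    (datetime(2026, 5, 14, 10, 5), "1.4.0"),
    (datetime(2026, 5, 14, 10, 10), "1.5.0"),  # deploy
    (datetime(2026, 5, 14, 10, 30), "1.4.0"),  # rollback
]
for ts, old, new in deployment_markers(sample):
    print(f"{ts:%H:%M} version changed from {old} to {new}")
```

&lt;p&gt;Each returned tuple corresponds to one vertical marker on a panel; a rollback shows up as a second change back to the earlier version.&lt;/p&gt;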

&lt;p&gt;&lt;strong&gt;2. JWT Per-User Query Identity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Currently, all Grafana users share a single ClickHouse datasource credential. The roadmap item forwards each Grafana user's JWT identity to ClickHouse, enabling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Table- and row-level access control&lt;/strong&gt; based on actual user identity: the compliance team sees only compliance-relevant tables and rows, the risk team only risk data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-user audit trail&lt;/strong&gt; in ClickHouse query log — every dashboard query is attributed to a named person, not a shared service account&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-user cost tracking&lt;/strong&gt;: query compute cost attributed to each team member&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This closes the gap between Kibana's per-user access model and the current Grafana/ClickHouse setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Visual Metrics Builder (OTel Map Column Support)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OTel metrics (CPU, memory, network I/O) currently require writing SQL aggregation queries by hand. The upcoming metrics builder provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select metric name from a dropdown (populated from the &lt;code&gt;otel_metrics&lt;/code&gt; table)&lt;/li&gt;
&lt;li&gt;Choose aggregation (sum, avg, max, p99)&lt;/li&gt;
&lt;li&gt;Add group-by dimensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key improvement: the OTel tables in ClickHouse store attributes in Map-type columns (&lt;code&gt;ResourceAttributes&lt;/code&gt;, &lt;code&gt;LogAttributes&lt;/code&gt;) holding key-value pairs. The builder will expose a &lt;strong&gt;key picker&lt;/strong&gt; so users can filter on &lt;code&gt;ResourceAttributes['k8s.namespace.name']&lt;/code&gt; or &lt;code&gt;ResourceAttributes['host.name']&lt;/code&gt; without writing bracket notation, which makes infrastructure metrics explorable without SQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Out-of-the-Box OTel + K8s Dashboards&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The plugin will ship importable JSON dashboards covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log volume by severity with per-service breakdown&lt;/li&gt;
&lt;li&gt;Trace duration distribution and service dependency map&lt;/li&gt;
&lt;li&gt;Per-service RED metrics (request rate, error rate, duration)&lt;/li&gt;
&lt;li&gt;Top spans visibility&lt;/li&gt;
&lt;li&gt;Kubernetes observability (namespaces, pods, nodes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Goal: &lt;strong&gt;from data ingestion to usable dashboards in minutes&lt;/strong&gt;, not hours of manual panel building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Compact Search-First Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A new query mode resembling the ClickStack UI — a search bar with filter pills, no SQL required for common tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click &lt;code&gt;+&lt;/code&gt; / &lt;code&gt;-&lt;/code&gt; on any field value in a log detail panel to instantly add include/exclude filters&lt;/li&gt;
&lt;li&gt;Select text within a log body to add a "line contains" full-text filter (backed by &lt;code&gt;hasToken()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Facet autocomplete for column names, operators, and values&lt;/li&gt;
&lt;li&gt;SQL preview pane shows the generated query live — "Edit as SQL" button opens the editor pre-populated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aimed at operators and on-call engineers who need to investigate quickly without knowing ClickHouse SQL.&lt;/p&gt;
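
&lt;p&gt;To make the "SQL preview" idea concrete, here is a rough Python sketch of how filter pills could be turned into a WHERE clause. It is not the plugin's code: &lt;code&gt;Pill&lt;/code&gt;, &lt;code&gt;quote&lt;/code&gt;, and &lt;code&gt;build_where&lt;/code&gt; are hypothetical names, and &lt;code&gt;Body&lt;/code&gt; is assumed to be the log body column of the OTel logs table:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Pill:
    column: str    # e.g. "ServiceName" or "LogAttributes['user_id']"
    value: str
    include: bool  # True for "+", False for "-"

def quote(value: str) -> str:
    """Quote a string literal for SQL (escapes backslashes and single quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

def build_where(pills, contains_tokens=()):
    """Combine include/exclude pills and 'line contains' tokens into one WHERE clause."""
    parts = []
    for p in pills:
        # Column names are assumed to come from a schema picker, never from free text.
        op = "=" if p.include else "!="
        parts.append(f"{p.column} {op} {quote(p.value)}")
    for token in contains_tokens:
        # hasToken matches whole tokens, so each entry should be a single word.
        parts.append(f"hasToken(Body, {quote(token)})")
    return " AND ".join(parts) if parts else "1"

print(build_where(
    [Pill("ServiceName", "trading-service", True), Pill("SeverityText", "DEBUG", False)],
    contains_tokens=["timeout"],
))
```

&lt;p&gt;Keeping the generated clause as plain text is what makes an "Edit as SQL" hand-off straightforward: the editor can simply be pre-populated with the same string.&lt;/p&gt;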




&lt;h2&gt;
  
  
  10. Data Retention &amp;amp; Tiered Storage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Cost Driver — Retention × Volume
&lt;/h3&gt;

&lt;p&gt;Elasticsearch cost is roughly: &lt;code&gt;daily_volume_GB × retention_days × cost_per_GB_per_day&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;ClickHouse changes the equation at two levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Compression:&lt;/strong&gt; log data that occupies 1 TB in Elasticsearch takes roughly 200 GB in ClickHouse (&lt;strong&gt;5x smaller&lt;/strong&gt; end-to-end; 16x compression on column files, per ClickHouse/TextBench)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tiered storage:&lt;/strong&gt; Hot recent data on SSD, cold older data on cheap S3&lt;/li&gt;
&lt;/ol&gt;
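
&lt;p&gt;The formula above can be sketched in a few lines of Python. The 100 GB/day input is a hypothetical figure, and the 5x factor is the Elasticsearch-to-ClickHouse size ratio from the benchmark, applied here as a simplifying assumption:&lt;/p&gt;

```python
def stored_gb(daily_volume_gb, retention_days, size_ratio=1.0):
    """Steady-state on-disk size: daily volume x retention, scaled by a size ratio."""
    return daily_volume_gb * retention_days * size_ratio

# Hypothetical input: 100 GB/day as stored by Elasticsearch.
es_90d = stored_gb(100, 90)
ch_90d = stored_gb(100, 90, size_ratio=1 / 5)  # assumed 5x smaller in ClickHouse

print(f"Elasticsearch: {es_90d / 1000:.1f} TB, ClickHouse: {ch_90d / 1000:.1f} TB")
```

&lt;p&gt;Multiplying the result by a per-GB price for each storage tier gives the cost side of the equation; prices are left out here because they depend on the provider.&lt;/p&gt;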

&lt;h3&gt;
  
  
  TTL — Simple One-Line Retention
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Delete logs older than 90 days — runs automatically in background&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt; &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Different retention per severity.&lt;/span&gt;
&lt;span class="c1"&gt;-- ClickHouse evaluates TTL expressions in order and applies the FIRST one whose&lt;/span&gt;
&lt;span class="c1"&gt;-- WHERE clause matches a row. Put the most-specific (longest-retained) classes&lt;/span&gt;
&lt;span class="c1"&gt;-- first; the no-WHERE clause acts as the default fallback.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;365&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'CRITICAL'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- critical: 1 year&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt;  &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ERROR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- errors:   90 days&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt;   &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                                  &lt;span class="c1"&gt;-- default:   7 days&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
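
&lt;p&gt;When validating a TTL change, it helps to have the intended policy written down independently of the SQL. A minimal Python sketch of the per-severity retention described above; &lt;code&gt;RETENTION_DAYS&lt;/code&gt; and &lt;code&gt;expires_at&lt;/code&gt; are hypothetical helper names, not part of ClickHouse:&lt;/p&gt;

```python
from datetime import datetime, timedelta

# Retention classes taken from the TTL statement above; anything else gets the default.
RETENTION_DAYS = {"CRITICAL": 365, "ERROR": 90}
DEFAULT_RETENTION_DAYS = 7

def expires_at(timestamp, severity):
    """Earliest moment at which a row of this severity should become eligible for deletion."""
    days = RETENTION_DAYS.get(severity, DEFAULT_RETENTION_DAYS)
    return timestamp + timedelta(days=days)

logged = datetime(2026, 1, 1)
for severity in ("CRITICAL", "ERROR", "INFO"):
    print(severity, expires_at(logged, severity).date())
```

&lt;p&gt;Comparing these dates with the oldest surviving &lt;code&gt;Timestamp&lt;/code&gt; per severity on a test table is one way to confirm the rules delete what you expect. Deletion happens during background merges, so rows can outlive their TTL for a while.&lt;/p&gt;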



&lt;h3&gt;
  
  
  Tiered Storage — Hot/Cold/Archive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Storage policy: SSD (0-7 days) → S3 (7-90 days) → delete (90+ days)&lt;/span&gt;
&lt;span class="c1"&gt;-- Define in storage_configuration in config.xml:&lt;/span&gt;

&lt;span class="c1"&gt;-- hot:  /var/lib/clickhouse/data  (local SSD, fast)&lt;/span&gt;
&lt;span class="c1"&gt;-- cold: s3://clickhouse-cold-tier/ (S3, cheap)&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;otel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;otel_logs&lt;/span&gt;
    &lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="n"&gt;TTL&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;VOLUME&lt;/span&gt; &lt;span class="s1"&gt;'cold'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- move to S3&lt;/span&gt;
        &lt;span class="n"&gt;toDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;-- delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
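
&lt;p&gt;The lifecycle this policy is meant to produce can be written as a small Python function. This is only a model of the intended outcome under the 7-day and 90-day thresholds above; the function name &lt;code&gt;storage_tier&lt;/code&gt; is hypothetical:&lt;/p&gt;

```python
def storage_tier(age_days, cold_after_days=7, delete_after_days=90):
    """Where a log row is expected to live, given its age in days."""
    if age_days >= delete_after_days:
        return "deleted"
    if age_days >= cold_after_days:
        return "cold"  # S3-backed volume
    return "hot"       # local SSD

for age in (1, 7, 30, 90):
    print(age, storage_tier(age))
```

&lt;p&gt;In practice the boundaries are softer than this model suggests: ClickHouse moves whole data parts in the background once every row in the part has passed the move TTL, so recent and older rows that share a part stay on the hot volume together until then.&lt;/p&gt;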



&lt;h3&gt;
  
  
  Storage Comparison
&lt;/h3&gt;

&lt;p&gt;Real benchmark numbers from ClickHouse/TextBench (identical hardware: AWS m6i.8xlarge, OTEL log data):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1B rows&lt;/td&gt;
&lt;td&gt;245 GB&lt;/td&gt;
&lt;td&gt;49 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10B rows&lt;/td&gt;
&lt;td&gt;~1.2 TB&lt;/td&gt;
&lt;td&gt;~245 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50B rows&lt;/td&gt;
&lt;td&gt;12 TB&lt;/td&gt;
&lt;td&gt;2.4 TB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Column file compression&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Elasticsearch Uses 5x More Storage — Component Breakdown
&lt;/h3&gt;

&lt;p&gt;The 5x gap is not just compression. It comes from four on-disk structures that Elasticsearch maintains but ClickHouse either handles more efficiently or doesn't need at all. The figures below are for the 50B-row dataset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Component&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;th&gt;Why the difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Columnar storage (doc_values / column files)&lt;/td&gt;
&lt;td&gt;5.02 TiB&lt;/td&gt;
&lt;td&gt;1.92 TiB&lt;/td&gt;
&lt;td&gt;CH chains codecs per column (Delta+ZSTD, GCD) vs ES generic compression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inverted index&lt;/td&gt;
&lt;td&gt;3.37 TiB&lt;/td&gt;
&lt;td&gt;515 GiB&lt;/td&gt;
&lt;td&gt;CH inverted index is designed for analytics granules, not Lucene doc IDs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stored fields (&lt;code&gt;_source&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.00 TiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ES stores original JSON to reconstruct documents; CH reconstructs from columns directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Points + norms (BKD trees, relevance scoring)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;617 GiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ES maintains numeric range indexes and per-doc relevance weights; CH uses sparse primary index (320 MiB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.01 TiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.43 TiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5x smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The two biggest drivers of Elasticsearch's overhead:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;_source&lt;/code&gt; (3 TiB) — a near-complete second copy of all log data stored as compressed JSON so Elasticsearch can reconstruct the original document on retrieval. ClickHouse has no equivalent because it reconstructs rows directly from individual column files.&lt;/li&gt;
&lt;li&gt;Points + norms (617 GiB): BKD-tree numeric range indexes and per-document relevance scoring metadata. Relevance scoring matters for search ranking but not for observability queries, and ClickHouse's entire sparse primary index for 50B rows is 320 MiB.&lt;/li&gt;
&lt;/ul&gt;
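
&lt;p&gt;As a quick arithmetic check, the component figures from the table can be summed in Python. The dictionary keys are just labels chosen for this sketch; the numbers are copied from the table, and the sums land within a few hundredths of a TiB of the reported totals because each component is itself rounded:&lt;/p&gt;

```python
GIB_PER_TIB = 1024
MIB_PER_TIB = 1024 * 1024

# Component sizes in TiB, copied from the table above.
elasticsearch = {
    "doc_values": 5.02,
    "inverted_index": 3.37,
    "stored_fields": 3.00,
    "points_norms": 617 / GIB_PER_TIB,
}
clickhouse = {
    "column_files": 1.92,
    "inverted_index": 515 / GIB_PER_TIB,
    "primary_index": 320 / MIB_PER_TIB,
}

es_total = sum(elasticsearch.values())
ch_total = sum(clickhouse.values())
ratio = es_total / ch_total

print(f"Elasticsearch {es_total:.2f} TiB, ClickHouse {ch_total:.2f} TiB, ratio {ratio:.1f}x")
```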

&lt;p&gt;&lt;strong&gt;Ingestion speed (50B rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Under 4 hours&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~5 days&lt;/strong&gt; (after pipeline tuning)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
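
&lt;p&gt;Converted to throughput, those two durations look as follows. This is plain arithmetic on the figures in the table; since the ClickHouse time is "under 4 hours", its rate is a lower bound:&lt;/p&gt;

```python
ROWS = 50_000_000_000

clickhouse_hours = 4          # "under 4 hours", so an upper bound on the time taken
elasticsearch_hours = 5 * 24  # "~5 days"

ch_rows_per_sec = ROWS / (clickhouse_hours * 3600)
es_rows_per_sec = ROWS / (elasticsearch_hours * 3600)
speedup = elasticsearch_hours / clickhouse_hours

print(f"ClickHouse: at least {ch_rows_per_sec:,.0f} rows/s")
print(f"Elasticsearch: about {es_rows_per_sec:,.0f} rows/s")
print(f"Speedup: at least {speedup:.0f}x")
```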

&lt;p&gt;&lt;strong&gt;Estimated storage at this scale:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;ClickHouse&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;90-day retention&lt;/td&gt;
&lt;td&gt;~9 TB on SSD&lt;/td&gt;
&lt;td&gt;~1.8 TB on SSD (less if older data is tiered to S3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year retention&lt;/td&gt;
&lt;td&gt;~36 TB&lt;/td&gt;
&lt;td&gt;~7 TB on S3 (cheap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage cost/year&lt;/td&gt;
&lt;td&gt;~$100K+&lt;/td&gt;
&lt;td&gt;~$2-5K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  11. AI Layer — Natural Language Over Logs and Traces
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Unique Advantage
&lt;/h3&gt;

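&lt;p&gt;Since the Agentic AI Platform (LibreChat + Qwen + MCP) already runs against this ClickHouse cluster, observability data stored in the same cluster is &lt;strong&gt;automatically queryable in plain English&lt;/strong&gt;. No additional infrastructure is needed; the glossary entries later in this section only teach the model the observability vocabulary.&lt;/p&gt;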

&lt;h3&gt;
  
  
  Example AI Queries Over Logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Which services had the most errors in the last hour?"&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;

&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Show me all traces where ClickHouse queries took more than 500ms today"&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TraceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;e6&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.statement'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_traces&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;SpanAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'db.system'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'clickhouse'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500000000&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;

&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Which users were affected by the trading-service errors between 2pm and 3pm today?"&lt;/span&gt;
&lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kyc_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;otel_logs&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;mart_users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogAttributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServiceName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'trading-service'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeverityText&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'ERROR'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Timestamp&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;toIntervalHour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;toIntervalHour&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;mart_users&lt;/code&gt; is the consolidated user mart from the existing data platform — KYC status, country, account tier, registration date — not part of this observability migration. The point is that observability data (logs/traces in &lt;code&gt;otel.*&lt;/code&gt;) and business data (&lt;code&gt;marts.*&lt;/code&gt;) live in the &lt;strong&gt;same ClickHouse cluster&lt;/strong&gt; and join in a single SQL statement. Replace &lt;code&gt;mart_users&lt;/code&gt; with whatever your equivalent users table is.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The last query is &lt;strong&gt;impossible in Kibana&lt;/strong&gt; — it crosses observability data (logs) with business data (users). In ClickHouse, it's one SQL query the AI generates automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Glossary Additions for Observability
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;business_glossary.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Observability terms&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logs"&lt;/span&gt; &lt;span class="s"&gt;= otel_logs WHERE SeverityText = 'ERROR'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slow&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query"&lt;/span&gt; &lt;span class="s"&gt;= otel_traces WHERE SpanAttributes['db.system'] IN ('clickhouse','postgresql') AND Duration &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;500000000&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5xx"&lt;/span&gt; &lt;span class="s"&gt;= otel_logs WHERE LogAttributes['http.status_code'] LIKE '5%'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trade&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;service"&lt;/span&gt; &lt;span class="s"&gt;= ServiceName = 'trading-service'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace"&lt;/span&gt; &lt;span class="s"&gt;= otel_traces WHERE TraceId = '&amp;lt;id&amp;gt;'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency"&lt;/span&gt; &lt;span class="s"&gt;= Duration / 1e6 (milliseconds)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;p99"&lt;/span&gt; &lt;span class="s"&gt;= quantile(0.99)(Duration) / 1e6&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rate"&lt;/span&gt; &lt;span class="s"&gt;= countIf(SeverityText='ERROR') / count() * &lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
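
&lt;p&gt;The glossary relies on two conventions that are easy to get wrong: &lt;code&gt;Duration&lt;/code&gt; is in nanoseconds, and error rate is a percentage. A small Python sketch of the same definitions, with hypothetical helper names, makes the units explicit:&lt;/p&gt;

```python
NS_PER_MS = 1_000_000
SLOW_QUERY_NS = 500 * NS_PER_MS  # 500 ms, the 500000000 threshold in the glossary

def latency_ms(duration_ns):
    """'latency': Duration / 1e6, i.e. nanoseconds to milliseconds."""
    return duration_ns / NS_PER_MS

def is_slow_query(duration_ns):
    """'slow query': Duration strictly above 500 ms."""
    return duration_ns > SLOW_QUERY_NS

def error_rate_pct(severities):
    """'error rate': share of ERROR rows, as a percentage."""
    severities = list(severities)
    if not severities:
        return 0.0  # choice made for this sketch; the SQL form divides by count()
    errors = sum(1 for s in severities if s == "ERROR")
    return errors / len(severities) * 100

print(latency_ms(750_000_000), is_slow_query(750_000_000))
print(error_rate_pct(["INFO", "ERROR", "INFO", "WARN"]))
```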






&lt;h2&gt;
  
  
  12. Migration Plan — Zero Downtime Cutover
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A — Standard OTEL Collector (Recommended for Simple Setups)
&lt;/h3&gt;

&lt;p&gt;If you run a small number of centralized OTEL Collectors (one per environment), the standard approach is sufficient — edit the collector config YAML to add the ClickHouse exporter alongside the existing Elasticsearch exporter for dual-write. No extra tooling needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B — BindPlane (Recommended for Large Collector Fleets)
&lt;/h3&gt;

&lt;p&gt;If you run OTEL Collectors on many individual servers/services, &lt;strong&gt;BindPlane&lt;/strong&gt; is worth considering. It is a centralized management platform for OTEL Collector fleets — instead of editing YAML configs on each server manually, you manage all collector configurations from one dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Without BindPlane&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;Edit otel-collector.yaml on server &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="s"&gt;Edit otel-collector.yaml on server &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="s"&gt;Edit otel-collector.yaml on server N&lt;/span&gt;
  &lt;span class="s"&gt;Restart each collector&lt;/span&gt;
  &lt;span class="s"&gt;...&lt;/span&gt;

&lt;span class="na"&gt;With BindPlane&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;Add ClickHouse destination once in BindPlane UI&lt;/span&gt;
  &lt;span class="s"&gt;Roll out to entire fleet in one click&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What BindPlane adds for this migration:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Central config management&lt;/td&gt;
&lt;td&gt;One change pushes to all collectors instantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dual-write in one click&lt;/td&gt;
&lt;td&gt;Route to Elasticsearch AND ClickHouse simultaneously without touching individual collectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service-by-service cutover&lt;/td&gt;
&lt;td&gt;Route &lt;code&gt;trading-service&lt;/code&gt; logs to ClickHouse first, validate, then add more services gradually&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Severity-based routing&lt;/td&gt;
&lt;td&gt;Route ERROR logs to Elasticsearch (keep during validation), INFO/DEBUG to ClickHouse only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safe fleet rollout&lt;/td&gt;
&lt;td&gt;Progressive rollout with automatic rollback on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;130+ sources and destinations&lt;/td&gt;
&lt;td&gt;Supports standard OTEL ClickHouse exporter as destination&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;BindPlane for self-managed ClickHouse:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BindPlane's native "ClickStack destination" connects to the managed ClickHouse Cloud product. For self-managed ClickHouse, use the standard OTEL &lt;code&gt;clickhouseexporter&lt;/code&gt; as a generic OTLP destination in BindPlane: the outcome is the same, with slightly more manual config. Example BindPlane destination config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BindPlane destination config for self-managed ClickHouse&lt;/span&gt;
&lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otlp&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse-self-managed&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://clickhouse.internal:9000&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;x-clickhouse-database&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reference:&lt;/strong&gt; &lt;a href="https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 1 — Deploy &amp;amp; Validate (Week 1-2)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deploy ClickHouse schema&lt;/td&gt;
&lt;td&gt;Create otel_logs, otel_traces, otel_metrics tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add ClickHouse exporter to OTEL Collector&lt;/td&gt;
&lt;td&gt;Dual-write: send to both Elasticsearch AND ClickHouse (via YAML edit or BindPlane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set up Grafana&lt;/td&gt;
&lt;td&gt;Install ClickHouse datasource, build core dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validate data parity&lt;/td&gt;
&lt;td&gt;Compare row counts, spot-check log content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test trace waterfall&lt;/td&gt;
&lt;td&gt;Pick 5-10 real TraceIds, verify waterfall in Grafana matches Kibana&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
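
&lt;p&gt;The data parity check can be sketched as a per-service comparison of row counts. The Python sketch below assumes the counts have already been fetched from each backend for the same time window (for example, one count per service name); &lt;code&gt;parity_report&lt;/code&gt;, the 0.1% tolerance, and the sample service names and numbers are illustrative assumptions, not values from a real run.&lt;/p&gt;

```python
def parity_report(es_counts, ch_counts, tolerance=0.001):
    """Return services whose row counts differ by more than `tolerance`.

    es_counts / ch_counts map service name -> row count for the same window.
    The result maps service -> (es_count, ch_count, relative_drift).
    """
    mismatches = {}
    for service in sorted(set(es_counts) | set(ch_counts)):
        es = es_counts.get(service, 0)  # missing service counts as zero rows
        ch = ch_counts.get(service, 0)
        baseline = max(es, ch, 1)       # avoid dividing by zero
        drift = abs(es - ch) / baseline
        if drift > tolerance:
            mismatches[service] = (es, ch, round(drift, 4))
    return mismatches


# Illustrative numbers only.
es = {"trading-service": 1_000_000, "wallet-service": 5_000}
ch = {"trading-service": 999_500, "wallet-service": 4_000, "kyc-service": 10}
for service, detail in parity_report(es, ch).items():
    print(service, detail)
```

&lt;p&gt;A small tolerance is deliberate: batching and retries mean the two backends rarely match to the exact row at any given instant, so the spot-check of log content still matters.&lt;/p&gt;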

&lt;h3&gt;
  
  
  Phase 2 — Parallel Run (Week 3-4)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Run both stacks simultaneously&lt;/td&gt;
&lt;td&gt;Elasticsearch + ClickHouse receive same data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrate dashboards&lt;/td&gt;
&lt;td&gt;Rebuild all Kibana dashboards in Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrate alerts&lt;/td&gt;
&lt;td&gt;Recreate all Kibana alerts in Grafana Alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Train teams&lt;/td&gt;
&lt;td&gt;Grafana walkthrough for each team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build AI log queries&lt;/td&gt;
&lt;td&gt;Add observability terms to business glossary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gradual service cutover (optional)&lt;/td&gt;
&lt;td&gt;Use BindPlane to route one service at a time to ClickHouse-only, validate, expand&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 3 — Cutover (Week 5)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Confirm all dashboards working in Grafana&lt;/td&gt;
&lt;td&gt;Sign-off from each team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove Elasticsearch exporter from OTEL Collector&lt;/td&gt;
&lt;td&gt;Single-line config change (or one click in BindPlane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verify ClickHouse-only flow&lt;/td&gt;
&lt;td&gt;24-hour monitoring window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cancel Elasticsearch subscription&lt;/td&gt;
&lt;td&gt;After 48-hour clean run&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Phase 4 — Optimise (Week 6+)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tune TTL and tiered storage&lt;/td&gt;
&lt;td&gt;Configure S3 cold tier based on actual usage patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enable AI log queries in LibreChat&lt;/td&gt;
&lt;td&gt;Add observability glossary, test with teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set up cross-data dashboards&lt;/td&gt;
&lt;td&gt;Logs + business data correlation panels in Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance tuning&lt;/td&gt;
&lt;td&gt;Review slow queries via system.query_log, tune ORDER BY keys if needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
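
&lt;p&gt;As a sketch of the TTL tuning step, the Python helper below builds a tiered-storage TTL statement for one table. It assumes the table has a &lt;code&gt;Timestamp&lt;/code&gt; column, that a storage policy with a volume named &lt;code&gt;cold&lt;/code&gt; (backed by S3) is already attached to the table, and that a 7-day hot tier with 90-day retention is the starting point; the function name, the volume name, and the defaults are hypothetical.&lt;/p&gt;

```python
def tiered_ttl_statement(table, hot_days, retention_days,
                         time_column="Timestamp", cold_volume="cold"):
    """Build an ALTER TABLE statement that moves parts to a cold volume
    after `hot_days` and deletes them after `retention_days`.

    Values are interpolated directly into SQL, so pass trusted identifiers only.
    """
    if not (retention_days > hot_days > 0):
        raise ValueError("expected 0 < hot_days < retention_days")
    # TTL expressions must evaluate to Date/DateTime, hence the toDateTime() wrapper.
    ts = f"toDateTime({time_column})"
    return (
        f"ALTER TABLE {table} MODIFY TTL "
        f"{ts} + INTERVAL {hot_days} DAY TO VOLUME '{cold_volume}', "
        f"{ts} + INTERVAL {retention_days} DAY DELETE"
    )


print(tiered_ttl_statement("otel.otel_logs", hot_days=7, retention_days=90))
```

&lt;p&gt;Generating the statement from two numbers makes it easy to try different hot windows per table once real usage patterns are known.&lt;/p&gt;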

&lt;h3&gt;
  
  
  Risk Mitigation During Migration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dual-write period:&lt;/strong&gt; Both systems receive the same data for 4 weeks, so Elasticsearch stays a complete fallback copy until cutover&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback:&lt;/strong&gt; Removing ClickHouse exporter (or reverting BindPlane config) restores Elasticsearch-only in seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No application changes:&lt;/strong&gt; OTEL SDK configuration is unchanged throughout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradual cutover:&lt;/strong&gt; Cut over one service at a time using BindPlane routing rules if preferred&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  13. Cost Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Current Elasticsearch Cost Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch managed service (compute)&lt;/td&gt;
&lt;td&gt;~$200K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage (SSD, replicated)&lt;/td&gt;
&lt;td&gt;~$100K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Licensing (Elastic managed / premium)&lt;/td&gt;
&lt;td&gt;~$50K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time (ops, tuning, ILM)&lt;/td&gt;
&lt;td&gt;~$50K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$400K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ClickHouse Cost (Self-Hosted)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse nodes (2x m7g.4xlarge, Graviton3)&lt;/td&gt;
&lt;td&gt;~$18K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage SSD (hot, last 7 days)&lt;/td&gt;
&lt;td&gt;~$3K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 (cold, 7-90 days)&lt;/td&gt;
&lt;td&gt;~$2K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Keeper (3x t3.small for consensus)&lt;/td&gt;
&lt;td&gt;~$2K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time (minimal ops)&lt;/td&gt;
&lt;td&gt;~$10K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$35K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ClickHouse Cost (ClickHouse Cloud — Recommended Start)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Cloud (auto-scaling)&lt;/td&gt;
&lt;td&gt;~$36-60K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 tiered storage&lt;/td&gt;
&lt;td&gt;Included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time&lt;/td&gt;
&lt;td&gt;~$5K (fully managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$41-65K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Savings Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;th&gt;Saving vs Elasticsearch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Current (Elasticsearch)&lt;/td&gt;
&lt;td&gt;$400K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Cloud&lt;/td&gt;
&lt;td&gt;$41-65K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$335-359K saved (84-90%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Self-Hosted&lt;/td&gt;
&lt;td&gt;$35K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$365K saved (91%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
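
&lt;p&gt;The savings figures follow directly from the component tables. A quick arithmetic check in Python, with all values in thousands of dollars per year copied from those tables (the dictionary and function names are just for illustration):&lt;/p&gt;

```python
# Annual costs in $K, copied from the component tables.
ELASTICSEARCH = {"compute": 200, "storage": 100, "licensing": 50, "engineering": 50}
SELF_HOSTED = {"nodes": 18, "ssd_hot": 3, "s3_cold": 2, "keeper": 2, "engineering": 10}
CLOUD_LOW = {"service": 36, "engineering": 5}
CLOUD_HIGH = {"service": 60, "engineering": 5}


def saving(baseline_k, scenario_k):
    """Return (absolute saving in $K, saving as a whole-number percentage)."""
    saved = baseline_k - scenario_k
    return saved, round(100 * saved / baseline_k)


baseline = sum(ELASTICSEARCH.values())
for name, parts in [("self-hosted", SELF_HOSTED),
                    ("cloud (low)", CLOUD_LOW),
                    ("cloud (high)", CLOUD_HIGH)]:
    total = sum(parts.values())
    saved, pct = saving(baseline, total)
    print(f"{name}: ${total}K/yr, saves ${saved}K ({pct}%)")
```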

&lt;h3&gt;
  
  
  Additional Value (Not Counted in Savings)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AI queries over logs — eliminates ad-hoc log digging by engineering (~20 hrs/month)&lt;/li&gt;
&lt;li&gt;Cross-data correlation — compliance/risk can correlate errors with affected users instantly&lt;/li&gt;
&lt;li&gt;Longer retention — at ClickHouse costs, retain 1 year vs 90 days for same budget&lt;/li&gt;
&lt;li&gt;Unified cluster — observability + business data in one system, one operations team&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  14. Risk Assessment
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Likelihood&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full-text search gaps&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;95% of queries are structured; bloom-filter skip indexes (e.g. &lt;code&gt;tokenbf_v1&lt;/code&gt;) cover keyword search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data loss during migration&lt;/td&gt;
&lt;td&gt;Very Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;4-week dual-write window keeps Elasticsearch as a complete fallback until cutover&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana learning curve&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Grafana is widely used; team familiarity is high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse cluster instability&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;2-replica HA, Keeper for consensus, daily S3 backups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector overload&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Batch processor + memory limiter configured; scale collector horizontally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema changes in new service&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;MergeTree handles new columns gracefully; OTEL schema is stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold data access latency (S3)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;S3 queries are slower but acceptable for historical lookups&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  15. Success Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  30-Day Targets (Post Cutover)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All dashboards migrated to Grafana&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch subscription cancelled&lt;/td&gt;
&lt;td&gt;Done&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log query latency (aggregation)&lt;/td&gt;
&lt;td&gt;&amp;lt; 500ms for last 24h queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace lookup latency&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 seconds for TraceId lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data compression vs Elasticsearch&lt;/td&gt;
&lt;td&gt;&amp;gt; 5x smaller footprint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alert parity&lt;/td&gt;
&lt;td&gt;All Kibana alerts recreated in Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  90-Day Targets
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Annual cost reduction&lt;/td&gt;
&lt;td&gt;&amp;gt; $300K vs Elasticsearch baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI log queries active&lt;/td&gt;
&lt;td&gt;Teams using LibreChat for log analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-data queries&lt;/td&gt;
&lt;td&gt;At least 5 dashboards joining logs + business data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retention extended&lt;/td&gt;
&lt;td&gt;From 90 days to 180+ days (same cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering time saved&lt;/td&gt;
&lt;td&gt;20+ hrs/month (no more ad-hoc log queries)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  16. Reference Links
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ClickHouse Observability
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse Observability docs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/observability" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/observability&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability solution guide&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/observability/overview" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/observability/overview&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Do you still need Elasticsearch for log analytics?&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building an Observability solution with ClickHouse — Part 1 (Logs)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/storing-log-data-in-clickhouse-fluent-bit-vector-open-telemetry" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/storing-log-data-in-clickhouse-fluent-bit-vector-open-telemetry&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full-text search on object storage&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/clickhouse-full-text-search-object-storage" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/clickhouse-full-text-search-object-storage&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gitTrends — 10B GitHub events with text index (reference impl)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/gitTrends" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/gitTrends&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  OpenTelemetry + ClickHouse
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector ClickHouse exporter&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter" rel="noopener noreferrer"&gt;https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL Collector contrib repo&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/open-telemetry/opentelemetry-collector-contrib" rel="noopener noreferrer"&gt;https://github.com/open-telemetry/opentelemetry-collector-contrib&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL ClickHouse schema reference&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/docs/use-cases/observability/schema-design" rel="noopener noreferrer"&gt;https://clickhouse.com/docs/use-cases/observability/schema-design&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Grafana + ClickHouse
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Grafana ClickHouse datasource&lt;/td&gt;
&lt;td&gt;&lt;a href="https://grafana.com/grafana/plugins/grafana-clickhouse-datasource" rel="noopener noreferrer"&gt;https://grafana.com/grafana/plugins/grafana-clickhouse-datasource&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana ClickHouse plugin docs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://grafana.com/docs/grafana/latest/datasources/clickhouse" rel="noopener noreferrer"&gt;https://grafana.com/docs/grafana/latest/datasources/clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Benchmarks &amp;amp; Case Studies
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare: ClickHouse for HTTP logs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://blog.cloudflare.com/log-analytics-using-clickhouse" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/log-analytics-using-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse vs Elasticsearch log analytics benchmark&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/elasticsearch-log-analytics-clickhouse&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benchmark source code (reproducible)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/TextBench" rel="noopener noreferrer"&gt;https://github.com/ClickHouse/TextBench&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BindPlane — OTEL fleet management for migrations&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/bindplane-faster-otel-migrations-to-clickstack&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Langfuse + ClickHouse (LLM observability)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/langfuse-llm-analytics" rel="noopener noreferrer"&gt;https://clickhouse.com/blog/langfuse-llm-analytics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Appendix A — Technology Stack Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instrumentation&lt;/td&gt;
&lt;td&gt;OTEL SDK (unchanged)&lt;/td&gt;
&lt;td&gt;Auto-instrument apps — zero code changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collection&lt;/td&gt;
&lt;td&gt;OTEL Collector + clickhouseexporter&lt;/td&gt;
&lt;td&gt;Receives and ships logs/traces/metrics to ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;ClickHouse otel database&lt;/td&gt;
&lt;td&gt;Logs, traces, metrics — compressed columnar storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualization&lt;/td&gt;
&lt;td&gt;Grafana + ClickHouse datasource&lt;/td&gt;
&lt;td&gt;Dashboards, trace waterfall, log explorer, alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI queries&lt;/td&gt;
&lt;td&gt;LibreChat + Qwen + MCP (existing)&lt;/td&gt;
&lt;td&gt;Plain English queries over logs and traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold storage&lt;/td&gt;
&lt;td&gt;S3 (tiered via ClickHouse TTL)&lt;/td&gt;
&lt;td&gt;Cheap long-term retention for historical data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HA&lt;/td&gt;
&lt;td&gt;ClickHouse 2-replica cluster&lt;/td&gt;
&lt;td&gt;Same HA setup as the existing ClickHouse cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Appendix B — Quick Reference: Kibana → Grafana Equivalents
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Kibana Feature&lt;/th&gt;
&lt;th&gt;Grafana Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discover (log search)&lt;/td&gt;
&lt;td&gt;Explore → Logs panel with ClickHouse query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashboard&lt;/td&gt;
&lt;td&gt;Dashboard (same concept)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualize&lt;/td&gt;
&lt;td&gt;Panel with ClickHouse SQL query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APM (traces)&lt;/td&gt;
&lt;td&gt;Explore → Traces panel with ClickHouse datasource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerts&lt;/td&gt;
&lt;td&gt;Grafana Alerting (same or better)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index Lifecycle Management&lt;/td&gt;
&lt;td&gt;ClickHouse TTL (simpler — one SQL line)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KQL (Kibana Query Language)&lt;/td&gt;
&lt;td&gt;SQL (standard, more powerful)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lens (drag-drop charts)&lt;/td&gt;
&lt;td&gt;Grafana panel builder&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;If you're considering this migration, the decisions that matter most:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't skip the OTEL Aggregator pattern.&lt;/strong&gt; An agent-only deployment loses data whenever ClickHouse blips. Run a couple of central aggregators with retry-on-failure; that's the production-grade choice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use BindPlane if you have a large collector fleet.&lt;/strong&gt; Worth it for fleet-wide config rollout. For a handful of central collectors, standard YAML is fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get the schema right the first time.&lt;/strong&gt; &lt;code&gt;ORDER BY (ServiceName, SeverityText, Timestamp)&lt;/code&gt; and the right CODECs are the difference between a 3× and 30× compression ratio. The schema in Section 6 has been validated at production scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run dual-write for 4 weeks&lt;/strong&gt;, not 1. The gradual cutover is cheap insurance and lets you validate every dashboard/alert before cutting Elasticsearch off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The AI layer pays for itself.&lt;/strong&gt; Plain-English log queries via LibreChat + an LLM mean no more pinging engineering when the compliance team needs a one-off analysis. Once ClickHouse has the data, the AI integration is one config change.&lt;/li&gt;
&lt;/ol&gt;
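
&lt;p&gt;To make the first point concrete: a retrying aggregator only drops data once its retry budget is exhausted. The Python sketch below computes an exponential backoff schedule. The 5 s initial delay, doubling factor, 30 s cap, and 300 s budget are illustrative assumptions; it omits jitter and is not the collector's actual implementation.&lt;/p&gt;

```python
def backoff_schedule(initial_s=5.0, multiplier=2.0,
                     max_interval_s=30.0, max_elapsed_s=300.0):
    """Return the list of retry delays (seconds) that fit in the retry budget."""
    delays = []
    elapsed = 0.0
    delay = initial_s
    # Keep retrying while the next wait still fits inside the budget.
    while max_elapsed_s - elapsed >= delay:
        delays.append(delay)
        elapsed += delay
        # Grow the delay exponentially, capped at max_interval_s.
        delay = min(delay * multiplier, max_interval_s)
    return delays


schedule = backoff_schedule()
print(len(schedule), "retries spanning", sum(schedule), "seconds")
```

&lt;p&gt;With these assumed values the retries span 275 seconds, so a ClickHouse restart of a minute or two is absorbed, provided the aggregator can buffer the backlog in the meantime.&lt;/p&gt;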

&lt;p&gt;The full schema, OTEL Collector configs, Grafana queries, migration plan, and 16 production-tested recovery runbooks live in the &lt;a href="https://github.com/rakeshtherani/clickhouse-ai-dba" rel="noopener noreferrer"&gt;companion repo on GitHub&lt;/a&gt;. The repo also includes the AI DBA MCP server (152 tools for ClickHouse operations) — if you're operating at scale, that's worth a look.&lt;/p&gt;

&lt;p&gt;If you've migrated off Elasticsearch (or are mid-migration), I'd love to compare notes. Reach out via &lt;a href="https://www.linkedin.com/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or comment below.&lt;/p&gt;

&lt;p&gt;If this is useful to your team, the deeper architectural piece — &lt;em&gt;Building an Agentic AI Data Platform on ClickHouse&lt;/em&gt; — is coming next.&lt;/p&gt;

</description>
      <category>database</category>
      <category>infrastructure</category>
      <category>monitoring</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
