<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Pavel Gajvoronski</title>
    <description>The latest articles on Forem by Pavel Gajvoronski (@pavelbuild).</description>
    <link>https://forem.com/pavelbuild</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3871429%2F9ce51312-611e-4252-8caa-275a0bfeed3b.jpg</url>
      <title>Forem: Pavel Gajvoronski</title>
      <link>https://forem.com/pavelbuild</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pavelbuild"/>
    <language>en</language>
    <item>
      <title>I Built 23 Pages in One Day With AI. Then One API Key Almost Killed Everything</title>
      <dc:creator>Pavel Gajvoronski</dc:creator>
      <pubDate>Wed, 15 Apr 2026 10:12:48 +0000</pubDate>
      <link>https://forem.com/pavelbuild/i-built-23-pages-in-one-day-with-ai-then-one-api-key-almost-killed-everything-563e</link>
      <guid>https://forem.com/pavelbuild/i-built-23-pages-in-one-day-with-ai-then-one-api-key-almost-killed-everything-563e</guid>
<description>&lt;p&gt;This is a build-in-public update on &lt;a href="https://github.com/Pha6ha007/Kepion" rel="noopener noreferrer"&gt;Kepion&lt;/a&gt; — an AI platform that deploys companies from a text description. &lt;a href="https://dev.to/pavelbuild/im-building-a-platform-that-deploys-ai-companies-from-a-single-sentence-32aj"&gt;First post here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrqimq6ken544ult0v5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrqimq6ken544ult0v5b.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"This is a build-in-public update..." &lt;/p&gt;

&lt;p&gt;Two days ago I shared the architecture. Today I want to share what actually happened when I started building — the wins, the disasters, and the numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The disaster: 3 hours lost to a phantom API key
&lt;/h2&gt;

&lt;p&gt;I sat down at 8am ready to build. Opened my terminal. Ran GSD-2 (my build orchestrator). Got this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: All credentials for "anthropic" are in a cooldown window.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My Max plan showed 3% usage. The tool said I was rate-limited. For three hours I debugged, restarted, cleared caches, filed a support ticket. The fix?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;unset &lt;/span&gt;ANTHROPIC_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An old API key from a previous tool installation was silently overriding my subscription. One environment variable. Three hours gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson: invisible defaults are the most dangerous bugs in AI tooling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm sharing this because every developer building with AI agents will hit this. Your LLM provider's auth layer has more failure modes than your application code.&lt;/p&gt;
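&lt;p&gt;If you want to catch this class of bug up front, a startup sanity check is cheap. Here is a minimal TypeScript sketch — only &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; comes from my story; extend the list with whatever variables your own tools read:&lt;/p&gt;

```typescript
// Warn about environment variables that can silently override subscription auth.
// ANTHROPIC_API_KEY is the one that bit me; add others for your own stack.
const suspectVars = ["ANTHROPIC_API_KEY"];

function strayCredentials(env: { [name: string]: string | undefined }): string[] {
  return suspectVars.filter(name => env[name] !== undefined);
}

// Usage: call strayCredentials(process.env) at tool startup and fail loudly
// if the result is non-empty, instead of debugging phantom rate limits.
```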




&lt;h2&gt;
  
  
  What GSD-2 actually built in one day
&lt;/h2&gt;

&lt;p&gt;Once the auth was fixed, I pointed GSD-2 at Kepion and let it work. Here's the raw output from a single day:&lt;/p&gt;

&lt;h3&gt;
  
  
  Security hardening (10 items)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deny-by-default auth middleware&lt;/strong&gt; — every new route is blocked unless explicitly whitelisted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path traversal fix&lt;/strong&gt; in vault manager&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket authentication&lt;/strong&gt; (was anonymous before)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CORS whitelist&lt;/strong&gt; replacing wildcard &lt;code&gt;*&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Password policy&lt;/strong&gt;: 12+ chars, uppercase, digit, special char&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt; by user email instead of IP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upload validation&lt;/strong&gt;: file extension whitelist, 5MB limit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business ownership verification&lt;/strong&gt; on all endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session scoping&lt;/strong&gt; by user_id&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Login attempt tracking&lt;/strong&gt; with 30-minute lockout after 10 failures&lt;/li&gt;
&lt;/ul&gt;
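&lt;p&gt;The deny-by-default idea is simple enough to sketch. This illustrates the pattern, not Kepion's actual middleware (route names and request shape are made up):&lt;/p&gt;

```typescript
// Deny-by-default: a route is reachable without auth only if explicitly allowlisted.
const publicRoutes = new Set(["/health", "/login", "/signup"]);

interface AuthCheckRequest {
  path: string;
  userId?: string; // set by the session layer when the caller is authenticated
}

function isAllowed(req: AuthCheckRequest): boolean {
  if (publicRoutes.has(req.path)) return true; // explicitly public
  return req.userId !== undefined;             // everything else needs a user
}
```

&lt;p&gt;The payoff is that forgetting to secure a new route fails closed instead of open.&lt;/p&gt;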

&lt;h3&gt;
  
  
  Observability (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Every HTTP request gets a &lt;code&gt;trace_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Every agent call becomes a span linked to the trace&lt;/li&gt;
&lt;li&gt;Slow trace detection (&amp;gt;5s)&lt;/li&gt;
&lt;li&gt;Error trace listing&lt;/li&gt;
&lt;li&gt;All persisted in SQLite&lt;/li&gt;
&lt;/ul&gt;
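&lt;p&gt;The shape of that tracing layer can be sketched in a few lines. Field names here are illustrative, not the real schema:&lt;/p&gt;

```typescript
// Tiny id generator so the sketch stays dependency-free (a real system would use UUIDs).
let nextId = 0;
function newId(): string {
  nextId += 1;
  return `id-${nextId}`;
}

// Every HTTP request opens a trace; every agent call becomes a span under it.
interface Span {
  traceId: string;
  spanId: string;
  name: string;
  startMs: number;
  durationMs?: number;
}

function startSpan(traceId: string, name: string): Span {
  return { traceId, spanId: newId(), name, startMs: Date.now() };
}

function endSpan(span: Span): Span {
  return { ...span, durationMs: Date.now() - span.startMs };
}

// Slow-trace detection: anything over 5 seconds end to end.
function isSlow(span: Span): boolean {
  return (span.durationMs ?? 0) > 5000;
}
```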

&lt;h3&gt;
  
  
  Cost intelligence (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Per-agent, per-model, per-business cost breakdown&lt;/li&gt;
&lt;li&gt;Anomaly detection: flags agents whose cost runs more than 2 standard deviations above the mean (z-score &amp;gt; 2)&lt;/li&gt;
&lt;li&gt;Cost circuit breaker: blocks requests at configurable limits&lt;/li&gt;
&lt;/ul&gt;
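&lt;p&gt;The anomaly flagging reduces to basic statistics. A sketch of the idea (my reconstruction, not the shipped code):&lt;/p&gt;

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / xs.length);
}

// Flag agents whose cost sits more than `threshold` standard deviations above the mean.
function anomalousAgents(costs: { [agent: string]: number }, threshold = 2): string[] {
  const values = Object.values(costs);
  const m = mean(values);
  const s = stdDev(values);
  if (s === 0) return []; // flat costs, nothing to flag
  return Object.keys(costs).filter(agent => (costs[agent] - m) / s > threshold);
}
```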

&lt;h3&gt;
  
  
  Team Memory (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents save learnings across sessions&lt;/li&gt;
&lt;li&gt;Effectiveness scoring (0.0–1.0)&lt;/li&gt;
&lt;li&gt;Auto context injection — relevant memories prepended to prompts&lt;/li&gt;
&lt;li&gt;Categories: solution, pattern, mistake, optimization&lt;/li&gt;
&lt;/ul&gt;
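&lt;p&gt;Context injection can be pictured as a simple selection step. The mechanics below are my assumption — the post only specifies the 0.0–1.0 score and the four categories:&lt;/p&gt;

```typescript
type MemoryCategory = "solution" | "pattern" | "mistake" | "optimization";

interface TeamMemory {
  text: string;
  category: MemoryCategory;
  score: number; // effectiveness, 0.0 to 1.0
}

// Prepend only the highest-scoring memories; low scorers effectively decay out.
// The limit and floor values here are illustrative defaults.
function selectForPrompt(memories: TeamMemory[], limit = 3, floor = 0.5): string[] {
  return memories
    .filter(m => m.score >= floor)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(m => `[${m.category}] ${m.text}`);
}
```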

&lt;h3&gt;
  
  
  Checkpoint &amp;amp; Replay (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Checkpoint after every chain step&lt;/li&gt;
&lt;li&gt;Resume on failure with &lt;code&gt;can_resume: true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Dead letter queue for chains that fail after all retries&lt;/li&gt;
&lt;li&gt;Configurable retry policies: &lt;code&gt;default&lt;/code&gt;, &lt;code&gt;critical&lt;/code&gt;, &lt;code&gt;fast_fail&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
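&lt;p&gt;The three presets map naturally onto a small policy table. Only the preset names come from the build; the attempt counts and backoffs are guesses:&lt;/p&gt;

```typescript
interface RetryPolicy {
  maxAttempts: number;
  backoffMs: number;
}

// Preset names from the build; the numbers are illustrative.
const policies: { [name: string]: RetryPolicy } = {
  default:   { maxAttempts: 3, backoffMs: 1000 },
  critical:  { maxAttempts: 5, backoffMs: 2000 },
  fast_fail: { maxAttempts: 1, backoffMs: 0 },
};

// Run a step, retrying per policy; errors surviving the last attempt propagate
// so the caller can move the chain to the dead letter queue.
function runWithRetry(step: () => string, policy: RetryPolicy): string {
  let lastError: unknown;
  for (let remaining = policy.maxAttempts; remaining > 0; remaining--) {
    try {
      return step();
    } catch (err) {
      lastError = err; // a real system would sleep policy.backoffMs here
    }
  }
  throw lastError;
}
```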

&lt;h3&gt;
  
  
  Event-driven triggers (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;5 trigger types: schedule, webhook, event_pattern, vault_change, threshold&lt;/li&gt;
&lt;li&gt;4 action types: run_agent, run_chain, webhook_out, notify&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Web UI: 23 pages (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Full Next.js 16 dashboard with collapsible sidebar&lt;/li&gt;
&lt;li&gt;Dashboard, Chat, Agents, Pipelines, Businesses, Integrations&lt;/li&gt;
&lt;li&gt;Vault, Research, Patterns, YouTube, Workflows, Gate&lt;/li&gt;
&lt;li&gt;Costs, Traces, Triggers, Admin, Pricing, Account&lt;/li&gt;
&lt;li&gt;Live support chat widget with typing indicators&lt;/li&gt;
&lt;li&gt;Pricing page with 5 tiers and competitive comparison table&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Telegram bot: fully functional (shipped)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/start&lt;/code&gt; with auto-registration and JWT token storage&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/agents&lt;/code&gt;, &lt;code&gt;/agent&lt;/code&gt;, &lt;code&gt;/business&lt;/code&gt;, &lt;code&gt;/status&lt;/code&gt;, &lt;code&gt;/costs&lt;/code&gt;, &lt;code&gt;/help&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Free text → auto-routing to the right agent&lt;/li&gt;
&lt;li&gt;Typing indicators while agents think&lt;/li&gt;
&lt;li&gt;Auth headers on every API call&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Services&lt;/td&gt;
&lt;td&gt;30+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API endpoints&lt;/td&gt;
&lt;td&gt;40+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent prompts (v3)&lt;/td&gt;
&lt;td&gt;31 × 17 sections each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tests&lt;/td&gt;
&lt;td&gt;180+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web UI pages&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telegram commands&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines changed in one day&lt;/td&gt;
&lt;td&gt;~3,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;One person. One AI build tool. One day.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Security is invisible until it isn't.&lt;/strong&gt; Nobody sees path traversal protection. But without it, the first user with &lt;code&gt;../../etc/passwd&lt;/code&gt; in a vault search owns your server. I'm glad GSD-2 caught every item from the CONCERNS.md audit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Observability changes everything.&lt;/strong&gt; Before traces, debugging a 5-agent chain was guesswork. Now I can see: request → router (2ms) → researcher (4.3s) → sentinel (1.1s) → warden (0.8s) → response. The bottleneck is always the researcher.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cost circuit breakers are non-negotiable.&lt;/strong&gt; Without them, one hallucinating agent in a loop burns through your OpenRouter budget in minutes. Our circuit breaker has 4 levels: per-request ($2), per-agent-hourly ($10), per-business-daily ($50), platform-hourly ($100).&lt;/p&gt;
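&lt;p&gt;The four levels compose into a single pre-flight check. The dollar limits are the real ones from Kepion; the surrounding code is a sketch, not the shipped implementation:&lt;/p&gt;

```typescript
interface CostCounters {
  agentHourly: number;
  businessDaily: number;
  platformHourly: number;
}

// The four real limits from the post, in dollars.
const limits = {
  perRequest: 2,
  agentHourly: 10,
  businessDaily: 50,
  platformHourly: 100,
};

// Returns the first breached level, or null when the request may proceed.
function breachedLevel(current: CostCounters, requestCost: number): string | null {
  if (requestCost > limits.perRequest) return "perRequest";
  if (current.agentHourly + requestCost > limits.agentHourly) return "agentHourly";
  if (current.businessDaily + requestCost > limits.businessDaily) return "businessDaily";
  if (current.platformHourly + requestCost > limits.platformHourly) return "platformHourly";
  return null;
}
```

&lt;p&gt;Checking the cheapest, most specific level first means a runaway agent is stopped before it dents the shared budgets.&lt;/p&gt;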

&lt;p&gt;&lt;strong&gt;4. Team Memory is the moat.&lt;/strong&gt; Every business Kepion creates makes the next one better. Agents save what worked and what failed. Business #5 benefits from patterns discovered in businesses #1-4. This compounds. Competitors can copy the code — they can't copy the accumulated knowledge.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Autonomous Operations&lt;/strong&gt; — agents posting to Twitter, sending emails, running outreach. Every output goes through Sentinel (fact-check) and Warden (quality gate) before publishing. Quality over spam.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Full Deploy Pipeline&lt;/strong&gt; — &lt;code&gt;/deploy chess-school&lt;/code&gt; → buy domain → deploy frontend (Vercel) → deploy backend (Railway) → configure Paddle payments → live URL. One command.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code Ownership&lt;/strong&gt; — all generated code pushes to the user's GitHub. You own everything. Kepion is the builder, not the landlord.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Questions for you
&lt;/h2&gt;

&lt;p&gt;I'm genuinely curious:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How do you handle AI agent costs in production?&lt;/strong&gt; We built a 4-tier model routing system (Free → Budget → Performance → Premium) with auto-escalation on failure. Is anyone doing this differently?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Team Memory vs RAG — what's your experience?&lt;/strong&gt; We went with vault-based memory with effectiveness scoring instead of pure vector search. The scoring means bad memories decay. Has anyone combined both approaches?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What's your threshold for "good enough" security in an MVP?&lt;/strong&gt; We went aggressive (deny-by-default, path traversal, rate limiting) before launch. Some say ship fast, secure later. Curious where others draw the line.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;Follow the build: &lt;a href="https://github.com/Pha6ha007/Kepion" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://kepion.app" rel="noopener noreferrer"&gt;kepion.app&lt;/a&gt;&lt;/p&gt;

</description>
      <category>buildinpublic</category>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>TraceHawk vs Datadog for AI Agent Monitoring in 2026</title>
      <dc:creator>Pavel Gajvoronski</dc:creator>
      <pubDate>Tue, 14 Apr 2026 08:02:08 +0000</pubDate>
      <link>https://forem.com/pavelbuild/tracehawk-vs-datadog-for-ai-agent-monitoring-in-2026-1noj</link>
      <guid>https://forem.com/pavelbuild/tracehawk-vs-datadog-for-ai-agent-monitoring-in-2026-1noj</guid>
      <description>&lt;p&gt;"I built TraceHawk after spending hours debugging why my AI agent was making 47 filesystem calls before a single GitHub call. Datadog showed me the waterfall. It didn't show me the why."&lt;/p&gt;

&lt;h1&gt;
  
  
  TraceHawk vs Datadog for AI Agent Monitoring in 2026
&lt;/h1&gt;

&lt;p&gt;I built TraceHawk after spending hours debugging why my AI agent was making 47 filesystem calls before a single GitHub call. Datadog showed me the waterfall. It didn't show me the why.&lt;/p&gt;

&lt;p&gt;This comparison covers what Datadog actually gives you for AI agent observability, where it falls short for MCP-heavy workloads, and why teams are switching to purpose-built tools like TraceHawk. I'm going to be honest about both sides — Datadog is genuinely good at some things, and acknowledging that matters more than cheerleading.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Datadog gives you for AI agents
&lt;/h2&gt;

&lt;p&gt;Datadog's LLM Observability module launched in 2024 and has matured significantly. The Python agent (v10.13.0, June 2025) added MCP client tracing — waterfall diagrams for MCP requests, automatic instrumentation for tool invocations, session correlation. If you're already a Datadog customer, this is zero additional setup.&lt;/p&gt;

&lt;p&gt;The strongest argument for Datadog is the unified view. If an LLM latency spike is caused by a downstream database slowdown, Datadog shows you both in the same trace. Your AI layer, your infrastructure, your queues — one pane of glass. That's genuinely valuable and not something purpose-built LLM tools can replicate.&lt;/p&gt;

&lt;p&gt;Datadog also has enterprise compliance sorted: SOC2 Type II, HIPAA, PCI DSS. If you're in a regulated industry, that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Datadog genuinely wins:&lt;/strong&gt; AI as one component of a complex system you already monitor. The correlation between LLM latency and infrastructure health is something no standalone LLM tool can match.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Datadog falls short
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The cost gap is real
&lt;/h3&gt;

&lt;p&gt;Datadog's LLM Observability is priced per event, stacked on top of existing APM costs. For teams running agents at scale — thousands of traces per day — the math gets uncomfortable fast. Enterprise contracts start at $50k/year. That's before the AI-specific add-ons.&lt;/p&gt;

&lt;p&gt;TraceHawk is $99/month flat for unlimited spans, with a 50K span/month free tier. For a startup running agents as core product, this difference is existential.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP as an afterthought
&lt;/h3&gt;

&lt;p&gt;Datadog added MCP support in June 2025 — about seven months after MCP launched in November 2024. It traces MCP client sessions and tool invocations, but it's built on top of their generic APM span model. What you get: session ID, tool name, latency, error code. What you don't get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✗ MCP server health dashboard with uptime and degradation detection&lt;/li&gt;
&lt;li&gt;✗ Per-server p50/p95 latency trends (not just per-call)&lt;/li&gt;
&lt;li&gt;✗ Error rate by server (which of your 12 MCP servers is flaky?)&lt;/li&gt;
&lt;li&gt;✗ Tool call heatmap — when during the day does each server get hammered?&lt;/li&gt;
&lt;li&gt;✗ Degraded server alerts — notify when error rate crosses a threshold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TraceHawk was built around MCP from day one. Every MCP tool call gets structured telemetry automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"span_kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MCP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp.server_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp.tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"read_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp.tool_input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/workspace/src/auth.ts"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp.output_size_bytes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3e4f5a6b..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parent_span_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1a2b3c4d"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
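&lt;p&gt;From spans shaped like that payload, per-server error rates and p95 latency fall out of a short aggregation. This is a sketch of the idea, not TraceHawk internals:&lt;/p&gt;

```typescript
interface McpSpan {
  serverName: string;
  durationMs: number;
  status: "ok" | "error";
}

// Nearest-rank p95 over a batch of durations.
function p95(durations: number[]): number {
  const sorted = [...durations].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(0.95 * sorted.length));
  return sorted[idx];
}

// Per-server error rate and p95 latency from a batch of MCP spans.
function perServerStats(spans: McpSpan[]): { [server: string]: { errorRate: number; p95Ms: number } } {
  const grouped: { [server: string]: McpSpan[] } = {};
  for (const span of spans) {
    if (!grouped[span.serverName]) grouped[span.serverName] = [];
    grouped[span.serverName].push(span);
  }
  const stats: { [server: string]: { errorRate: number; p95Ms: number } } = {};
  for (const server of Object.keys(grouped)) {
    const list = grouped[server];
    const errors = list.filter(s => s.status === "error").length;
    stats[server] = { errorRate: errors / list.length, p95Ms: p95(list.map(s => s.durationMs)) };
  }
  return stats;
}
```

&lt;p&gt;The point is not that this is hard to write — it's that a generic APM span model doesn't surface it for you.&lt;/p&gt;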



&lt;h3&gt;
  
  
  Agent decisions are invisible
&lt;/h3&gt;

&lt;p&gt;Datadog shows you a trace waterfall — spans in chronological order. You can see what happened, but not why. When your agent calls the filesystem server 47 times before calling GitHub, a flat waterfall doesn't explain the decision path.&lt;/p&gt;

&lt;p&gt;TraceHawk parses parent-child span relationships into a visual decision tree: root is the task, branches are LLM decisions, leaves are tool calls. You can see exactly why the agent chose one tool over another, and what context it had at each decision point.&lt;/p&gt;

&lt;h3&gt;
  
  
  No agent session replay
&lt;/h3&gt;

&lt;p&gt;Datadog has no concept of agent session replay. TraceHawk shows a step-by-step session timeline — agent start, each LLM call with full prompt and response, each tool invocation, each MCP server response. Click any event to expand full detail. This is what you need when debugging why an agent got stuck in a loop or made an unexpected decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost attribution vs token tracking
&lt;/h3&gt;

&lt;p&gt;Datadog tracks token usage. TraceHawk tracks token &lt;em&gt;costs&lt;/em&gt; — with per-model pricing tables updated as models change, per-agent cost budgets, and alerts when a specific agent is trending toward budget overage before the month ends. That's a different product than a token counter.&lt;/p&gt;
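&lt;p&gt;The difference is easy to see in code: token counting stops at totals, while cost attribution applies a per-model pricing table and projects against budgets. The prices below are placeholders, not any provider's real rates:&lt;/p&gt;

```typescript
// Placeholder per-1K-token prices; a real table tracks each provider's published rates.
const pricePer1k: { [model: string]: { input: number; output: number } } = {
  "fast-model":  { input: 0.5, output: 1.5 },
  "smart-model": { input: 3.0, output: 15.0 },
};

function spanCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = pricePer1k[model];
  if (!price) throw new Error(`no pricing entry for ${model}`);
  return (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
}

// Budget alert: flag when month-to-date spend is trending past the monthly budget.
function trendingOverBudget(spentSoFar: number, dayOfMonth: number, budget: number): boolean {
  const projected = (spentSoFar / dayOfMonth) * 30;
  return projected > budget;
}
```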




&lt;h2&gt;
  
  
  Full feature comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;TraceHawk&lt;/th&gt;
&lt;th&gt;Datadog&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$99 / month&lt;/td&gt;
&lt;td&gt;$50k+ / year (enterprise)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;50K spans/month&lt;/td&gt;
&lt;td&gt;Limited trial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP-native tracing&lt;/td&gt;
&lt;td&gt;✅ Day one&lt;/td&gt;
&lt;td&gt;⚠️ Added June 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP server health dashboard&lt;/td&gt;
&lt;td&gt;✅ Built-in&lt;/td&gt;
&lt;td&gt;❌ Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-server error rates&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool call heatmap&lt;/td&gt;
&lt;td&gt;✅ Time × server&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50 / p95 per MCP server&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Degraded server alerts&lt;/td&gt;
&lt;td&gt;✅ Slack / PagerDuty&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent decision tree&lt;/td&gt;
&lt;td&gt;✅ Visual&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent session replay&lt;/td&gt;
&lt;td&gt;✅ Step-by-step&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt / response viewer&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token cost attribution&lt;/td&gt;
&lt;td&gt;✅ Per span / budget&lt;/td&gt;
&lt;td&gt;⚠️ Token count only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget alerts&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra correlation (APM)&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ Core strength&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APM + AI unified view&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOC2 / HIPAA&lt;/td&gt;
&lt;td&gt;⚠️ Planned&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted&lt;/td&gt;
&lt;td&gt;✅ Open source&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;2 minutes&lt;/td&gt;
&lt;td&gt;1–2 weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDK install&lt;/td&gt;
&lt;td&gt;pip install tracehawk&lt;/td&gt;
&lt;td&gt;Datadog agent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  When to choose Datadog
&lt;/h2&gt;

&lt;p&gt;Be honest with yourself here. Datadog is the right choice if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You already pay for Datadog and AI is a small part of your monitored system&lt;/li&gt;
&lt;li&gt;You need to correlate LLM latency with infrastructure failures — the unified view is genuinely valuable&lt;/li&gt;
&lt;li&gt;Enterprise compliance requirements today (HIPAA, PCI DSS) — TraceHawk doesn't have these yet&lt;/li&gt;
&lt;li&gt;Your AI layer is one piece of a complex distributed system you monitor with Datadog&lt;/li&gt;
&lt;li&gt;Your team has Datadog expertise and doesn't want to learn another tool&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to choose TraceHawk
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your product IS the AI agent — observability needs to be deep, not broad&lt;/li&gt;
&lt;li&gt;You use MCP servers and need real visibility into per-server performance&lt;/li&gt;
&lt;li&gt;You want to understand agent decisions, not just log them&lt;/li&gt;
&lt;li&gt;Cost attribution at the span level with budget management matters&lt;/li&gt;
&lt;li&gt;You're a startup or small team ($99/mo vs $50k/yr is a real constraint)&lt;/li&gt;
&lt;li&gt;You need to be set up in 2 minutes, not 2 weeks&lt;/li&gt;
&lt;li&gt;You want the open-source option — TraceHawk is self-hostable&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Datadog is a great choice if you already use it and AI is a small part of your stack. The unified infrastructure + AI view is a real advantage that purpose-built tools can't replicate. But the cost structure is built for enterprises monitoring everything, not teams whose entire product is an AI agent.&lt;/p&gt;

&lt;p&gt;If AI agents are your core product — especially if you use MCP servers — you need a tool built around them, not retrofitted for them. TraceHawk gives you MCP-native tracing, agent decision trees, session replay, and cost budgets in one place, at a fraction of the cost.&lt;/p&gt;

&lt;p&gt;The 50K span free tier covers most development and early-stage production workloads. You can instrument your first agent in 2 minutes and see the difference yourself.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://tracehawk.dev/signup" rel="noopener noreferrer"&gt;Try TraceHawk free&lt;/a&gt; — no credit card required.&lt;/p&gt;





</description>
      <category>agents</category>
      <category>ai</category>
      <category>observability</category>
      <category>mcp</category>
    </item>
    <item>
      <title>I Built a Voice AI GMAT Tutor with Long-Term Memory in 6 Weeks — Here's the Full Stack</title>
      <dc:creator>Pavel Gajvoronski</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:34:43 +0000</pubDate>
      <link>https://forem.com/pavelbuild/i-built-a-voice-ai-gmat-tutor-with-long-term-memory-in-6-weeks-heres-the-full-stack-3hod</link>
      <guid>https://forem.com/pavelbuild/i-built-a-voice-ai-gmat-tutor-with-long-term-memory-in-6-weeks-heres-the-full-stack-3hod</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ffsa7heufftc0fhk99k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ffsa7heufftc0fhk99k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;samiwise.app — live now&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;GMAT prep tutors charge $150–200 per hour. For a 3-month prep period, that's $5,000–10,000. Most people preparing for an MBA simply can't afford that — or can't find a good tutor available at 11pm when they finally have time to study.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;SamiWISE&lt;/strong&gt; — a voice AI GMAT tutor that remembers every session, adapts to your weak spots, and explains material in real time using RAG over official GMAT materials. This is the story of how it was built, what I learned, and the technical decisions that made it work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90hiknuj6k45mvrtepjn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F90hiknuj6k45mvrtepjn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The three things no competitor combines: voice + memory + real GMAT content&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Problem I Was Solving
&lt;/h2&gt;

&lt;p&gt;Every GMAT prep tool I looked at had the same fundamental issue: they start from scratch every single session. You explain your weak spots again. You get generic explanations that don't account for what confused you last Tuesday. There's no continuity.&lt;/p&gt;

&lt;p&gt;A good human tutor doesn't do this. They remember that you always mess up Data Sufficiency with inequalities. They know that analogies work better for you than abstract explanations. They track your trajectory over weeks.&lt;/p&gt;

&lt;p&gt;I wanted to build that — but accessible to everyone, available 24/7, at $49/month.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The system has four main layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User (voice)
  → Deepgram STT (~1s)
  → Orchestrator Agent — Groq llama-3.3-70b (~200ms routing)
  → Specialist Agent — Claude Sonnet + RAG from Pinecone (~3-5s)
  → ElevenLabs TTS (~1s)
  → User hears response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total latency: 5–8 seconds. Not perfect, but feels natural — like a real tutor pausing to think.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agent System
&lt;/h2&gt;

&lt;p&gt;The most interesting architectural decision was the multi-agent routing system.&lt;/p&gt;

&lt;p&gt;Instead of one monolithic AI tutor, there are &lt;strong&gt;five specialist agents&lt;/strong&gt; and an invisible orchestrator:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Specialization&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;quantitative&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Problem Solving + Data Sufficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;verbal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Critical Reasoning + Reading Comprehension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;data_insights&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Table Analysis, MSR, Graphics Interpretation, TPA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;strategy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Timing, exam psychology, study planning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;orchestrator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Routes messages — user never sees this&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The orchestrator runs on &lt;strong&gt;Groq&lt;/strong&gt; (llama-3.3-70b) because it needs to be fast — 200ms routing decisions. Specialist agents run on &lt;strong&gt;Claude Sonnet&lt;/strong&gt; because they need to be smart.&lt;/p&gt;

&lt;p&gt;Routing prompt returns structured JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;route&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;quantitative&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;confidence&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;detected_topic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data sufficiency with inequalities&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;difficulty&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;notes&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user has struggled with DS inequalities in past 3 sessions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
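&lt;p&gt;Structured output from an LLM still needs a guard before you dispatch on it. A minimal sketch, with illustrative route names and an assumed 0.6 confidence floor (neither is taken from the production system):&lt;/p&gt;

```typescript
// Sketch: validate the orchestrator's JSON before routing to a specialist.
// Route names and the confidence floor are illustrative assumptions.
type Route = "quantitative" | "verbal" | "data_insights" | "strategy";

interface RoutingDecision {
  route: Route;
  confidence: number;
  detected_topic: string;
  difficulty: string;
  notes: string;
}

const ROUTES: Route[] = ["quantitative", "verbal", "data_insights", "strategy"];

// Fall back to the strategy agent when the router is unsure or returns junk.
function parseRouting(raw: string, minConfidence = 0.6): RoutingDecision {
  const fallback: RoutingDecision = {
    route: "strategy",
    confidence: 0,
    detected_topic: "unknown",
    difficulty: "medium",
    notes: "router output rejected, using fallback",
  };
  try {
    const parsed = JSON.parse(raw);
    if (!ROUTES.includes(parsed.route)) return fallback;
    if (typeof parsed.confidence !== "number") return fallback;
    if (minConfidence > parsed.confidence) return fallback;
    return parsed as RoutingDecision;
  } catch {
    return fallback;
  }
}
```

&lt;p&gt;A malformed response or a low-confidence guess degrades to the generalist path instead of crashing the session.&lt;/p&gt;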



&lt;p&gt;The user always hears the same voice — Sam. Transitions between agents are completely invisible.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xwqlcx6gns8ab61s8k8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xwqlcx6gns8ab61s8k8.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Every other GMAT tool treats you like a stranger on every visit. Sam carries your entire learning history.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory System — The Hard Part
&lt;/h2&gt;

&lt;p&gt;This is where most AI tutors fail. Building long-term memory that actually improves tutoring quality took the most iteration.&lt;/p&gt;

&lt;p&gt;After every session, a &lt;strong&gt;Memory Agent&lt;/strong&gt; runs in the background. It reads the full session transcript and extracts a structured learner profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;GmatLearnerProfile&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;weak_topics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;strong_topics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;effective_techniques&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;      &lt;span class="c1"&gt;// what explanation styles worked&lt;/span&gt;
  &lt;span class="nx"&gt;ineffective_approaches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;    &lt;span class="c1"&gt;// what didn't land&lt;/span&gt;
  &lt;span class="nx"&gt;insight_moments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;           &lt;span class="c1"&gt;// "aha" phrases that clicked&lt;/span&gt;
  &lt;span class="nx"&gt;common_error_patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;     &lt;span class="c1"&gt;// e.g. "misreads DS question stem"&lt;/span&gt;
  &lt;span class="nx"&gt;learning_style&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;next_session_plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;score_trajectory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;time_pressure_notes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This profile gets stored in Supabase as a JSON field on the User model. At the start of every session, the full profile is injected into the specialist agent's system prompt.&lt;/p&gt;

&lt;p&gt;The result: Sam says things like &lt;em&gt;"Last week you struggled with probability in DS — let's approach this one differently than before"&lt;/em&gt; without you having to explain anything.&lt;/p&gt;
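&lt;p&gt;A minimal sketch of that injection step, using a subset of the profile fields (the prompt wording here is illustrative):&lt;/p&gt;

```typescript
// Sketch: inject the stored learner profile into a specialist agent's
// system prompt at session start. Field subset and wording are illustrative.
interface GmatLearnerProfile {
  weak_topics: string[];
  strong_topics: string[];
  learning_style: string;
  next_session_plan: string;
}

function buildSystemPrompt(basePrompt: string, profile: GmatLearnerProfile): string {
  const memory = [
    "Known weak topics: " + profile.weak_topics.join(", "),
    "Known strong topics: " + profile.strong_topics.join(", "),
    "Preferred learning style: " + profile.learning_style,
    "Plan for this session: " + profile.next_session_plan,
  ].join("\n");
  return basePrompt + "\n\n## Learner memory\n" + memory;
}
```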




&lt;h2&gt;
  
  
  RAG — What I Indexed and Why
&lt;/h2&gt;

&lt;p&gt;The knowledge base lives in &lt;strong&gt;Pinecone&lt;/strong&gt; with six namespaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gmat-quant       → Quantitative problems and methods
gmat-verbal      → Verbal problems and methods
gmat-di          → Data Insights problems
gmat-strategy    → Strategies, timing, test psychology
gmat-focus       → GMAT Focus Edition specific content
gmat-errors      → Common error patterns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Free sources I used:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;deepmind/aqua_rat&lt;/code&gt; — 97,467 GMAT/GRE algebra problems with rationales (Apache 2.0)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;allenai/math_qa&lt;/code&gt; — Math word problems with annotated formulas (Apache 2.0)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mister-teddy/gmat-database&lt;/code&gt; — DS, PS, CR, SC questions in JSON (MIT)&lt;/li&gt;
&lt;li&gt;ReClor paper — 17 CR question types with examples (research)&lt;/li&gt;
&lt;li&gt;Manhattan Review free PDFs — Strategy guides openly distributed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The RAG pipeline uses &lt;code&gt;@xenova/transformers&lt;/code&gt; for embeddings (runs locally, no API cost) and retrieves top-5 chunks with reranking before passing to the specialist agent.&lt;/p&gt;
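&lt;p&gt;The retrieval step can be sketched in isolation. Assuming embeddings are already L2-normalized (as with the &lt;code&gt;normalize: true&lt;/code&gt; option in &lt;code&gt;@xenova/transformers&lt;/code&gt;), cosine similarity reduces to a dot product:&lt;/p&gt;

```typescript
// Sketch: top-k retrieval over normalized embeddings. In production the
// approximate search happens inside Pinecone; this is the scoring it
// approximates, and the same scoring reranks the returned candidates.
interface Chunk {
  id: string;
  embedding: number[];
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; a.length > i; i++) sum += a[i] * b[i];
  return sum;
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k = 5): Chunk[] {
  return chunks
    .map((c) => ({ chunk: c, score: dot(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.chunk);
}
```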




&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend:     Next.js 14 + TypeScript + Tailwind CSS
Auth:         Supabase Auth
Database:     Supabase PostgreSQL + Prisma 6
Vector DB:    Pinecone (6 namespaces)
LLM Router:   Groq (llama-3.3-70b) — fast, cheap
LLM Agents:   Anthropic Claude Sonnet — smart, consistent
STT:          Deepgram (Whisper)
TTS:          ElevenLabs
Memory:       Custom Memory Agent → Supabase JSON
Payments:     Paddle (Merchant of Record, handles US tax)
Deploy:       Vercel (frontend) + Railway (agents backend)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why split Vercel + Railway?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vercel has an 800-second serverless function limit. A 30-minute voice tutoring session would time out. Railway runs persistent containers — no limits, no cold starts for agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practice Mode with FSRS
&lt;/h2&gt;

&lt;p&gt;Beyond voice sessions, I built a visual practice mode where users can work through GMAT questions in exam format.&lt;/p&gt;

&lt;p&gt;The interesting part: I implemented &lt;strong&gt;FSRS&lt;/strong&gt; (Free Spaced Repetition Scheduler) — the same algorithm used by Anki. After each answer, the system records:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Was it correct?&lt;/li&gt;
&lt;li&gt;How long did it take?&lt;/li&gt;
&lt;li&gt;What was the difficulty?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it schedules the next review using an exponential forgetting curve. Questions you answered wrong come back sooner. Questions you mastered disappear for weeks.&lt;/p&gt;

&lt;p&gt;This means the practice queue automatically prioritizes your weak spots without you having to manage anything.&lt;/p&gt;
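&lt;p&gt;A simplified sketch of that scheduling idea: an exponential forgetting curve with a 90% retention target. This captures the spirit of FSRS, not the real algorithm, which fits many more parameters per learner:&lt;/p&gt;

```typescript
// Simplified spaced-repetition sketch, NOT full FSRS. Model recall as
// R(t) = exp(-t / stability) and schedule the next review for when
// predicted recall drops to the retention target.
interface CardState {
  stability: number; // days until recall decays by a factor of e
}

// Correct answers grow stability; wrong answers shrink it.
// The 2.5 and 0.5 factors are illustrative, not fitted parameters.
function review(state: CardState, correct: boolean): CardState {
  const factor = correct ? 2.5 : 0.5;
  return { stability: Math.max(0.5, state.stability * factor) };
}

// Days until R(t) falls to the retention target: t = S * ln(1 / retention).
function nextIntervalDays(state: CardState, retention = 0.9): number {
  return state.stability * Math.log(1 / retention);
}
```

&lt;p&gt;Wrong answers shrink the interval, mastered cards stretch it out to weeks, which is exactly the queue behavior described above.&lt;/p&gt;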




&lt;h2&gt;
  
  
  The Study Journal
&lt;/h2&gt;

&lt;p&gt;Every session automatically updates a daily journal entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;StudyJournalEntry&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;
  &lt;span class="nx"&gt;totalMinutes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;questionsTotal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;topicsCovered&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;errorTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;samInsight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;          &lt;span class="c1"&gt;// AI-generated daily summary&lt;/span&gt;
  &lt;span class="nx"&gt;milestones&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;        &lt;span class="c1"&gt;// "100 questions solved", "5 hour week"&lt;/span&gt;
  &lt;span class="nx"&gt;streakDay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The streak counter turned out to be unexpectedly powerful for retention — users don't want to break their streak. Same psychology as Duolingo, but for GMAT prep.&lt;/p&gt;
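&lt;p&gt;The streak rule itself is a few lines. A sketch, assuming a simple same-day / next-day / gap rule (the production logic may differ, e.g. around time zones):&lt;/p&gt;

```typescript
// Sketch: Duolingo-style streak update. Same day keeps the streak,
// the next day extends it, any gap resets to 1. Uses UTC day boundaries.
function updateStreak(streak: number, lastStudied: Date, today: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const dayIndex = (d: Date) => Math.floor(d.getTime() / msPerDay);
  const gap = dayIndex(today) - dayIndex(lastStudied);
  if (gap === 0) return streak;     // already studied today
  if (gap === 1) return streak + 1; // consecutive day
  return 1;                         // streak broken
}
```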




&lt;h2&gt;
  
  
  What Doesn't Work Yet
&lt;/h2&gt;

&lt;p&gt;Being honest about where things stand:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Voice pipeline not live yet&lt;/strong&gt; — Deepgram + ElevenLabs keys configured but need production testing with real users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG not indexed&lt;/strong&gt; — scripts are ready, Pinecone account set up, but haven't pushed the data yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No real users&lt;/strong&gt; — launching next week, zero feedback so far&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The architecture is built. The UI works. The agents respond correctly. Next step is connecting all the APIs and getting real people to use it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Multi-agent routing is worth the complexity.&lt;/strong&gt;&lt;br&gt;
A single "GMAT tutor" prompt produces mediocre results across all topics. Specialist agents with deep domain prompts are significantly better. The routing overhead is minimal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Memory quality matters more than memory quantity.&lt;/strong&gt;&lt;br&gt;
I originally tried to store everything — full transcripts, every message. The prompts became too long and performance degraded. The Memory Agent that extracts structured insights (not raw content) works much better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Split your infrastructure early.&lt;/strong&gt;&lt;br&gt;
I almost deployed everything to Vercel. The 800-second limit would have killed voice sessions. Railway for long-running processes saved the architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Free datasets are better than I expected.&lt;/strong&gt;&lt;br&gt;
The deepmind/aqua_rat dataset has 97,000 high-quality GMAT-style problems with step-by-step rationales. Apache 2.0 license. This single dataset provides more practice material than most paid prep courses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Paddle for payments if you're targeting the US market.&lt;/strong&gt;&lt;br&gt;
They handle sales tax across all 50 states automatically. As a Merchant of Record, they handle chargebacks and disputes. The 5% + $0.50 fee is worth it for the peace of mind.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Get first 10 beta users from r/GMAT and GMAT Club&lt;/li&gt;
&lt;li&gt;Connect production APIs (Deepgram, ElevenLabs, Pinecone)&lt;/li&gt;
&lt;li&gt;Run the RAG indexing scripts&lt;/li&gt;
&lt;li&gt;Collect feedback on voice experience quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're interested in trying it or have feedback on the architecture, I'd love to hear from you. The product is live at &lt;strong&gt;&lt;a href="https://samiwise.app" rel="noopener noreferrer"&gt;samiwise.app&lt;/a&gt;&lt;/strong&gt; — 7-day free trial, no credit card required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Next.js, Claude Sonnet, Groq, Pinecone, Deepgram, ElevenLabs, Supabase, Paddle, and Railway. Full stack TypeScript.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nextjs</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I'm Building a Platform That Deploys AI Companies From a Single Sentence</title>
      <dc:creator>Pavel Gajvoronski</dc:creator>
      <pubDate>Mon, 13 Apr 2026 08:41:01 +0000</pubDate>
      <link>https://forem.com/pavelbuild/im-building-a-platform-that-deploys-ai-companies-from-a-single-sentence-32aj</link>
      <guid>https://forem.com/pavelbuild/im-building-a-platform-that-deploys-ai-companies-from-a-single-sentence-32aj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08q1i53zdqulzmi4eytp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08q1i53zdqulzmi4eytp.png" alt=" " width="800" height="662"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff398is4kfyebhsconif6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff398is4kfyebhsconif6.png" alt=" " width="800" height="525"&gt;&lt;/a&gt;I'm building an AI Company Builder — a platform where you describe a business idea in plain text, and a team of 28 AI agents researches the market, validates viability, designs the product, writes the code, creates content, and runs marketing. All autonomously.&lt;/p&gt;

&lt;p&gt;This is not a chatbot. This is not another wrapper around ChatGPT. This is a full-stack agent orchestration platform with a two-layer architecture, persistent knowledge vault, multi-model routing across 300+ models, and a built-in marketplace for buying and selling AI-powered businesses.&lt;/p&gt;

&lt;p&gt;I want to share the architecture, the tech stack, and the decisions I made — because I haven't seen anyone build exactly this combination yet.&lt;/p&gt;
&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Right now, if you want to launch a business with AI, you're stitching together 5-10 tools manually: ChatGPT for strategy, Lovable for code, Jasper for content, Perplexity for research, Notion for knowledge, Zapier for automation. Each tool does one thing. None of them talk to each other. And none of them understand your business as a whole.&lt;/p&gt;

&lt;p&gt;What if one platform did it all? Not by being mediocre at everything — but by orchestrating specialized agents, each expert in their domain, all sharing context through a persistent knowledge vault?&lt;/p&gt;
&lt;h2&gt;
  
  
  The architecture: two layers, not one
&lt;/h2&gt;

&lt;p&gt;Most agent platforms put all agents on the same level. Marketing agent, coding agent, sales agent — flat list, no hierarchy. This works for simple automation but breaks down when you're building a complete business.&lt;/p&gt;

&lt;p&gt;I went with a two-layer approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business layer&lt;/strong&gt; — 7 manager agents that understand your specific business. Product Manager (Max), Marketing Lead (Ivy), Sales Strategist (Sam), Financial Analyst (Finn), Customer Success (Joy), Legal Advisor (Lex), and a Business Generator (Chief) that creates the whole structure from your description. These agents know your niche, your competitors, your audience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool layer&lt;/strong&gt; — 21 universal agents that do the actual work. Architect (Atlas), Designer (Maya), Frontend Dev (Kai), Backend Dev (Dev), Security (Shield), Researcher (Nova), Writer (Sage), and 14 more. These don't know your business — they know their craft. The business layer delegates to them with full context.&lt;/p&gt;

&lt;p&gt;The key insight: business agents are per-business instances, tool agents are shared. If you run 3 businesses simultaneously, each has its own Max and Ivy, but they all share the same Atlas and Kai. This scales without multiplying costs.&lt;/p&gt;
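&lt;p&gt;In code, that instance model is essentially a two-tier registry. A sketch (the lookup logic here is illustrative, not the production implementation):&lt;/p&gt;

```typescript
// Sketch: tool agents are singletons shared across businesses;
// manager agents are lazily created one-per-business.
const SHARED_TOOL_AGENTS = new Map([
  ["architect", { name: "Atlas" }],
  ["frontend", { name: "Kai" }],
]);

const businessAgents = new Map(); // key: businessId + ":" + role

function getAgent(businessId: string, role: string) {
  const shared = SHARED_TOOL_AGENTS.get(role);
  if (shared) return shared; // one instance serves every business
  const key = businessId + ":" + role;
  if (!businessAgents.has(key)) {
    // per-business manager instance, created on first use
    businessAgents.set(key, { name: role, businessId });
  }
  return businessAgents.get(key);
}
```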
&lt;h2&gt;
  
  
  Model routing: 90% quality at 7% cost
&lt;/h2&gt;

&lt;p&gt;Running everything on Claude Opus 4.6 would cost a fortune. Running everything on a cheap model would produce garbage. The answer is intelligent routing.&lt;/p&gt;

&lt;p&gt;I use OpenRouter as a gateway to 300+ models, organized in 4 tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;th&gt;Cost/1M tokens&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Routing, classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget&lt;/td&gt;
&lt;td&gt;DeepSeek V3, Gemini Flash&lt;/td&gt;
&lt;td&gt;$0.14-0.60&lt;/td&gt;
&lt;td&gt;Content writing, planning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;MiniMax M2.7&lt;/td&gt;
&lt;td&gt;$0.30-1.20&lt;/td&gt;
&lt;td&gt;Coding, testing, debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium&lt;/td&gt;
&lt;td&gt;Claude Sonnet/Opus 4.6&lt;/td&gt;
&lt;td&gt;$3-25&lt;/td&gt;
&lt;td&gt;Architecture, security, design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MiniMax M2.7 is the secret weapon here. In real-world tests, it delivers 90% of Opus quality for 7% of the cost. It found all 6 bugs and all 10 security vulnerabilities that Opus found — the fixes were just slightly less thorough. For most coding tasks, that's more than enough.&lt;/p&gt;

&lt;p&gt;The system also auto-escalates: if an agent fails 3 times on a cheaper model, it automatically upgrades to the next tier. And auto-downgrades: 10 consecutive successes on Sonnet? The system suggests trying M2.7 next time.&lt;/p&gt;

&lt;p&gt;A full project milestone that costs $50-80 on all-Opus runs $10-12 with routing. That's 80-85% savings.&lt;/p&gt;
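&lt;p&gt;The escalation policy is simple enough to sketch. The tier order and the 3-failure / 10-success thresholds are as described above; everything else here is illustrative:&lt;/p&gt;

```typescript
// Sketch: auto-escalate after 3 consecutive failures, drop a tier after
// 10 consecutive successes. Counters reset on every tier change.
const TIERS = ["free", "budget", "performance", "premium"] as const;
type Tier = (typeof TIERS)[number];

interface AgentStats {
  tier: Tier;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
}

function recordResult(stats: AgentStats, success: boolean): AgentStats {
  const i = TIERS.indexOf(stats.tier);
  if (success) {
    const successes = stats.consecutiveSuccesses + 1;
    if (successes >= 10) {
      // 10 wins in a row: try a cheaper tier next time
      return { tier: TIERS[Math.max(0, i - 1)], consecutiveFailures: 0, consecutiveSuccesses: 0 };
    }
    return { ...stats, consecutiveFailures: 0, consecutiveSuccesses: successes };
  }
  const failures = stats.consecutiveFailures + 1;
  if (failures >= 3) {
    // 3 failures in a row: escalate to the next tier
    return { tier: TIERS[Math.min(TIERS.length - 1, i + 1)], consecutiveFailures: 0, consecutiveSuccesses: 0 };
  }
  return { ...stats, consecutiveSuccesses: 0, consecutiveFailures: failures };
}
```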
&lt;h2&gt;
  
  
  The research stack: not just chat, actual research
&lt;/h2&gt;

&lt;p&gt;This is where I think most agent platforms fall short. They can write code and generate content — but they can't research. They don't know what's happening in the market right now.&lt;/p&gt;

&lt;p&gt;My stack includes four self-hosted research tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perplexica&lt;/strong&gt; — open-source Perplexity alternative. AI-powered web search with cited sources. When Nova (researcher agent) needs to analyze a market, she searches the web through Perplexica and gets answers with real citations, not hallucinations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SurfSense&lt;/strong&gt; — open-source NotebookLM alternative. Upload documents, chat with them, get cited answers. Hybrid search (semantic + full text). Can even generate podcasts from documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AnythingLLM&lt;/strong&gt; — RAG workspace for document analysis. Upload PDFs, DOCX, code files — agents query them with grounded answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firecrawl&lt;/strong&gt; — web scraping via MCP. Agents can scrape any URL into clean markdown, crawl entire websites, extract structured data.&lt;/p&gt;

&lt;p&gt;The combination means agents can research a market, analyze competitors, scrape their pricing pages, summarize uploaded pitch decks, and cite every claim with a real source.&lt;/p&gt;
&lt;h2&gt;
  
  
  The gate system: think before you build
&lt;/h2&gt;

&lt;p&gt;Here's what nobody else does. Before my system commits resources to building something, it analyzes whether it's worth building.&lt;/p&gt;

&lt;p&gt;You write: "Build an online chess school for kids 6-14. Analyze viability first. Only proceed if rating is above 7/10."&lt;/p&gt;

&lt;p&gt;The system runs a full analysis:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Market size&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competition level&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Niche uniqueness&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Revenue potential&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Acquisition cost&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;td&gt;5/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channel accessibility&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.5/10&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If it passes the threshold — development begins. If not — the system explains why and suggests modifications. "Focus on children 6-10 instead of 6-14 — less competition, higher willingness to pay. Adjusted score: 8.1/10."&lt;/p&gt;

&lt;p&gt;This saves thousands of dollars and weeks of development on ideas that won't work.&lt;/p&gt;
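&lt;p&gt;The scoring itself is a plain weighted sum over the criteria. A sketch:&lt;/p&gt;

```typescript
// Sketch: weighted viability score, rounded to one decimal place.
// Weights are fractions that sum to 1; scores are 0-10.
interface Criterion {
  weight: number;
  score: number;
}

function viabilityScore(criteria: Criterion[]): number {
  const total = criteria.reduce((sum, c) => sum + c.weight * c.score, 0);
  return Math.round(total * 10) / 10;
}
```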
&lt;h2&gt;
  
  
  Persistent memory: the Obsidian Vault
&lt;/h2&gt;

&lt;p&gt;Every research finding, every architectural decision, every bug fix, every content plan — saved as markdown notes in an Obsidian-compatible vault with git version control.&lt;/p&gt;

&lt;p&gt;The vault isn't just storage. It's a living knowledge base:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-indexing&lt;/strong&gt;: Vault Librarian agent (Libra) maintains indexes, tags notes, creates links between related decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git history&lt;/strong&gt;: every change tracked, every note timestamped, full rollback capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory consolidation&lt;/strong&gt;: Libra periodically merges scattered notes into coherent knowledge structures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-project learning&lt;/strong&gt;: insights from one project automatically available in related projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After 3 months of operation, the vault contains hundreds of notes — and the system is measurably smarter. Nova doesn't re-research topics she already investigated. Atlas references past ADRs when making new architecture decisions. The knowledge compounds.&lt;/p&gt;
&lt;h2&gt;
  
  
  Event-driven architecture: everything is observable
&lt;/h2&gt;

&lt;p&gt;Every agent action emits an event to Redis pub/sub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Atlas"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"created_adr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4.6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tokens_in"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tokens_out"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cost_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.142&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"vault_note"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"projects/chess/decisions/ADR-001.md"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiple services subscribe: the audit logger saves to immutable JSONL, the cost tracker aggregates spending, the vault manager auto-saves results, and the live activity stream pushes to the Web UI via WebSocket.&lt;/p&gt;
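&lt;p&gt;The consumer side is a plain fan-out. A sketch with Redis abstracted away; the key property is that one failing subscriber cannot block the others:&lt;/p&gt;

```typescript
// Sketch: in-process fan-out of agent events to independent subscribers.
// In production the transport is Redis pub/sub; the isolation logic is the same.
interface AgentEvent {
  agent: string;
  action: string;
  cost_usd: number;
}

type Subscriber = (event: AgentEvent) => void;

class EventBus {
  private subscribers: Subscriber[] = [];

  subscribe(fn: Subscriber): void {
    this.subscribers.push(fn);
  }

  publish(event: AgentEvent): void {
    for (const fn of this.subscribers) {
      try {
        fn(event);
      } catch {
        // a broken subscriber (say, a full disk under the audit logger)
        // must not take down cost tracking or the live activity feed
      }
    }
  }
}
```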

&lt;p&gt;This gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full audit trail&lt;/strong&gt; for compliance (EU AI Act, GDPR)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time cost tracking&lt;/strong&gt; with ROI calculation ("Your agents saved $28,000 in equivalent human labor this month")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live activity feed&lt;/strong&gt; — watch your agents work in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kill switch&lt;/strong&gt; — instantly halt all agent activity if something goes wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A2A protocol readiness
&lt;/h2&gt;

&lt;p&gt;Google's Agent-to-Agent protocol (A2A) is becoming the standard for inter-platform agent communication. 50+ partners including Salesforce, SAP, and PayPal are building on it.&lt;/p&gt;

&lt;p&gt;I'm building A2A compatibility from day one. Every agent has an Agent Card — a JSON file describing its capabilities. External agents can discover our agents, send tasks, and receive results through standardized endpoints.&lt;/p&gt;
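&lt;p&gt;A hypothetical Agent Card for Nova, loosely following the published A2A card shape (the field names and URL here are placeholders, not our production card):&lt;/p&gt;

```json
{
  "name": "Nova",
  "description": "Market research agent: AI web search with cited sources",
  "url": "https://example.com/agents/nova",
  "version": "0.1.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "market-research",
      "name": "Market research",
      "description": "Researches a market and returns a report with citations"
    }
  ]
}
```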

&lt;p&gt;Why this matters: in 2027-2028, your business agents will negotiate with supplier agents, customer agents will talk to support agents across platforms, and marketing agents will coordinate campaigns with influencer agents — all machine-to-machine. Building the protocol layer now means we're ready when this arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's coming next
&lt;/h2&gt;

&lt;p&gt;The full platform has 14 milestones. I'm currently in the build phase, deploying infrastructure on a Hetzner VPS with Claude Code + GSD-2 running the development process.&lt;/p&gt;

&lt;p&gt;What I'm building toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;YouTube content pipeline&lt;/strong&gt;: from idea to published video, fully automated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Exchange&lt;/strong&gt;: marketplace for buying and selling AI-powered businesses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-business learning&lt;/strong&gt;: anonymous patterns shared across all businesses on the platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;400+ integrations&lt;/strong&gt; via Composio: Gmail, Slack, HubSpot, Notion, Jira — one MCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7 business templates&lt;/strong&gt;: Online Education, SaaS, Agency, E-commerce, Content, Marketplace, Coaching&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tech stack summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Ubuntu 24.04, Hetzner CPX31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Engine&lt;/td&gt;
&lt;td&gt;Claude Code CLI via OpenRouter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;GSD-2 + Ruflo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;FastAPI + Redis pub/sub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interfaces&lt;/td&gt;
&lt;td&gt;Telegram (aiogram), React Web UI, REST API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research&lt;/td&gt;
&lt;td&gt;Perplexica, SurfSense, AnythingLLM, Firecrawl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrations&lt;/td&gt;
&lt;td&gt;Composio (400+ apps), MCP servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Obsidian Vault + MCPVault + git&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payments&lt;/td&gt;
&lt;td&gt;Stripe, Paddle (MoR)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why I'm sharing this
&lt;/h2&gt;

&lt;p&gt;Two reasons. First, I genuinely believe this is where software is heading — from tools to autonomous business operators. The predictions from Dario Amodei, Sam Altman, and every major AI lab point to agents handling multi-week projects autonomously by 2028. Building the platform for this now is a bet on the near future.&lt;/p&gt;

&lt;p&gt;Second, building in public keeps me honest. If you see flaws in the architecture, I want to know. If you're building something similar, let's compare notes. If you want to be an early user — I'll be opening access soon.&lt;/p&gt;

&lt;p&gt;Follow the build: I'll be posting weekly updates here on dev.to with technical deep dives into each component.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What would you build if you had 28 AI agents at your command?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>buildinpublic</category>
      <category>agents</category>
      <category>startup</category>
      <category>architecture</category>
    </item>
    <item>
      <title>TraceHawk vs LangSmith: AI Agent Observability in 2026</title>
      <dc:creator>Pavel Gajvoronski</dc:creator>
      <pubDate>Fri, 10 Apr 2026 10:02:19 +0000</pubDate>
      <link>https://forem.com/pavelbuild/tracehawk-vs-langsmith-ai-agent-observability-in-2026-4766</link>
      <guid>https://forem.com/pavelbuild/tracehawk-vs-langsmith-ai-agent-observability-in-2026-4766</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2waz556zuj6tk6l8zta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2waz556zuj6tk6l8zta.png" alt=" " width="800" height="419"&gt;&lt;/a&gt;LangSmith is the default choice for LangChain teams. But if your stack has moved beyond LangChain — or you're using MCP servers — you're working around LangSmith, not with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;TraceHawk&lt;/th&gt;
&lt;th&gt;LangSmith&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP server name captured&lt;/td&gt;
&lt;td&gt;✅ Always&lt;/td&gt;
&lt;td&gt;⚠️ Requires manual tagging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-server latency (p50/p95)&lt;/td&gt;
&lt;td&gt;✅ Built-in&lt;/td&gt;
&lt;td&gt;❌ Not tracked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP error details&lt;/td&gt;
&lt;td&gt;✅ Full error + stack&lt;/td&gt;
&lt;td&gt;❌ Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP server health dashboard&lt;/td&gt;
&lt;td&gt;✅ Built-in&lt;/td&gt;
&lt;td&gt;❌ Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL-native ingest&lt;/td&gt;
&lt;td&gt;✅ OTLP endpoint&lt;/td&gt;
&lt;td&gt;⚠️ LangChain-first, OTEL adapter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM call tracing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost attribution&lt;/td&gt;
&lt;td&gt;✅ Per agent/trace/org&lt;/td&gt;
&lt;td&gt;✅ Per run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt versioning / hub&lt;/td&gt;
&lt;td&gt;⚠️ Roadmap&lt;/td&gt;
&lt;td&gt;✅ LangSmith Hub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent replay timeline&lt;/td&gt;
&lt;td&gt;✅ Step-by-step&lt;/td&gt;
&lt;td&gt;✅ Run timeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dataset / eval harness&lt;/td&gt;
&lt;td&gt;❌ Not in scope&lt;/td&gt;
&lt;td&gt;✅ Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry loop detection&lt;/td&gt;
&lt;td&gt;✅ Automatic badge&lt;/td&gt;
&lt;td&gt;❌ Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTEL dual-write re-export&lt;/td&gt;
&lt;td&gt;✅ Built-in fan-out&lt;/td&gt;
&lt;td&gt;❌ Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-host option&lt;/td&gt;
&lt;td&gt;✅ Open source core&lt;/td&gt;
&lt;td&gt;❌ Cloud only (Enterprise)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;50K spans/month&lt;/td&gt;
&lt;td&gt;Limited (Developer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro tier&lt;/td&gt;
&lt;td&gt;$99/month&lt;/td&gt;
&lt;td&gt;$39/month (25 seats)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework support&lt;/td&gt;
&lt;td&gt;Any (OTEL-compatible)&lt;/td&gt;
&lt;td&gt;LangChain/LangGraph-first&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The core difference
&lt;/h2&gt;

&lt;p&gt;LangSmith was built to observe LangChain chains. Everything else is a wrapper around that mental model. TraceHawk was built around OpenTelemetry from day one — which means any framework, any language, and first-class support for Model Context Protocol.&lt;/p&gt;

&lt;p&gt;This isn't a criticism of LangSmith. It's the right tool if your entire stack is LangChain/LangGraph and you want deep eval/dataset tooling. The question is whether that describes your stack in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP support: built-in vs bolted on
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol is now the dominant way AI agents use tools: Claude Code, LangGraph, CrewAI, and the OpenAI Agents SDK all support it natively. LangSmith has no concept of an "MCP server" — you can log the spans manually, but there's no:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-server health dashboard (error rate, p95 latency, call frequency)&lt;/li&gt;
&lt;li&gt;Automatic tool name extraction from &lt;code&gt;mcp.tool_name&lt;/code&gt; attributes&lt;/li&gt;
&lt;li&gt;Server degradation alerts&lt;/li&gt;
&lt;li&gt;MCP-aware retry loop detection&lt;/li&gt;
&lt;li&gt;Agent → server dependency graph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In TraceHawk, all of this is automatic. If you emit standard OTLP spans with &lt;code&gt;mcp.server_name&lt;/code&gt; and &lt;code&gt;mcp.tool_name&lt;/code&gt; attributes, the dashboard populates itself. No configuration required.&lt;/p&gt;
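
&lt;p&gt;As a rough sketch of what emitting those attributes looks like from the agent side (the &lt;code&gt;traced_mcp_call&lt;/code&gt; helper below is hypothetical, not a TraceHawk or OpenTelemetry API; with the OpenTelemetry SDK you would set the same keys on a span via &lt;code&gt;span.set_attribute&lt;/code&gt;):&lt;/p&gt;

```python
# Illustrative sketch only: the attribute names mcp.server_name and
# mcp.tool_name are the ones TraceHawk keys on; this helper is hypothetical.
# In a real setup you would attach attrs to an OpenTelemetry span instead.
import time

def traced_mcp_call(server_name, tool_name, fn, *args):
    """Run one MCP tool call and collect the span attributes for it."""
    attrs = {
        "mcp.server_name": server_name,  # which MCP server handled the call
        "mcp.tool_name": tool_name,      # which tool was invoked
    }
    start = time.monotonic()
    try:
        result = fn(*args)
    except Exception as exc:
        attrs["error.type"] = type(exc).__name__  # feeds error dashboards
        raise
    finally:
        attrs["duration_ms"] = (time.monotonic() - start) * 1000.0
    return result, attrs
```

&lt;p&gt;Anything that ends up as standard OTLP span attributes like these is enough for the per-server latency and health views described above.&lt;/p&gt;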

&lt;h2&gt;
  
  
  Framework independence
&lt;/h2&gt;

&lt;p&gt;LangSmith works best with LangChain. The tracing callbacks are tightly coupled to the LangChain execution model — &lt;code&gt;on_llm_start&lt;/code&gt;, &lt;code&gt;on_tool_end&lt;/code&gt;, etc. If you switch to OpenAI Agents SDK, CrewAI, or write a custom agent, you're on your own.&lt;/p&gt;

&lt;p&gt;TraceHawk uses OTLP as the ingest protocol. Any framework that emits OpenTelemetry spans works out of the box — including LangChain, LangGraph, CrewAI, OpenAI Agents SDK, Claude Code hooks, and custom agents. One endpoint, everything traces.&lt;/p&gt;

&lt;h2&gt;
  
  
  When LangSmith wins
&lt;/h2&gt;

&lt;p&gt;LangSmith has capabilities TraceHawk doesn't aim to replicate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Hub&lt;/strong&gt; — version-controlled prompt management with deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation datasets&lt;/strong&gt; — structured datasets for regression testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain-native callbacks&lt;/strong&gt; — zero-config if your stack is 100% LangChain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph Studio integration&lt;/strong&gt; — visual graph debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workflow is "build in LangGraph, test with eval datasets, iterate on prompts in Hub" — LangSmith is genuinely great. TraceHawk doesn't try to replace that.&lt;/p&gt;

&lt;h2&gt;
  
  
  When TraceHawk wins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your stack uses MCP servers (Claude Code, custom MCP, any framework)&lt;/li&gt;
&lt;li&gt;You want OTEL-native ingest without framework lock-in&lt;/li&gt;
&lt;li&gt;You need cost attribution per agent/trace/organization&lt;/li&gt;
&lt;li&gt;You want to self-host (open source core, Docker-deployable)&lt;/li&gt;
&lt;li&gt;You need retry loop detection and server health alerts&lt;/li&gt;
&lt;li&gt;You want to dual-write to Datadog/Grafana simultaneously&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;LangSmith Developer tier is free with limited traces. Their paid plans start at $39/month for a team of 25. TraceHawk is $0 for 50K spans/month, $99/month for unlimited — no per-seat pricing, no surprise overages.&lt;/p&gt;

&lt;p&gt;For production AI agent teams, the relevant comparison is: LangSmith Plus ($99–$499/month, per-seat) vs TraceHawk Pro ($99/month flat). If your team is 5+ people, TraceHawk is cheaper.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;LangSmith is excellent if you're all-in on LangChain. TraceHawk is the right choice if you're using MCP, want framework independence, or need production-grade observability without per-seat pricing.&lt;/p&gt;

&lt;p&gt;They're not direct competitors — LangSmith is a LangChain-native eval platform that includes tracing. TraceHawk is an OTEL-native observability platform that focuses on what matters for AI agent teams in 2026: MCP visibility, cost attribution, and production alerting.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Try TraceHawk free: 50K spans/month, no credit card. &lt;a href="https://tracehawk.dev" rel="noopener noreferrer"&gt;tracehawk.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>observability</category>
      <category>devtools</category>
    </item>
    <item>
      <title>EU AI Act Deadline August 2026: What SMBs Need to Do Now</title>
      <dc:creator>Pavel Gajvoronski</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:55:39 +0000</pubDate>
      <link>https://forem.com/pavelbuild/eu-ai-act-deadline-august-2026-what-smbs-need-to-do-now-44ne</link>
      <guid>https://forem.com/pavelbuild/eu-ai-act-deadline-august-2026-what-smbs-need-to-do-now-44ne</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuiaepxyns53ti1o2cylf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuiaepxyns53ti1o2cylf.png" alt=" " width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Five months. That's roughly how long you have until the EU AI Act's high-risk compliance deadline on August 2, 2026.&lt;/p&gt;

&lt;p&gt;If you're running a small or medium-sized SaaS company with AI features and European customers, this deadline should be on your radar. Even if you've heard rumors about a possible extension, the legal reality is more nuanced — and more urgent — than the headlines suggest.&lt;/p&gt;

&lt;p&gt;This article gives you the full picture: what's already in force, what's coming, what the Digital Omnibus actually says, and a concrete 5-month plan to get your house in order.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full EU AI Act Timeline: What's Already Happened
&lt;/h2&gt;

&lt;p&gt;The AI Act doesn't flip a single switch on one date. It phases in across multiple milestones:&lt;/p&gt;

&lt;h3&gt;
  
  
  August 1, 2024 — Entry Into Force
&lt;/h3&gt;

&lt;p&gt;The regulation was published in the Official Journal. The clock started ticking, but no obligations applied yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  February 2, 2025 — Prohibited Practices + AI Literacy
&lt;/h3&gt;

&lt;p&gt;Two categories of rules became enforceable. First, prohibited AI practices under Article 5 — social scoring, subliminal manipulation, exploitation of vulnerabilities, certain biometric uses, and predictive policing — are now banned. Violations carry penalties of up to €35 million or 7% of global turnover.&lt;/p&gt;

&lt;p&gt;Second, AI literacy obligations under Article 4 require organizations to ensure their staff have sufficient understanding of AI systems they develop or use. The European Commission has since proposed shifting primary responsibility for AI literacy to member states and the Commission itself under the Digital Omnibus, but until that's adopted, the original obligation stands.&lt;/p&gt;

&lt;h3&gt;
  
  
  August 2, 2025 — GPAI Rules + Governance Infrastructure
&lt;/h3&gt;

&lt;p&gt;This was the second major milestone. General-purpose AI (GPAI) model obligations under Article 53 became applicable, covering technical documentation, training data summaries, and copyright compliance. The GPAI Code of Practice was published in July 2025, and 26 major providers signed it, including Amazon, Anthropic, Google, IBM, Microsoft, OpenAI, and Mistral AI. Notably, Meta declined to sign.&lt;/p&gt;

&lt;p&gt;The EU governance infrastructure also came online: the AI Office, the AI Board, the Scientific Panel, and the Advisory Forum all became operational. Member states were required to designate national competent authorities.&lt;/p&gt;

&lt;p&gt;The penalty regime also took effect — market surveillance authorities can now impose fines for non-compliance. However, enforcement powers specific to GPAI model providers don't kick in until August 2, 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  August 2, 2026 — The Big One
&lt;/h3&gt;

&lt;p&gt;This is when the majority of the AI Act becomes enforceable. Key elements include full compliance requirements for high-risk AI systems listed in Annex III (covering hiring, credit scoring, biometrics, education, emergency services, and more), transparency obligations under Article 50 for limited-risk systems, innovation measures including AI regulatory sandboxes (at least one per member state), and full enforcement at both national and EU levels.&lt;/p&gt;

&lt;h3&gt;
  
  
  August 2, 2027 — High-Risk Products
&lt;/h3&gt;

&lt;p&gt;Rules for high-risk AI systems that are safety components of regulated products (covered under Annex I EU harmonization legislation) apply from this date. This primarily affects manufacturers of physical products like medical devices, machinery, and vehicles.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Digital Omnibus: What It Actually Says
&lt;/h2&gt;

&lt;p&gt;On November 19, 2025, the European Commission published the Digital Omnibus — a sweeping proposal to simplify the EU's digital regulatory framework. For the AI Act specifically, the most significant proposal involves extending the compliance timeline for high-risk systems.&lt;/p&gt;

&lt;p&gt;Here's what the Omnibus proposes for high-risk AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conditional delay:&lt;/strong&gt; High-risk obligations would not apply until the Commission confirms that adequate compliance support — harmonized standards, common specifications, or guidelines — is available. Once confirmed, Annex III systems (standalone high-risk uses like hiring and credit scoring) would have 6 months to comply. Annex I systems (product safety components) would have 12 months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backstop dates:&lt;/strong&gt; Even if standards aren't ready, rules would apply no later than December 2, 2027 for Annex III systems and August 2, 2028 for Annex I systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grace period for GPAI transparency:&lt;/strong&gt; Providers of generative AI systems placed on the market before August 2026 would get until February 2, 2027 to meet content-marking transparency obligations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SME-friendly simplifications:&lt;/strong&gt; Simplified quality management system (QMS) requirements under Article 17 would be extended from microenterprises to all SMEs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Shouldn't Rely on the Omnibus
&lt;/h2&gt;

&lt;p&gt;There are three critical reasons not to treat the Omnibus as a get-out-of-jail-free card.&lt;/p&gt;

&lt;p&gt;First, it's not law yet. The Omnibus is a Commission proposal that must go through trilogue negotiations with the European Parliament and Council. This process could take months, and the final text may look significantly different from the current proposal.&lt;/p&gt;

&lt;p&gt;Second, the timing is extremely tight. For the Omnibus to have any effect before August 2, 2026, it must be adopted before that date. If Parliament and Council don't agree in time, the original deadline applies as written. Multiple legal commentators have flagged this as a realistic risk.&lt;/p&gt;

&lt;p&gt;Third, the core framework stays. Even under the Omnibus, the AI Act's risk classification, prohibited practices, and obligation structure remain intact. The delay is about timing, not about watering down requirements. Every obligation you'd need to meet in August 2026 you'll still need to meet by December 2027 at the latest.&lt;/p&gt;

&lt;p&gt;The Commission itself has called this a "structural recalibration," not deregulation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Penalty Framework
&lt;/h2&gt;

&lt;p&gt;For SMBs, the financial exposure is significant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to €35 million or 7% of global annual turnover for violations involving prohibited AI practices.&lt;/li&gt;
&lt;li&gt;Up to €15 million or 3% of turnover for violations of high-risk system obligations or GPAI provider obligations.&lt;/li&gt;
&lt;li&gt;Up to €7.5 million or 1% of turnover for providing incorrect, incomplete, or misleading information to authorities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI Act does include proportionality considerations for SMEs and startups (Article 99(6)), with penalties accounting for the size of the company. But "proportional" doesn't mean "zero" — it means you'll face a fine calibrated to your revenue rather than the maximum cap.&lt;/p&gt;

&lt;p&gt;Beyond direct penalties, non-compliance carries other consequences. Market surveillance authorities can order you to withdraw your AI system from the EU market. Customers — especially enterprise buyers — are increasingly asking about AI Act compliance during procurement. And a public enforcement action can do lasting reputational damage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your 5-Month Action Plan: March to August 2026
&lt;/h2&gt;

&lt;p&gt;Here's a concrete month-by-month plan for an SMB getting serious about compliance now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Month 1 (March): Inventory and Classification
&lt;/h3&gt;

&lt;p&gt;Start by creating a complete inventory of every AI system you develop, deploy, or use. For each system, document what it does, what AI model or method it uses, what data it processes, who is affected by its outputs, and what markets it serves.&lt;/p&gt;

&lt;p&gt;Then classify each system by risk level. Map them against the 8 Annex III categories. Check Article 6(3) exceptions. Determine your role — provider, deployer, or both.&lt;/p&gt;

&lt;p&gt;This step alone gives you clarity on what applies to you. If all your systems are minimal or limited risk, your path forward is much simpler.&lt;/p&gt;
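
&lt;p&gt;A toy version of that inventory-and-classification pass might look like this (the category list is abbreviated and the matching logic is illustrative only; real classification needs legal review of the actual Annex III text):&lt;/p&gt;

```python
# Hypothetical sketch: Annex III areas abbreviated, and simple area matching
# stands in for proper legal analysis. Not a compliance tool.
ANNEX_III_AREAS = {
    "biometrics", "critical infrastructure", "education", "employment",
    "essential services", "law enforcement", "migration", "justice",
}

def classify(system):
    """Flag a system as 'high' risk if its declared area is in Annex III."""
    return "high" if system["area"] in ANNEX_III_AREAS else "minimal/limited"

inventory = [
    {"name": "cv-screener", "area": "employment", "markets": ["EU"]},
    {"name": "support-bot", "area": "customer support", "markets": ["EU", "US"]},
]
risk_by_system = {s["name"]: classify(s) for s in inventory}
```

&lt;p&gt;Even a spreadsheet-level pass like this tells you which systems need the full Articles 9–15 treatment and which don't.&lt;/p&gt;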

&lt;h3&gt;
  
  
  Month 2 (April): Gap Analysis
&lt;/h3&gt;

&lt;p&gt;For each high-risk system, assess your current compliance status against Articles 9 through 15. For each requirement, determine whether you already have it, partially have it, or are starting from zero.&lt;/p&gt;

&lt;p&gt;Key questions to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you have a documented risk management system?&lt;/li&gt;
&lt;li&gt;How is your training data governed and documented?&lt;/li&gt;
&lt;li&gt;Do you have technical documentation that meets Annex IV's 9 sections?&lt;/li&gt;
&lt;li&gt;Are your systems logging decisions automatically?&lt;/li&gt;
&lt;li&gt;Can users understand how decisions are made?&lt;/li&gt;
&lt;li&gt;Is there meaningful human oversight?&lt;/li&gt;
&lt;li&gt;Have you tested for accuracy, robustness, and cybersecurity?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Calculate a compliance score. Prioritize critical gaps — especially Articles 9 (risk management), 10 (data governance), and 11 (technical documentation), as these are the most time-consuming to address.&lt;/p&gt;
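
&lt;p&gt;The score itself can be as simple as averaging per-article status (the three status levels and equal weights below are illustrative, not an official methodology):&lt;/p&gt;

```python
# Illustrative scoring only: equal weight per article, three status levels.
STATUS_POINTS = {"done": 1.0, "partial": 0.5, "missing": 0.0}

def compliance_score(status_by_article):
    """Average readiness across the tracked articles, as a percentage."""
    points = [STATUS_POINTS[s] for s in status_by_article.values()]
    return round(100.0 * sum(points) / len(points), 1)

gaps = {
    "Art 9 risk management": "partial",
    "Art 10 data governance": "missing",
    "Art 11 technical documentation": "partial",
    "Art 12 logging": "done",
    "Art 13 transparency": "done",
    "Art 14 human oversight": "partial",
    "Art 15 accuracy/robustness": "missing",
}
```

&lt;p&gt;The number matters less than the ranking: anything "missing" under Articles 9–11 goes to the top of the queue.&lt;/p&gt;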

&lt;h3&gt;
  
  
  Month 3 (May): Documentation Sprint
&lt;/h3&gt;

&lt;p&gt;The Annex IV technical documentation is the most labor-intensive requirement. It demands 9 sections covering your system's general description, development process, monitoring and control mechanisms, performance metrics, risk management approach, lifecycle changes, applied standards, EU declaration of conformity, and post-market monitoring plan.&lt;/p&gt;

&lt;p&gt;Start drafting. Use templates where available. If you have multiple high-risk systems, look for common elements you can reuse across documentation.&lt;/p&gt;

&lt;p&gt;Also prepare your risk management system documentation (Article 9) and data governance records (Article 10).&lt;/p&gt;

&lt;h3&gt;
  
  
  Month 4 (June): Implementation and Testing
&lt;/h3&gt;

&lt;p&gt;Implement any technical measures you're missing. Common gaps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic logging (Article 12)&lt;/strong&gt; — make sure your system records key decisions, inputs, and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human oversight mechanisms (Article 14)&lt;/strong&gt; — ensure humans can effectively monitor and intervene&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency information (Article 13)&lt;/strong&gt; — create clear documentation for deployers about how your system works, its limitations, and correct use&lt;/li&gt;
&lt;/ul&gt;
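
&lt;p&gt;For the Article 12 logging gap, even a minimal append-only record covering decisions, inputs, and outputs goes a long way (the field names and &lt;code&gt;log_decision&lt;/code&gt; helper are assumptions for illustration; the Act requires logging, not this exact schema):&lt;/p&gt;

```python
# Sketch of Article 12-style automatic logging. Field names are assumptions,
# not prescribed by the Act; adapt the schema to your system.
import datetime
import json

def log_decision(system_id, inputs, output, operator=None, log_file=None):
    """Build one timestamped decision record; optionally append as a JSON line."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "system_id": system_id,
        "inputs": inputs,      # what the system saw
        "output": output,      # what it decided
        "operator": operator,  # who could intervene (ties into Article 14)
    }
    if log_file is not None:
        log_file.write(json.dumps(record) + "\n")
    return record
```

&lt;p&gt;Append-only JSON lines are easy to retain, query, and hand to an auditor, which is the point of the obligation.&lt;/p&gt;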

&lt;p&gt;If you're a deployer of AI tools, review your vendor's documentation. Under the AI Act, deployers must verify that their high-risk AI providers have conducted conformity assessments and registered their systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Month 5 (July): Conformity and Registration
&lt;/h3&gt;

&lt;p&gt;Complete your conformity assessment. For most Annex III high-risk systems, this is a self-assessment (the provider evaluates their own compliance). Third-party assessment by a notified body is generally required only for certain biometric systems, and even then only where harmonized standards aren't fully applied.&lt;/p&gt;

&lt;p&gt;Prepare your EU Declaration of Conformity (Article 47). Register your high-risk AI systems in the EU database (Article 49). The AI Office is expected to provide registration tools.&lt;/p&gt;

&lt;p&gt;Run a final compliance review. Fix any remaining gaps. Brief your team.&lt;/p&gt;

&lt;h2&gt;
  
  
  What If the Omnibus Gets Adopted?
&lt;/h2&gt;

&lt;p&gt;If the Digital Omnibus passes before August 2, 2026, you'll have more time — but all the work you've done still counts. You'll simply have a buffer to refine and finalize rather than rushing.&lt;/p&gt;

&lt;p&gt;If it doesn't pass, you'll be compliant on time while competitors who gambled on the extension scramble to catch up.&lt;/p&gt;

&lt;p&gt;Either way, early preparation is the winning strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Special Considerations for SMBs
&lt;/h2&gt;

&lt;p&gt;The AI Act and the Omnibus include several provisions specifically aimed at reducing the burden on smaller companies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplified QMS:&lt;/strong&gt; Under the proposed Omnibus, SMEs would access simplified quality management system requirements previously available only to microenterprises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regulatory sandboxes:&lt;/strong&gt; Each member state must have at least one AI regulatory sandbox operational by August 2026. These provide a controlled environment where companies can test AI systems with regulatory guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proportional penalties:&lt;/strong&gt; Fines are scaled to company size and revenue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free tools:&lt;/strong&gt; The AI Office's Service Desk answers compliance questions. The GPAI Code of Practice provides templates. And platforms like Complyance offer self-serve classification and gap analysis at a fraction of what enterprise consultants charge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The August 2, 2026 deadline is real until legislation says otherwise. The Digital Omnibus might give you extra time, but it might not pass in time. Either way, every action you take now directly reduces your compliance gap and your risk exposure.&lt;/p&gt;

&lt;p&gt;The companies that start now will be the ones with a genuine competitive advantage — able to demonstrate AI compliance to enterprise buyers, avoid regulatory penalties, and build trust with customers who increasingly care about responsible AI.&lt;/p&gt;

&lt;p&gt;Start today. Classify your AI systems for free at complyance.io. Get your risk classification, see your compliance gaps, and build your roadmap — all in one session, no sales calls required.&lt;/p&gt;

&lt;p&gt;Disclaimer: This article is for informational purposes only and does not constitute legal advice. Compliance planning should be verified with a qualified legal professional specializing in AI regulation.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>regulation</category>
      <category>compliance</category>
    </item>
  </channel>
</rss>
