<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Varshith V Hegde</title>
    <description>The latest articles on Forem by Varshith V Hegde (@varshithvhegde).</description>
    <link>https://forem.com/varshithvhegde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F885064%2F4ab304f4-a3f3-409c-8217-9ce130e57c18.jpeg</url>
      <title>Forem: Varshith V Hegde</title>
      <link>https://forem.com/varshithvhegde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/varshithvhegde"/>
    <language>en</language>
    <item>
      <title>GitHub Broke Git: The Merge Queue Bug That Silently Deleted Your Code</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Sun, 03 May 2026 03:09:35 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/github-broke-git-the-merge-queue-bug-that-silently-deleted-your-code-4f7i</link>
      <guid>https://forem.com/varshithvhegde/github-broke-git-the-merge-queue-bug-that-silently-deleted-your-code-4f7i</guid>
      <description>&lt;p&gt;If you use GitHub's merge queue and had a rough week around April 23rd, 2026, you were not imagining things. Your code actually disappeared. Not because of a bad commit, not because of a rogue team member, but because GitHub itself quietly deleted it.&lt;/p&gt;

&lt;p&gt;This is the story of what happened, why it was way worse than the official numbers suggest, and what it means for the way we all trust the tools we build on.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Day GitHub Stopped Being Git
&lt;/h2&gt;

&lt;p&gt;At 16:05 UTC on April 23rd, 2026, a regression crept into GitHub's merge queue. For the next three and a half hours, engineers around the world were reviewing pull requests, clicking "merge," and watching everything look completely fine. Green checks. Clean diffs. No warnings.&lt;/p&gt;

&lt;p&gt;What was actually happening behind the scenes was quietly horrifying.&lt;/p&gt;

&lt;p&gt;A PR with a perfectly reasonable &lt;code&gt;+29 / -34&lt;/code&gt; diff would get approved and queued. What landed on &lt;code&gt;main&lt;/code&gt; was a commit worth &lt;code&gt;+245 / -1,137&lt;/code&gt;. Thousands of lines of code that other engineers had already shipped, reviewed, and moved on from, just gone. And every merge that came after went in on top of that broken history.&lt;/p&gt;

&lt;p&gt;The UI showed zero problems. The status page showed no outage. The platform was lying to everyone's faces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z56m261nmof6a1ed34z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z56m261nmof6a1ed34z.png" alt="Git commit graph showing incorrect merge base" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Went Wrong Under the Hood
&lt;/h2&gt;

&lt;p&gt;GitHub's merge queue works by creating a temporary branch for each PR in the queue. Normally, that temp branch starts from the tip of &lt;code&gt;main&lt;/code&gt; plus the PR's diff. CI runs against it, it passes, it lands.&lt;/p&gt;

&lt;p&gt;On April 23rd, the queue started building those temp branches from the wrong starting point. Instead of branching from the current tip of &lt;code&gt;main&lt;/code&gt;, it was branching from wherever the feature branch had originally diverged from main, potentially dozens or hundreds of commits back.&lt;/p&gt;

&lt;p&gt;Then it pushed the entire contents of that temp branch to &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So if your feature branch was 50 commits behind main when it hit the queue, the "merge" silently removed those 50 commits of other people's work as a side effect of landing yours. CI passed because the temp branch on its own was internally consistent. &lt;code&gt;main&lt;/code&gt; blew up because the temp branch had nothing to do with the current state of &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;
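&lt;p&gt;As a toy model (all names invented, not GitHub's actual code), the difference between the correct and the buggy merge base looks like this:&lt;/p&gt;

```python
# Toy model of the bug: a branch is represented as the list of commit
# ids it contains. All names are invented for illustration.

def correct_merge(main, pr_commits):
    # A correct merge lands the PR's changes on top of the CURRENT
    # tip of main, keeping everything already merged.
    return main + pr_commits

def buggy_merge(main, divergence_point, pr_commits):
    # The buggy path built the temp branch from where the feature
    # branch originally diverged, then pushed it wholesale, dropping
    # every commit main had gained since the divergence.
    return main[:divergence_point] + pr_commits

main = ["c1", "c2", "c3", "c4", "c5"]   # five commits on main
pr = ["feature"]                        # PR that branched off after c2

ok = correct_merge(main, pr)
bad = buggy_merge(main, divergence_point=2, pr_commits=pr)
lost = [c for c in main if c not in bad]
print(lost)  # ['c3', 'c4', 'c5'] silently gone from the branch tip
```

&lt;p&gt;Notice that the buggy result is internally consistent, which is exactly why CI passed: nothing inside the temp branch itself is broken. Only a comparison against the current tip reveals the loss.&lt;/p&gt;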

&lt;p&gt;The root cause? A new code path that adjusted merge base computation was meant to be gated behind a feature flag for an unreleased feature. The gating was incomplete. The new behavior leaked into production and applied to all squash merge groups.&lt;/p&gt;
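&lt;p&gt;A hedged sketch of what complete gating looks like, again with invented names: the experimental computation must be unreachable unless the flag is explicitly on, so that flag-off behavior is exactly the old behavior.&lt;/p&gt;

```python
# Hypothetical sketch (all names invented) of the gating that was
# missing: with the flag off, the queue must fall back to the old,
# well-understood behavior of branching from the current tip of main.

FLAG_REPOS = set()  # unreleased feature: enabled for no repositories

def temp_branch_base(repo, main_tip, divergence_base):
    if repo in FLAG_REPOS:
        return divergence_base  # experimental path (the one that leaked)
    return main_tip             # old path: build on the current tip

print(temp_branch_base("acme/app", "c5", "c2"))  # c5
```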

&lt;p&gt;Three things made this bug particularly nasty:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The PR UI lied.&lt;/strong&gt; You reviewed &lt;code&gt;+29/-34&lt;/code&gt;. The commit that landed was &lt;code&gt;+245/-1,137&lt;/code&gt;. The thing engineers approved was not the thing that merged. That breaks the most fundamental contract of a code review system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. It was completely silent.&lt;/strong&gt; No merge conflict. No failed check. No banner on the PR. Teams only found out when someone noticed that code which should have been on &lt;code&gt;main&lt;/code&gt; simply was not there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. It scaled with repo activity.&lt;/strong&gt; The faster a repo was merging, the further feature branches had drifted from &lt;code&gt;main&lt;/code&gt;, and the more damage each bad merge did. The teams that relied most on merge queue got hit the hardest.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Human Cost
&lt;/h2&gt;

&lt;p&gt;This was not a theoretical problem. Engineering teams spent entire afternoons in incident mode: combing through commit graphs, reconstructing deleted code by hand, coordinating recovery across multiple repos, and filing support tickets that would take days to hear back on.&lt;/p&gt;

&lt;p&gt;One organization reported that every one of its teams using GitHub's merge queue was hit, with dozens of bad commits each and hundreds of existing commits clobbered before anyone noticed. One company alone claimed more than 200 ruined PRs.&lt;/p&gt;

&lt;p&gt;GitHub later said 2,092 pull requests across 230 repositories were affected during the impact window of April 22 to 23. Earlier messaging from GitHub's COO on X had put the number at 2,804 PRs, and some community members pushed back hard on both figures given what individual companies were experiencing.&lt;/p&gt;

&lt;p&gt;The incident was not detected by GitHub's automated monitoring because it affected merge commit correctness rather than availability. GitHub only became aware of the regression at 19:38 UTC, following an increase in customer support inquiries. The fix, a revert and force-deploy, was complete by 20:43 UTC. Three hours and thirty-three minutes of silent corruption.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Status Page Was Useless
&lt;/h2&gt;

&lt;p&gt;Here is the part that stings. If you checked GitHub's status page on April 23rd, you probably saw nothing alarming. There was no major outage reported. No partial outage.&lt;/p&gt;

&lt;p&gt;That is because GitHub's status page methodology specifically excludes "Degraded Performance" from its downtime numbers. The platform itself never went down. Developers could still push code, open PRs, and click merge. The fact that clicking merge was silently destroying their codebase did not register as an incident on the dashboard.&lt;/p&gt;

&lt;p&gt;This is a telling gap. Uptime and correctness are not the same thing. A bank that processes your transactions but records them incorrectly is not "up." GitHub processed the merges. It just produced wrong results. The status page was not built to catch that kind of failure.&lt;/p&gt;




&lt;h2&gt;
  
  
  This Was Not an Isolated Bad Day
&lt;/h2&gt;

&lt;p&gt;It would be easier to move on from this if it were a one-off. But April 2026 was a genuinely rough stretch for GitHub.&lt;/p&gt;

&lt;p&gt;Four days after the merge queue incident, on April 27th, GitHub's Elasticsearch cluster became overloaded, likely from a botnet attack, and search-backed UI surfaces stopped returning results. Pull request lists went blank. Issues disappeared from view. Projects and Actions workflow pages showed nothing. The underlying data was still there, but developers could not see it.&lt;/p&gt;

&lt;p&gt;And then, on April 28th, the same morning GitHub's CTO published an apology post about reliability, a separate security disclosure dropped: researchers at Wiz had found a critical remote code execution vulnerability in GitHub's &lt;code&gt;git push&lt;/code&gt; pipeline (CVE-2026-3854, CVSS 8.7). A single crafted &lt;code&gt;git push&lt;/code&gt; with injected options could reach unsandboxed code execution on GitHub's servers. It was patched in 75 minutes on github.com, but the timing was brutal.&lt;/p&gt;

&lt;p&gt;Three significant failures in five days. Merge queue correctness. Search collapse. An RCE in the core git push path.&lt;/p&gt;

&lt;p&gt;GitHub's CTO, Vlad Fedorov, acknowledged in the April 28th post that none of this is acceptable. He also revealed the scale of what GitHub is dealing with: the company had planned to scale capacity by 10x in October 2025. By February 2026, projections driven by agentic development workflows (AI coding tools like Copilot, Cursor, and Codex flooding the platform with automated PRs) forced a rethink to a 30x redesign. GitHub is now hitting peaks of 90 million merged PRs and 1.4 billion commits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555099962-4199c345e5dd%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555099962-4199c345e5dd%3Fw%3D1200%26q%3D80" alt="Developer incident response" width="1200" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Deeper Architectural Problem
&lt;/h2&gt;

&lt;p&gt;There is a reason this specific failure mode existed. GitHub's merge queue constructs merge commits through a code path that is separate from how a regular PR merge works. Two code paths, two places where behavior can quietly diverge.&lt;/p&gt;

&lt;p&gt;This is the danger that comes with delegation. A merge queue is supposed to automate exactly what a human would do when clicking "Merge pull request." The moment it does something a human would not do, because it has its own logic for building the merge commit, it can silently produce commits nobody wrote and nobody approved.&lt;/p&gt;

&lt;p&gt;This is not just a GitHub problem. It is a pattern that shows up every time we give automated systems write access to things that matter. Queues, bots, AI agents. As long as those systems are doing something equivalent to what a human would do, the failure modes are familiar. When they start doing things a human would not do, the failures become invisible until the damage is already done.&lt;/p&gt;

&lt;p&gt;The lesson is not to avoid merge queues. It is to make sure that whatever writes to &lt;code&gt;main&lt;/code&gt; stays as close as possible to boring, well-understood git operations, with no novel logic in the merge commit path that reviewers cannot audit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Will Anyone Actually Leave?
&lt;/h2&gt;

&lt;p&gt;After something like this, the obvious question is whether developers will migrate off GitHub. And the honest answer is: probably not in any significant numbers.&lt;/p&gt;

&lt;p&gt;GitHub is deeply embedded. CI pipelines, webhook integrations, RBAC policies, Actions workflows, third-party app permissions, team structures, pull request history. Migration is not just switching a remote URL. It is months of work and coordination.&lt;/p&gt;

&lt;p&gt;That stickiness is real, and it is not purely irrational. GitHub is still where most open source lives. It is still where most integrations point. It is still the default. Its hold on the development ecosystem is less that of a premium SaaS product and more that of a utility. You do not switch utilities because of a bad week.&lt;/p&gt;

&lt;p&gt;But what this incident should change is the baseline of trust. GitHub is infrastructure. And infrastructure that silently corrupts your data, even for a few hours, with no visible error, is infrastructure you need to have a recovery plan for.&lt;/p&gt;

&lt;p&gt;The minimum response is not migration. It is verification. Audit squash merges in merge queue groups of two or more PRs from the April 22 to 23 window. Write down which parts of your build and deploy pipeline silently assume git history is correct. Then make that assumption visible somewhere it can be challenged.&lt;/p&gt;
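&lt;p&gt;A minimal sketch of that audit, assuming you can pull reviewed and landed diff stats from your Git host's API (the field names here are hypothetical):&lt;/p&gt;

```python
# Hypothetical audit helper: flag merge commits whose landed diff
# stats differ from what was reviewed on the PR. In practice the
# numbers would come from your Git host's API; here they are inline.

def flag_suspect_merges(merges, tolerance=0):
    suspects = []
    for m in merges:
        drift = (abs(m["landed_add"] - m["reviewed_add"])
                 + abs(m["landed_del"] - m["reviewed_del"]))
        if drift > tolerance:
            suspects.append(m["sha"])
    return suspects

merges = [
    {"sha": "abc123", "reviewed_add": 29, "reviewed_del": 34,
     "landed_add": 245, "landed_del": 1137},   # the April 23rd pattern
    {"sha": "def456", "reviewed_add": 10, "reviewed_del": 2,
     "landed_add": 10, "landed_del": 2},       # matches the review
]
print(flag_suspect_merges(merges))  # ['abc123']
```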




&lt;h2&gt;
  
  
  What GitHub Says It Is Doing About It
&lt;/h2&gt;

&lt;p&gt;GitHub's post-incident response included a few concrete commitments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expanding test coverage for merge correctness validation&lt;/li&gt;
&lt;li&gt;Adding regression checks that validate resulting git contents across supported merge configurations before reaching production&lt;/li&gt;
&lt;li&gt;Migrating performance-sensitive code from its older Ruby codebase to Go&lt;/li&gt;
&lt;li&gt;Moving systems to public cloud infrastructure to handle the 30x scale requirement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The April 23rd bug specifically was caused by incomplete feature flagging on a new code path. The fix was a revert. The longer-term fix is better test coverage for multi-PR merge queue groups, which were apparently underrepresented in existing test suites.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;GitHub's merge queue, for a few hours on April 23rd, 2026, broke the most fundamental contract of version control: that what you approve is what merges. It did it silently, with clean green UI, no errors, and no status page entry.&lt;/p&gt;

&lt;p&gt;The code was still there in Git object storage. But the branch history was wrong, and no automated system could safely repair it across every affected repository. Engineers had to do it by hand.&lt;/p&gt;

&lt;p&gt;That is the thing that lingers. Git is supposed to be the boring, reliable layer that everything else is built on. When the boring layer gets interesting, it gets interesting in the worst possible way.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this useful, drop a comment below or follow for more deep dives into the tools we trust (sometimes too much).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>git</category>
      <category>github</category>
      <category>programming</category>
    </item>
    <item>
      <title>7 AI Gateways That Actually Work in Production (2026 Guide)</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Wed, 29 Apr 2026 12:19:13 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/7-ai-gateways-that-actually-work-in-production-2026-guide-2p4d</link>
      <guid>https://forem.com/varshithvhegde/7-ai-gateways-that-actually-work-in-production-2026-guide-2p4d</guid>
      <description>&lt;p&gt;Let me start with an admission. I resisted using an AI gateway for longer than I should have.&lt;/p&gt;

&lt;p&gt;My reasoning was the kind engineers convince themselves is pragmatic. "I'll just call the APIs directly, it's faster to ship, I'll add abstraction later." And for a while, it worked. Until the night an Anthropic outage knocked my app offline for two hours. Until the morning a recursive agent loop racked up thousands of dollars in charges before anyone woke up. Until the security audit flagged raw API keys scattered across four different repos.&lt;/p&gt;

&lt;p&gt;At that point, "later" arrived.&lt;/p&gt;

&lt;p&gt;I've spent the past several months evaluating AI gateways seriously. Not as a researcher, but as someone trying to put them in front of real production workloads. This is what I found.&lt;/p&gt;




&lt;h2&gt;
  
  
  First: What Does an AI Gateway Actually Do?
&lt;/h2&gt;

&lt;p&gt;Before the list, let me be specific about what we're talking about, because the category name is increasingly used to mean very different things.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8pw99qdxzzwwkwoep1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8pw99qdxzzwwkwoep1u.png" alt="LLM API gateway architecture diagram" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gartner defines an AI gateway as "a technology or platform that acts as an intermediary between applications and various AI services or models." That is the clean academic definition. In practice, a good AI gateway is the layer that keeps your AI app running when things break. And things always break.&lt;/p&gt;

&lt;p&gt;Concretely, that means handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt; - intelligently directing requests to the right model based on cost, latency, or availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failover&lt;/strong&gt; - automatically switching providers when one goes down, often in under 50ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost controls&lt;/strong&gt; - per-team or per-key budget limits so no single runaway agent bankrupts you&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key management&lt;/strong&gt; - one secure central store for credentials instead of env vars scattered across repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; - request-level traces, latency metrics, and token usage across every provider in a single dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt; - audit logs, role-based access control, and data residency guarantees&lt;/li&gt;
&lt;/ul&gt;
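&lt;p&gt;The failover and budget-limit behavior above can be sketched in a few lines. This is an illustration of the pattern, not any particular gateway's implementation; the provider callables stand in for real SDK clients:&lt;/p&gt;

```python
# Minimal sketch of the failover pattern described above: try each
# provider in order, fall back on failure, and enforce a per-key
# budget. The provider callables stand in for real SDK clients.
import time

def call_with_failover(providers, prompt, budget, spent):
    if spent >= budget:
        raise RuntimeError("budget exceeded for this key")
    last_err = None
    for name, call in providers:
        start = time.monotonic()
        try:
            reply = call(prompt)
        except Exception as err:   # provider down or erroring: try next
            last_err = err
            continue
        latency_ms = (time.monotonic() - start) * 1000
        return {"provider": name, "reply": reply, "latency_ms": latency_ms}
    raise RuntimeError(f"all providers failed: {last_err!r}")

def flaky(prompt):     # simulates a provider outage
    raise ConnectionError("503 from provider")

def healthy(prompt):
    return f"echo: {prompt}"

result = call_with_failover([("primary", flaky), ("fallback", healthy)],
                            "hello", budget=100.0, spent=12.5)
print(result["provider"])  # fallback
```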

&lt;p&gt;Different gateways prioritize different things. Some are razor-thin proxies optimized for speed. Others are full control planes designed to govern how an entire organization uses AI. The right choice depends entirely on where your pain is.&lt;/p&gt;

&lt;p&gt;Here are the seven worth knowing in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gateway&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;MCP Support&lt;/th&gt;
&lt;th&gt;On-Prem/VPC&lt;/th&gt;
&lt;th&gt;Compliance&lt;/th&gt;
&lt;th&gt;Gartner Recognized&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TrueFoundry&lt;/td&gt;
&lt;td&gt;~3-4ms&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;VPC, On-Prem, Air-Gapped&lt;/td&gt;
&lt;td&gt;SOC 2, HIPAA, ITAR&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Enterprise with compliance + deployment needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Helicone&lt;/td&gt;
&lt;td&gt;under 5ms P95&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Self-hosted option&lt;/td&gt;
&lt;td&gt;SOC 2&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Observability-first teams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenRouter&lt;/td&gt;
&lt;td&gt;~15ms&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Managed only&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Prototyping, widest model access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requesty&lt;/td&gt;
&lt;td&gt;~8ms P50&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;GDPR (EU endpoint)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Fast multi-model routing with analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Singulr AI&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;In progress&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;AI governance-focused orgs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inworld Router&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Quality-weighted routing experiments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Braintrust Gateway&lt;/td&gt;
&lt;td&gt;Cached under 100ms&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Enterprise tier only&lt;/td&gt;
&lt;td&gt;SOC 2&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Eval + routing in one workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  1. TrueFoundry AI Gateway
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Enterprise Production Pick
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspugnj4tfrc14bdl6xt0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspugnj4tfrc14bdl6xt0.png" alt="TrueFoundry AI Gateway enterprise platform" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'll be honest. TrueFoundry was not the first gateway I tried. It kept coming up in conversations with platform engineers at companies doing serious AI at scale, and once I actually dug in, the reason became clear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.truefoundry.com/ai-gateway" rel="noopener noreferrer"&gt;TrueFoundry is an enterprise AI gateway&lt;/a&gt; and more specifically, it is the only &lt;a href="https://www.businesswire.com/news/home/20260220396246/en/CORRECTING-and-REPLACING-TrueFoundry-Recognized-as-a-Representative-Vendor-in-Gartner-Market-Guide-for-AI-Gateways" rel="noopener noreferrer"&gt;Gartner-recognized AI gateway&lt;/a&gt; that also handles model deployment and GPU orchestration in the same platform. Most gateways on this list are proxies with dashboards. TrueFoundry is closer to a full AI control plane, the kind of thing a platform team would build internally at a large company, except you do not have to build it yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The numbers that matter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The platform handles over &lt;strong&gt;10 billion requests per month&lt;/strong&gt; for Fortune 1000 customers including NVIDIA and Siemens Healthineers. The gateway adds roughly 3-4ms latency overhead per request and can sustain 350+ RPS on a single vCPU. These are not lab benchmarks. They are the numbers that show up in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it genuinely stands apart on compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SOC 2, HIPAA, and ITAR certified. For anyone in healthcare, financial services, defense, or any regulated industry, this is often the conversation that ends competitor evaluations. Most other gateways on this list have none of these certifications, or are still working toward them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The deployment flexibility is real&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VPC, on-premises, and air-gapped deployments are all supported. If your security posture means data cannot touch a public cloud, TrueFoundry actually works. Not as an afterthought, but as a first-class deployment mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MCP piece deserves its own moment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As AI agents multiply, teams are suddenly managing not just LLM calls but tool access: MCP servers for code execution, database queries, web search, enterprise integrations. TrueFoundry unifies LLM routing and MCP governance in the same control plane, with OAuth2, RBAC, and audit logging applied to every tool call. You can &lt;a href="https://www.truefoundry.com/ai-gateway" rel="noopener noreferrer"&gt;register internal MCP servers&lt;/a&gt;, define who can access what, and monitor agent tool usage alongside your LLM traffic, all in one place. No other gateway on this list does that.&lt;/p&gt;

&lt;p&gt;On Gartner Peer Insights, one enterprise customer said: "AI Gateway is a single pane where I can see all the models, their associated cost, track requests... it provides an easy way to integrate with MCP servers which does a very heavy lift." That lines up with what I have heard from teams using it at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it genuinely falls short&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TrueFoundry is a heavier platform. If your requirement is "I need a quick proxy to route between GPT-4 and Claude," this is more infrastructure than you need. It is also strongest when there is a dedicated platform or infra team who can own it. Solo developers or very small teams will find the setup investment harder to justify compared to lighter alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TrueFoundry is the only Gartner-recognized AI gateway on this list and the only one that unifies LLM routing, MCP governance, and model deployment in a single control plane. If you are running production AI for an enterprise with compliance requirements, it is in a different category from the proxies below.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.truefoundry.com/ai-gateway" rel="noopener noreferrer"&gt;truefoundry.com/ai-gateway&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Helicone AI Gateway
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Observability-First Pick
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgs0d1phpum5zihjxzz8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgs0d1phpum5zihjxzz8u.png" alt="Helicone LLM observability and analytics dashboard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Helicone has earned genuine respect in the developer community for a specific reason. If you want to understand what your AI application is actually doing, it is excellent.&lt;/p&gt;

&lt;p&gt;It is Rust-based, open-source, and fast. The team describes it as "the NGINX of LLMs," and that is not just marketing. The architecture reflects it. You get a unified API for 100+ providers through a single OpenAI-compatible endpoint, with automatic failover, load balancing, and per-request logging built in from the start.&lt;/p&gt;

&lt;p&gt;The analytics dashboard is one of the more useful ones I have seen: per-request cost tracking, model comparison, session-level traces, and usage patterns broken out by team, model, or environment. For understanding where your AI spend is actually going, Helicone is hard to beat.&lt;/p&gt;
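&lt;p&gt;Under the hood, per-request cost tracking is straightforward accounting over token counts. A sketch, with hypothetical models and rates:&lt;/p&gt;

```python
# Sketch of the per-request cost accounting a dashboard like this does
# under the hood. Models and rates are hypothetical, in $ per 1M tokens
# (input rate, output rate).
RATES = {"model-a": (3.00, 15.00), "model-b": (0.25, 1.25)}

def request_cost(model, prompt_tokens, completion_tokens):
    in_rate, out_rate = RATES[model]
    return (prompt_tokens * in_rate
            + completion_tokens * out_rate) / 1_000_000

cost = request_cost("model-a", prompt_tokens=1200, completion_tokens=400)
print(f"${cost:.6f}")  # $0.009600
```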

&lt;p&gt;It is also SOC 2 certified and GDPR compliant, with a self-hosting option for teams that need infrastructure control. That is a meaningful step up from pure managed-only options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No MCP gateway support. If you are building agents that need governed tool access, you will need to look elsewhere for that layer. Governance features like RBAC depth and policy enforcement are more basic compared to enterprise platforms. It is primarily an observability platform with routing layered on, not a full deployment and governance story.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt; teams where LLM observability and cost analytics are the primary pain point. If you already have routing handled but want real visibility into what is happening across your models, Helicone is a solid, developer-friendly choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. OpenRouter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Widest Model Access, Fastest to Start
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkk7s8kt4d3spl12u2qf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkk7s8kt4d3spl12u2qf.png" alt="OpenRouter unified AI model API interface" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenRouter is how I reach 300+ models through one API when I am prototyping. No infrastructure to manage, unified billing across providers, and instant access to everything from GPT-5 to Llama to Mistral variants through a single OpenAI-compatible endpoint.&lt;/p&gt;
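&lt;p&gt;Because the endpoint is OpenAI-compatible, the call shape is the familiar chat-completions payload. A standard-library sketch (the model id is just an example; any model OpenRouter lists would go in the same slot):&lt;/p&gt;

```python
# What an OpenAI-compatible chat call to OpenRouter looks like at the
# HTTP level, using only the standard library. The model id is an
# example; any model OpenRouter lists would go in the same slot.
import json
import urllib.request

def build_request(api_key, model, user_message):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_request("sk-or-...", "meta-llama/llama-3-8b-instruct", "Hi")
# urllib.request.urlopen(req) would send it; skipped here to stay offline.
print(req.full_url)
```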

&lt;p&gt;The pricing model is worth understanding correctly. OpenRouter passes through provider pricing at or near cost; the charge is a 5.5% platform fee on credit purchases, not a per-token markup on inference. For most use cases, you are paying what you would pay the provider directly, plus a small convenience fee for the unified access. They do not train on your data, and there is a growing free tier with 25+ zero-cost models for getting started.&lt;/p&gt;

&lt;p&gt;For prototyping, experimenting with different models, or any project where you need breadth over depth, OpenRouter is genuinely hard to beat on speed of getting started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managed only, with no self-hosting option. No MCP support. Governance features are minimal: no RBAC, no compliance certifications, no fine-grained access controls built for regulated industries. The default throttle of 100 API calls per 60 seconds can become a real constraint for high-volume agent pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt; prototyping, side projects, or teams that need fast access to the widest range of models and are not yet in a compliance conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Requesty
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbwraegix52f5c8o2akm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbwraegix52f5c8o2akm.png" alt="Requesty AI Gateway" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Smarter Than It Looks
&lt;/h3&gt;

&lt;p&gt;Requesty is a gateway I underestimated at first glance. The website looks simple. Underestimating it turned out to be a mistake.&lt;/p&gt;

&lt;p&gt;Requesty is a unified LLM gateway for 400+ models, and what sets it apart from pure model-access tools is the routing intelligence. It includes smart routing that analyzes request type and auto-selects the cheapest viable model, cross-provider semantic caching (which can cut token costs by up to 80% on repeated queries), real-time PII redaction, and sub-50ms automatic failover when a provider goes down.&lt;/p&gt;

&lt;p&gt;According to their own data, 70,000+ developers use it and it processes 90+ billion tokens daily. Those are numbers that suggest it is more battle-tested than its marketing implies. There is an EU endpoint for GDPR compliance, per-key spending limits, and a genuinely useful analytics dashboard.&lt;/p&gt;

&lt;p&gt;Setup is three lines of code. Swap the base URL. Done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://router.requesty.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-requesty-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Where it falls short&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managed only, no self-hosting or VPC deployment. No MCP governance. No enterprise compliance certifications beyond GDPR. For teams in regulated industries or those needing air-gapped deployment, it does not get you there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt; developers who want a capable, managed multi-model gateway with smart routing and cost optimization, without the infrastructure overhead of a full enterprise platform.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Singulr AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckka6m94osvbkfjz5vaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckka6m94osvbkfjz5vaq.png" alt="Singulr AI Gateway" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Governance-Focused Newcomer
&lt;/h3&gt;

&lt;p&gt;Singulr AI is an enterprise AI governance platform backed by Nexus Venture Partners and Dell Technologies Capital. It raised $10M in early 2025 with a specific focus: helping security, IT, privacy, and compliance teams gain visibility and control over how AI is being used across an organization.&lt;/p&gt;

&lt;p&gt;The approach is distinctive. It includes a continuously updated AI risk intelligence system that profiles models and agents, classifies them in real time, and recommends safer alternatives. It also offers application-aware red teaming that simulates real-world threats before deployment.&lt;/p&gt;

&lt;p&gt;For CISOs and compliance teams, this is interesting. It is a governance-first angle that most gateway vendors leave to someone else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is a newer entrant with limited public production track record at Fortune 1000 scale. The feature set is narrower than full gateway platforms. It is primarily governance and security, not a complete routing, failover, and deployment story. Pricing is not public.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt; organizations where AI governance, risk scoring, and compliance team enablement are the primary requirements, and who are comfortable evaluating a platform that is still building its enterprise reference base.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Inworld Router
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk2vfuccbayd3w1ip1bl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk2vfuccbayd3w1ip1bl.png" alt="Inworld Router AI Gateway" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  An Interesting Idea Worth Watching
&lt;/h3&gt;

&lt;p&gt;Inworld Router takes a genuinely different approach to the routing problem. Instead of routing based purely on cost or availability, it routes on business-level metrics: cost per output quality, task complexity, latency targets. The idea is that not every request needs the smartest and most expensive model, and a router that understands the nature of a request can make smarter tradeoffs than one that just round-robins.&lt;/p&gt;

&lt;p&gt;That is a legitimate insight, and as a concept it points toward where sophisticated AI infrastructure is heading.&lt;/p&gt;
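&lt;p&gt;To make the concept concrete, here is a toy sketch of quality-weighted routing: pick the cheapest model whose expected quality clears the bar for a given request. This is my illustration of the idea, not Inworld's actual algorithm, and the model names, prices, and quality scores are all invented.&lt;/p&gt;

```python
# Toy quality-weighted router. Every number below is invented for
# illustration -- real routers would learn these scores per task type.
MODELS = {
    "small":  {"cost_per_1k": 0.15, "quality": 0.70},
    "medium": {"cost_per_1k": 0.60, "quality": 0.85},
    "large":  {"cost_per_1k": 3.00, "quality": 0.97},
}

def route(required_quality: float) -> str:
    # Keep only models expected to meet the quality bar...
    viable = {name: m for name, m in MODELS.items()
              if m["quality"] >= required_quality}
    # ...then minimize cost among them.
    return min(viable, key=lambda name: viable[name]["cost_per_1k"])

print(route(0.65))  # small is good enough for a simple request
print(route(0.90))  # only large clears the bar
```

&lt;p&gt;The interesting engineering problem is the part this sketch waves away: estimating &lt;code&gt;required_quality&lt;/code&gt; and each model's expected quality from the request itself.&lt;/p&gt;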

&lt;p&gt;In practice today, it is primarily built for Inworld's own gaming and character AI use case. The ecosystem is small, community support is limited, and it is not a general-purpose enterprise gateway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt; teams in gaming or character AI who want to experiment with quality-weighted routing. Worth keeping an eye on as the concept matures.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Braintrust Gateway
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl16lkuzvdxdd2g4a6jy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl16lkuzvdxdd2g4a6jy4.png" alt="Braintrust Gateway" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Eval-First Option
&lt;/h3&gt;

&lt;p&gt;Braintrust is fundamentally an evaluation and observability platform that also includes a capable gateway. The integration between the two is the real story. Requests that flow through the gateway automatically feed into Braintrust's tracing and evaluation pipeline. You can run evaluations against production traffic, compare model performance across experiments, and catch regressions in CI/CD before they reach users.&lt;/p&gt;

&lt;p&gt;The gateway supports 100+ models including GPT-5, Claude 4, and Gemini 2.5. Caching is encrypted per-API-key using AES-GCM, with sub-100ms response times for cached requests. There is a generous free tier (1M trace spans, 10k evaluation scores) and SOC 2 Type II certification on the enterprise side.&lt;/p&gt;

&lt;p&gt;One important note: their original AI proxy is now deprecated. They have migrated to a full gateway product, which is a meaningful upgrade for production reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The gateway features are secondary to the eval platform. That is by design, but it means Braintrust is not a complete answer for failover, MCP governance, or compliance-heavy deployments. Self-hosting is only available on the enterprise tier. At $249/month for the Pro plan, it is not the lightest option for teams that only need routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt; engineering teams doing active prompt optimization and model comparison who want routing and evaluation tightly integrated, and do not want to stitch together separate tools for each.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Actually Choose
&lt;/h2&gt;

&lt;p&gt;After spending real time with all of these, here is my honest decision framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The compliance conversation is the first filter.&lt;/strong&gt; If your security team needs SOC 2, HIPAA, or ITAR, or if data cannot leave your cloud, the list immediately narrows to one serious option: TrueFoundry. This is not a sales pitch. It is just where the certifications are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MCP question is the second filter.&lt;/strong&gt; If you are building agents that need governed tool access, only TrueFoundry covers this layer natively today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you clear both of those, the rest is about fit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick &lt;strong&gt;&lt;a href="https://www.truefoundry.com/ai-gateway" rel="noopener noreferrer"&gt;TrueFoundry&lt;/a&gt;&lt;/strong&gt; if you need enterprise governance, compliance, and model deployment in one platform&lt;/li&gt;
&lt;li&gt;Pick &lt;strong&gt;Helicone&lt;/strong&gt; if observability and cost analytics are your primary pain and you want something developer-friendly and open-source&lt;/li&gt;
&lt;li&gt;Pick &lt;strong&gt;OpenRouter&lt;/strong&gt; if you are prototyping and want the fastest possible access to the widest range of models&lt;/li&gt;
&lt;li&gt;Pick &lt;strong&gt;Requesty&lt;/strong&gt; if you want a capable managed gateway with smart routing and you are not in a compliance-heavy environment&lt;/li&gt;
&lt;li&gt;Pick &lt;strong&gt;Braintrust&lt;/strong&gt; if prompt evaluation and model quality monitoring are central to your workflow&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where This Category Is Going
&lt;/h2&gt;

&lt;p&gt;Something I have noticed in 2026 is that the definition of "AI gateway" keeps expanding. A year ago it meant a proxy with routing logic. Now teams are asking their gateway to handle agent tool access via MCP, govern agent-to-agent communication, manage model deployment, and provide compliance audit trails across all of it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feevb9oq9vvwlvbfb7oph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feevb9oq9vvwlvbfb7oph.png" alt="MCP gateway agent tool orchestration architecture" width="800" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is a lot to ask of a single layer. Most of the lighter options on this list handle one or two of these well. TrueFoundry is the only one I have seen genuinely attempting the full stack, and it has the production evidence to back that up: 10B+ requests per month, Fortune 1000 customers, and Gartner recognition.&lt;/p&gt;

&lt;p&gt;Whether you want one vendor for all of that, or best-of-breed at each layer, is a real architectural choice. Either can work. The important thing is making it deliberately, rather than discovering two years in that your "lightweight proxy" cannot support what your AI stack has become.&lt;/p&gt;




&lt;p&gt;What has your experience been? I am especially curious whether anyone has moved from a lighter gateway to something heavier, or in the other direction, and what triggered that switch. Drop a comment below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>I Spent 3 Days Debugging Our LLM Setup. Turns Out We Needed an AI Gateway the Whole Time.</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Wed, 15 Apr 2026 08:46:08 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/i-spent-3-days-debugging-our-llm-setup-turns-out-we-needed-an-ai-gateway-the-whole-time-50a2</link>
      <guid>https://forem.com/varshithvhegde/i-spent-3-days-debugging-our-llm-setup-turns-out-we-needed-an-ai-gateway-the-whole-time-50a2</guid>
      <description>&lt;p&gt;Let me tell you about a Friday afternoon I'd rather forget.&lt;/p&gt;

&lt;p&gt;Three teams, four models, six API keys living in different &lt;code&gt;.env&lt;/code&gt; files, one very angry compliance officer, and me just staring at a terminal trying to figure out why we got a $1,400 OpenAI bill for a feature that was supposed to cost fifty bucks.&lt;/p&gt;

&lt;p&gt;That was my "okay something is genuinely broken here" moment.&lt;/p&gt;

&lt;p&gt;Not some big insight. Just a $1,400 invoice and dead silence on a Slack thread for about ten minutes.&lt;/p&gt;

&lt;p&gt;If you've felt even a small version of that, this post is for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  So what actually is an AI Gateway?
&lt;/h2&gt;

&lt;p&gt;Not the textbook answer. That one goes something like "middleware that abstracts your LLM provider calls." Technically fine, tells you nothing.&lt;/p&gt;

&lt;p&gt;Here's how I actually think about it.&lt;/p&gt;

&lt;p&gt;You know how bigger engineering orgs eventually build out a platform team? Before that team exists, every squad is doing their own thing. Their own CI setup, their own infra configs, their own credentials. It mostly works. Until it doesn't. And then it catastrophically doesn't all at once.&lt;/p&gt;

&lt;p&gt;An AI Gateway is basically that platform layer, except it's for LLMs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkjuor74bks4xvbsncv14.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkjuor74bks4xvbsncv14.webp" alt="AI Gateway" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every single request your app makes to any model (OpenAI, Anthropic, a self-hosted Llama, whatever you're running) goes through it. Because everything flows through one place, you finally get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One set of credentials instead of keys scattered across five repos&lt;/li&gt;
&lt;li&gt;Rate limits and budgets that are actually enforced&lt;/li&gt;
&lt;li&gt;Cost tracking per team, per model, per request&lt;/li&gt;
&lt;li&gt;Guardrails that catch PII before it leaves your infra&lt;/li&gt;
&lt;li&gt;One place to look when something blows up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One control plane. Every team. Every model.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture is simpler than it sounds
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you put a gateway in the middle:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjgbzpismfi3ilj62wmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjgbzpismfi3ilj62wmy.png" alt="Excalidraw AI gateway" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Request comes in from your app, gateway catches it, validates auth, checks rate limits, applies input guardrails, picks the right provider, logs everything, checks the response output, sends it back. That's the whole flow.&lt;/p&gt;

&lt;p&gt;Your application code doesn't change. You stop pointing at &lt;code&gt;api.openai.com&lt;/code&gt; directly and point at your gateway instead. That's literally it from your team's perspective.&lt;/p&gt;

&lt;p&gt;The control layer just sits there doing its job quietly.&lt;/p&gt;
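&lt;p&gt;The flow above fits in a few dozen lines of pseudocode. Here is a deliberately toy Python sketch of the ordering of checks, not any vendor's implementation; every table and error string in it is made up:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Request:
    team: str
    api_key: str
    prompt: str

# Invented stand-ins for real gateway state.
VALID_KEYS = {"gw-key-1": "team-a"}   # key -> team it belongs to
RATE_LIMITS = {"team-a": 100}         # requests allowed per window
usage = {"team-a": 0}
audit_log = []

def handle(req: Request) -> str:
    # 1. Validate auth
    if VALID_KEYS.get(req.api_key) != req.team:
        return "401 unauthorized"
    # 2. Enforce rate limits
    if usage[req.team] >= RATE_LIMITS[req.team]:
        return "429 rate limited"
    usage[req.team] += 1
    # 3. Input guardrails (toy PII check)
    if "ssn:" in req.prompt.lower():
        return "400 blocked by guardrail"
    # 4. Pick a provider and forward the request (stubbed out here)
    response = f"provider response to: {req.prompt}"
    # 5. Log everything for cost attribution and audit
    audit_log.append((req.team, req.prompt, len(response)))
    return response

print(handle(Request("team-a", "gw-key-1", "hello")))
# prints: provider response to: hello
```

&lt;p&gt;The real versions of steps 2 through 5 are where the products on the market differentiate, but the shape of the pipeline is exactly this.&lt;/p&gt;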




&lt;h2&gt;
  
  
  "But I already have an API gateway. Isn't that enough?"
&lt;/h2&gt;

&lt;p&gt;This is where most people get confused. Including me when I first looked into this.&lt;/p&gt;

&lt;p&gt;Quick answer: no. Here's why.&lt;/p&gt;

&lt;p&gt;Your API gateway (Kong, AWS API Gateway, Nginx, take your pick) understands traffic. It knows Team A sent 10,000 HTTP requests. It can enforce rate limits, handle auth tokens. That's useful.&lt;/p&gt;

&lt;p&gt;Your AI gateway understands what's actually inside those requests. It knows Team A sent &lt;strong&gt;4.2 million tokens to GPT-4o&lt;/strong&gt;, it cost &lt;strong&gt;$84&lt;/strong&gt;, average latency was &lt;strong&gt;340ms&lt;/strong&gt;, and &lt;strong&gt;3 of those requests triggered the PII guardrail&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One sees requests. The other sees meaning. That's not a small difference.&lt;/p&gt;

&lt;p&gt;For stateless REST APIs, a regular API gateway is totally fine. For LLM workloads where tokens equal money and every prompt is a potential compliance issue, you need something that actually speaks the language.&lt;/p&gt;
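&lt;p&gt;The token math behind that example is easy to sanity-check. A minimal sketch, assuming a flat illustrative rate of $20 per million tokens (real pricing varies by model and by input versus output tokens):&lt;/p&gt;

```python
# Back-of-the-envelope cost attribution. The rate below is illustrative,
# not a real price sheet.
PRICE_PER_MILLION_TOKENS = 20.0

def cost_usd(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(f"${cost_usd(4_200_000):.2f}")  # prints: $84.00
```

&lt;p&gt;An API gateway never sees the token counts, so it can never do this arithmetic for you. An AI gateway does it on every request.&lt;/p&gt;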




&lt;h2&gt;
  
  
  Do you actually need one right now though?
&lt;/h2&gt;

&lt;p&gt;Let me skip the usual "it depends" and be direct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You're probably fine without one if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One team, one model, one use case&lt;/li&gt;
&lt;li&gt;Nobody is asking about costs yet&lt;/li&gt;
&lt;li&gt;Zero compliance requirements&lt;/li&gt;
&lt;li&gt;It's a POC or side project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't add infrastructure you don't need. Raw SDK calls are fast to ship. Keep it simple when simple works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You've outgrown the simple setup if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple teams are calling models independently with no visibility into what they're doing&lt;/li&gt;
&lt;li&gt;Swapping providers requires actual code changes&lt;/li&gt;
&lt;li&gt;Someone from legal or security or finance asked a question you couldn't answer&lt;/li&gt;
&lt;li&gt;You've had an API key accidentally committed to a public repo (or almost did)&lt;/li&gt;
&lt;li&gt;You can't answer "what did we spend on AI last month, by team?" without going on a scavenger hunt through billing dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is genuinely the biggest tell. If someone asks that question and you have to go digging, you already needed this.&lt;/p&gt;

&lt;h3&gt;
  
  
  What actually pushes teams over the edge
&lt;/h3&gt;

&lt;p&gt;It's never one thing. It's always a pile of smaller things that suddenly feel heavy together.&lt;/p&gt;

&lt;p&gt;DevOps realizes they can't track spend because keys are everywhere. Someone commits a key to a public repo. A team uses GPT-4 Turbo for tasks that GPT-4 Mini handles just fine, and you find out after they've burned $2K. Compliance asks for an audit trail and you have nothing.&lt;/p&gt;

&lt;p&gt;Each of those individually, fine, you deal with it. All of them stacking up at the same time? That's when the "simple" setup reveals it was never actually simple. You were just deferring the complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a production gateway actually looks like
&lt;/h2&gt;

&lt;p&gt;Okay enough talking around it. Here's what it gives you in practice, using TrueFoundry as the concrete example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7wqj5il9c2jcll6z8a9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7wqj5il9c2jcll6z8a9.png" alt="TrueFoundry MainPage" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One API key across all providers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzba6qd46w4cqyrq3nse.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzba6qd46w4cqyrq3nse.png" alt="Model Unify" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your teams stop touching raw OpenAI or Anthropic keys entirely. One key, routed through the gateway, with access to every approved model. Rotate it in one place. Done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-team budgets with real enforcement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4l0e4t4h7tccj3rwm7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4l0e4t4h7tccj3rwm7r.png" alt="team" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not "we log it and send you a Slack alert." Actual hard limits. Team hits their monthly budget, the next request gets rejected with a clear error. No surprise bills, no awkward retros about where the spend went.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic failover&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenAI goes down. It happens. Your app doesn't go down with it because requests automatically route to Anthropic or your self-hosted model. No code changes. No one gets paged. It just keeps working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full request tracing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlrnranv1rkxfvv22jyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlrnranv1rkxfvv22jyt.png" alt="request tracing" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every prompt, every response, every token count, every cost attribution. Logged and queryable. Pull a request from six months ago and reconstruct exactly what happened. This feature alone has saved me more debugging time than I can measure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guardrails that actually run everywhere&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42qwsft6nxn4x574gv87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42qwsft6nxn4x574gv87.png" alt="Guardrails" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PII filtering, prompt injection detection, custom output policies. You define the rule once and it applies across every team and every model. No per-team implementation, no "oops we forgot to add the check in this service."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runs inside your own environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VPC, on-prem, air-gapped. Data doesn't leave your infra. SOC 2, HIPAA, GDPR compliant. If your compliance team has ever asked "but where does the data actually go," this is finally a clean answer.&lt;/p&gt;

&lt;p&gt;Performance-wise, it handles 350+ RPS on a single vCPU with sub-3ms latency, so you're not adding meaningful overhead to your request path.&lt;/p&gt;

&lt;p&gt;TrueFoundry is in the 2026 Gartner Market Guide for AI Gateways and processes 10B+ requests per month for companies like Siemens Healthineers, NVIDIA, Resmed, and Automation Anywhere. I mention that not as a flex but to give a sense of scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  The question that actually helped me decide
&lt;/h2&gt;

&lt;p&gt;Forget "do I need an AI gateway."&lt;/p&gt;

&lt;p&gt;Ask this instead: when does the cost of NOT having one start to exceed the cost of setting one up?&lt;/p&gt;

&lt;p&gt;For most teams that crossover happens way earlier than expected. For us it wasn't one event. It was the accumulation. The audit trail we didn't have. The $1,400 bill nobody could explain. The near-miss with a key in a public repo.&lt;/p&gt;

&lt;p&gt;Setting up TrueFoundry honestly took less time than the post-mortem meeting for that billing incident.&lt;/p&gt;




&lt;p&gt;Try TrueFoundry free at &lt;strong&gt;&lt;a href="https://truefoundry.com" rel="noopener noreferrer"&gt;truefoundry.com&lt;/a&gt;&lt;/strong&gt; (no credit card required, deploys on your cloud in under 10 minutes).&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What does your current setup look like? Still on raw SDK calls or have you already hit the wall? Drop a comment, genuinely curious where people are when they start asking this question.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>performance</category>
    </item>
    <item>
      <title>The Great Claude Code Leak of 2026: Accident, Incompetence, or the Best PR Stunt in AI History?</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Wed, 01 Apr 2026 02:29:18 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/the-great-claude-code-leak-of-2026-accident-incompetence-or-the-best-pr-stunt-in-ai-history-3igm</link>
      <guid>https://forem.com/varshithvhegde/the-great-claude-code-leak-of-2026-accident-incompetence-or-the-best-pr-stunt-in-ai-history-3igm</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; On March 31, 2026, Anthropic accidentally shipped the &lt;em&gt;entire source code&lt;/em&gt; of Claude Code to the public npm registry via a single misconfigured debug file. 512,000 lines. 1,906 TypeScript files. 44 hidden feature flags. A Tamagotchi pet. And one very uncomfortable question: was it really an accident?&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What Actually Happened
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Root Cause: One Missing Line in &lt;code&gt;.npmignore&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is both the most embarrassing and most instructive part of the story. Let me walk through the technical chain of events.&lt;/p&gt;

&lt;p&gt;When you publish a JavaScript/TypeScript package to npm, your build toolchain (Webpack, esbuild, Bun, etc.) optionally generates &lt;strong&gt;source map files&lt;/strong&gt;, which have a &lt;code&gt;.map&lt;/code&gt; extension. Their entire purpose is debugging: they bridge the gap between the minified, bundled production code and your original readable source. When a crash happens, a source map lets the stack trace point to your actual TypeScript file at line 47 rather than &lt;code&gt;main.js:1:284729&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Source maps are strictly for internal debugging. They should never ship to users.&lt;/p&gt;

&lt;p&gt;The way you exclude them from npm packages is with an &lt;code&gt;.npmignore&lt;/code&gt; file, or a &lt;code&gt;files&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt;. Here's the mistake in plain English:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# What Claude Code's .npmignore should have had:&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt;.map
dist/&lt;span class="k"&gt;*&lt;/span&gt;.map

&lt;span class="c"&gt;# What it apparently had:&lt;/span&gt;
&lt;span class="c"&gt;# (nothing about .map files)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. That's the whole disaster.&lt;/p&gt;

&lt;p&gt;But it gets worse. The source map didn't contain the source code directly. It &lt;em&gt;referenced&lt;/em&gt; it, pointing to a URL of a &lt;code&gt;.zip&lt;/code&gt; file hosted on Anthropic's own Cloudflare R2 storage bucket. A publicly accessible one, with no authentication required.&lt;/p&gt;

&lt;p&gt;So the full chain looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npm install @anthropic-ai/claude-code
  → downloads package including main.js.map (59.8 MB)
    → .map file contains URL pointing to src.zip
      → src.zip is hosted publicly on Anthropic's R2 bucket
        → anyone can download and unzip 512,000 lines of TypeScript
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two separate configuration failures, stacked on top of each other.&lt;/p&gt;

&lt;p&gt;As software engineer Gabriel Anhaia put it in his &lt;a href="https://dev.to/gabrielanhaia/claude-codes-entire-source-code-was-just-leaked-via-npm-source-maps-heres-whats-inside-cjo"&gt;deep dive&lt;/a&gt;: "A single misconfigured &lt;code&gt;.npmignore&lt;/code&gt; or &lt;code&gt;files&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt; can expose everything."&lt;/p&gt;
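&lt;p&gt;The defensive pattern here is an allowlist rather than a blocklist: a &lt;code&gt;files&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt; ships only what you explicitly name, so a stray &lt;code&gt;.map&lt;/code&gt; can never sneak into a publish. A minimal sketch (the package name and globs are illustrative):&lt;/p&gt;

```json
{
  "name": "@example/cli",
  "version": "1.0.0",
  "files": [
    "dist/**/*.js",
    "README.md"
  ]
}
```

&lt;p&gt;Running &lt;code&gt;npm pack --dry-run&lt;/code&gt; before publishing prints the exact file list that will ship, which would have surfaced the 59.8 MB source map immediately.&lt;/p&gt;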

&lt;h3&gt;
  
  
  The Bun Factor
&lt;/h3&gt;

&lt;p&gt;There's a third layer. Anthropic acquired the &lt;strong&gt;Bun JavaScript runtime&lt;/strong&gt; at the end of 2025, and Claude Code is built on top of it. A known Bun bug (&lt;a href="https://github.com/oven-sh/bun/issues/28001" rel="noopener noreferrer"&gt;issue #28001&lt;/a&gt;, filed on March 11, 2026) reports that source maps are served in production builds even when the documentation says they shouldn't be.&lt;/p&gt;

&lt;p&gt;The bug was open for 20 days before this happened. Nobody caught it. Anthropic's own acquired toolchain contributed to exposing Anthropic's own product.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The Timeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;00:21 UTC — March 31, 2026
Malicious axios versions (1.14.1 / 0.30.4) appear on npm
with an embedded Remote Access Trojan. Unrelated to Anthropic,
but catastrophically bad timing.

~04:00 UTC
Claude Code v2.1.88 is pushed to npm. The 59.8 MB source map
ships with it. The R2 bucket containing all source code is live
and publicly accessible.

04:23 UTC
Chaofan Shou (@Fried_rice), an intern at Solayer Labs,
tweets the discovery with a direct download link.
16 million people descend on the thread.

Next 2 hours
GitHub repositories spring up. The fastest repo in history
to hit 50,000 stars does it in under 2 hours.
41,500+ forks proliferate. DMCA requests begin.

~08:00 UTC
Anthropic pulls the npm package from the registry.
Issues the "human error, not a security breach" statement
to VentureBeat, The Register, CNBC, Fortune, Axios, Decrypt.

Same day
A Python clean-room rewrite appears, legally DMCA-proof.
Decentralized mirrors on Gitlawb go live with the message:
"Will never be taken down."
The code is permanently in the wild.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  By the Numbers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lines of code exposed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;512,000+&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript files&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,906&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source map file size&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;59.8 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub forks (peak)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41,500+&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stars on fastest repo&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50,000 in 2 hours&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hidden feature flags&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;44&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code ARR&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.5 billion&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic total ARR&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$19 billion&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Views on original tweet&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16 million&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  3. SECURITY ALERT: The axios RAT
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stop. Read this before anything else if you updated Claude Code that morning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Coinciding with the leak, but entirely unrelated to it, was a real supply chain attack on npm. Malicious versions of the widely used &lt;code&gt;axios&lt;/code&gt; HTTP library were published:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;axios@1.14.1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;axios@0.30.4&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both versions pull in a malicious dependency called &lt;code&gt;plain-crypto-js&lt;/code&gt;, which embeds a &lt;strong&gt;Remote Access Trojan (RAT)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you ran &lt;code&gt;npm install&lt;/code&gt; or updated Claude Code between 00:21 UTC and 03:29 UTC on March 31, 2026:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your lockfiles immediately:&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"1.14.1&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;0.30.4&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;plain-crypto-js"&lt;/span&gt; package-lock.json
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"1.14.1&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;0.30.4&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;plain-crypto-js"&lt;/span&gt; yarn.lock
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"1.14.1&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;0.30.4&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;plain-crypto-js"&lt;/span&gt; bun.lockb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you find a match:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Treat the machine as fully compromised&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Rotate all credentials, API keys, and secrets immediately&lt;/li&gt;
&lt;li&gt;Perform a clean OS reinstallation&lt;/li&gt;
&lt;li&gt;File incident reports for any organizational data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Going forward, Anthropic has designated the &lt;strong&gt;Native Installer&lt;/strong&gt; as the recommended installation method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://claude.ai/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The native installer uses a standalone binary that doesn't rely on the npm dependency chain.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. What Was Inside: The Full Breakdown
&lt;/h2&gt;

&lt;p&gt;The leaked codebase is the &lt;code&gt;src/&lt;/code&gt; directory of Claude Code, the "agentic harness" that wraps the underlying Claude model and gives it the ability to use tools, manage files, run bash commands, and orchestrate multi-agent workflows. This is not the model weights (those weren't exposed), but in many ways this is &lt;em&gt;more&lt;/em&gt; strategically valuable.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Tool System (~40 tools, ~29,000 lines)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code isn't a chat wrapper. It's a plugin-style architecture where every capability is a discrete, permission-gated tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BashTool&lt;/code&gt; — shell command execution with safety guards&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FileReadTool&lt;/code&gt;, &lt;code&gt;FileWriteTool&lt;/code&gt;, &lt;code&gt;FileEditTool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WebFetchTool&lt;/code&gt; — live web access&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LSPTool&lt;/code&gt; — Language Server Protocol integration for IDE features&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GlobTool&lt;/code&gt;, &lt;code&gt;GrepTool&lt;/code&gt; — codebase search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NotebookReadTool&lt;/code&gt;, &lt;code&gt;NotebookEditTool&lt;/code&gt; — Jupyter support&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MultiEditTool&lt;/code&gt; — atomic multi-file edits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TodoReadTool&lt;/code&gt;, &lt;code&gt;TodoWriteTool&lt;/code&gt; — task tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool has its own permission model, validation logic, and output formatting; together, the tool definitions account for roughly 29,000 lines.&lt;/p&gt;
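
&lt;p&gt;As a rough illustration of that plugin pattern (my own sketch of the behavior described above; the interface, names, and dispatch logic are assumptions, not the leaked definitions):&lt;br&gt;
&lt;/p&gt;

```typescript
// Hypothetical sketch of a permission-gated tool interface,
// based only on the behavior described in the article.
interface ToolResult {
  ok: boolean;
  output: string;
}

interface Tool {
  name: string;
  // Each tool validates its own input before anything runs.
  validate(input: string): boolean;
  // Each tool declares whether the permission layer must approve the call.
  requiresPermission(input: string): boolean;
  run(input: string): ToolResult;
}

class GrepTool implements Tool {
  name = "GrepTool";
  validate(input: string): boolean {
    return input.length > 0;
  }
  requiresPermission(_input: string): boolean {
    return false; // read-only search is auto-approved in this sketch
  }
  run(input: string): ToolResult {
    return { ok: true, output: `searched for ${input}` };
  }
}

// A single dispatcher enforces the same validate/permit/run sequence
// for every tool, which is what makes the design plugin-style.
function dispatch(tool: Tool, input: string, approved: boolean): ToolResult {
  if (!tool.validate(input)) {
    return { ok: false, output: "invalid input" };
  }
  if (tool.requiresPermission(input)) {
    if (!approved) {
      return { ok: false, output: "permission denied" };
    }
  }
  return tool.run(input);
}
```

&lt;p&gt;The point of the shape is that a destructive tool like &lt;code&gt;BashTool&lt;/code&gt; would simply return &lt;code&gt;true&lt;/code&gt; from &lt;code&gt;requiresPermission&lt;/code&gt;, and the dispatcher, not the tool, enforces the gate.&lt;/p&gt;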

&lt;p&gt;&lt;strong&gt;The Query Engine (46,000 lines)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gabriel Anhaia's &lt;a href="https://dev.to/gabrielanhaia"&gt;analysis&lt;/a&gt; labels this "the brain of the operation." It handles all LLM API calls and response streaming, token caching and context management, multi-agent orchestration, and retry logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Memory Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is what competitors will study most carefully. Anthropic built a solution to "context entropy," the tendency for long-running AI sessions to degrade into hallucination as the context grows. Their answer is a three-layer memory system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Layer 1: MEMORY.md
  → A lightweight index of pointers (~150 chars per entry)
  → Always loaded in context
  → Stores LOCATIONS, not data

Layer 2: Topic Files
  → Actual project knowledge, fetched on-demand
  → Never fully in context simultaneously

Layer 3: Raw Transcripts
  → Never re-read fully
  → Only grep'd for specific identifiers when needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight is what they call &lt;strong&gt;Strict Write Discipline&lt;/strong&gt;. The agent can only update its memory index after a confirmed successful file write. This prevents the agent from polluting its context with failed attempts. The agent also treats its own memory as a "hint" and verifies facts against the actual codebase before acting, rather than trusting its stored beliefs.&lt;/p&gt;
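
&lt;p&gt;In sketch form, the discipline amounts to something like this (my own reconstruction of the described rule, not the leaked implementation):&lt;br&gt;
&lt;/p&gt;

```typescript
// Sketch of "Strict Write Discipline": the memory index is only
// updated after a file write is confirmed to have succeeded.
// Names and structure are illustrative, not from the leaked source.
interface MemoryIndex {
  [topic: string]: string; // topic points at a file LOCATION, not the data
}

function writeTopicFile(path: string, content: string): boolean {
  // Stand-in for a real file write; returns whether the write succeeded.
  return content.length > 0;
}

function recordMemory(
  index: MemoryIndex,
  topic: string,
  path: string,
  content: string
): boolean {
  const ok = writeTopicFile(path, content);
  if (!ok) {
    // Failed writes never touch the index, so failed attempts
    // cannot pollute future context.
    return false;
  }
  index[topic] = path; // store the pointer only (~150 chars), never the data
  return true;
}
```

&lt;p&gt;The index stays tiny because it holds pointers, and it stays trustworthy because nothing lands in it without a confirmed write.&lt;/p&gt;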




&lt;h2&gt;
  
  
  5. Hidden Features Anthropic Never Meant to Ship
&lt;/h2&gt;

&lt;h3&gt;
  
  
  KAIROS: Always-On Autonomous Agent
&lt;/h3&gt;

&lt;p&gt;KAIROS (from the Ancient Greek for "the right moment") is mentioned 150+ times in the source. It's an unreleased always-on background daemon mode: it runs sessions while you're idle and executes a nightly memory-consolidation process called &lt;code&gt;autoDream&lt;/code&gt; that merges disparate observations, removes logical contradictions, and converts vague insights into verified facts. It also has a special &lt;code&gt;Brief&lt;/code&gt; output mode designed for a persistent assistant, plus access to tools regular Claude Code doesn't have.&lt;/p&gt;

&lt;p&gt;Think of it as Claude Code actively maintaining its understanding of your project while you sleep, not just sitting there waiting.&lt;/p&gt;

&lt;h3&gt;
  
  
  ULTRAPLAN: 30-Minute Remote Planning Sessions
&lt;/h3&gt;

&lt;p&gt;ULTRAPLAN offloads a complex planning task to a remote Cloud Container Runtime (CCR) session running Opus, gives it up to 30 minutes to think, and lets you approve the result from your phone or browser. When approved, a special sentinel value &lt;code&gt;__ULTRAPLAN_TELEPORT_LOCAL__&lt;/code&gt; brings the result back to your local terminal. Remote cloud-powered reasoning, delivered locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coordinator Mode: Multi-Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;One Claude spawning and managing multiple worker Claude agents in parallel. The Coordinator handles task distribution, result aggregation, and conflicts between worker outputs. It's infrastructure for AI teams, not just AI assistants.&lt;/p&gt;

&lt;h3&gt;
  
  
  BUDDY: The Part Nobody Expected
&lt;/h3&gt;

&lt;p&gt;The most talked-about find, not for its strategic implications but because it's genuinely fun.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;buddy/companion.ts&lt;/code&gt; implements a full Tamagotchi-style AI pet that lives in a speech bubble next to your terminal input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Species (18 total, hidden via String.fromCharCode() arrays):
duck, dragon, axolotl, capybara, mushroom, ghost, nebulynx...

Rarity tiers:
Common &amp;gt; Uncommon &amp;gt; Rare &amp;gt; Epic &amp;gt; Legendary
1% shiny chance, independent of rarity

Stats:
DEBUGGING / PATIENCE / CHAOS / WISDOM / SNARK

Determined by:
Mulberry32 PRNG seeded from your userId hash + salt 'friend-2026-401'
(Same user always gets the same buddy species -- deterministic)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude generates a custom name and personality description for your buddy on first hatch. There are sprite animations and a floating heart effect. The planned rollout window in the source code: &lt;strong&gt;April 1-7, 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Someone at Anthropic is clearly having a very good time.&lt;/p&gt;
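
&lt;p&gt;For the curious, Mulberry32 is a tiny, well-known public-domain PRNG, and seeding it from a stable user hash is exactly why the buddy is deterministic. A sketch of the idea (the species list is abbreviated and the hashing scheme is my guess; only the PRNG and the salt string come from the leak):&lt;br&gt;
&lt;/p&gt;

```typescript
// Mulberry32: a small 32-bit PRNG. Seeding it with a stable hash of
// userId + salt means the same user always rolls the same buddy.
function mulberry32(seed: number): () => number {
  return function (): number {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Illustrative string hash (not Anthropic's); any stable hash works.
function hashString(s: string): number {
  let h = 0;
  for (const ch of s) {
    h = (Math.imul(h, 31) + ch.charCodeAt(0)) | 0;
  }
  return h >>> 0;
}

// Abbreviated species list; the leak describes 18.
const SPECIES = ["duck", "dragon", "axolotl", "capybara", "ghost"];

function buddyFor(userId: string): string {
  // Salt value taken from the article.
  const rng = mulberry32(hashString(userId + "friend-2026-401"));
  return SPECIES[Math.floor(rng() * SPECIES.length)];
}
```

&lt;p&gt;No server round-trip needed: the determinism falls out of seeding alone.&lt;/p&gt;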

&lt;h3&gt;
  
  
  Anti-Distillation: Poisoning Competitor Training Data
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;claude.ts&lt;/code&gt; (lines 301-313), a flag called &lt;code&gt;ANTI_DISTILLATION_CC&lt;/code&gt;, when enabled, sends &lt;code&gt;anti_distillation: ['fake_tools']&lt;/code&gt; in API requests. This tells the server to inject decoy tool definitions into the system prompt. The idea: if a competitor is recording Claude Code's API traffic to train their own model, the fake tool definitions corrupt that training data.&lt;/p&gt;
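
&lt;p&gt;Based on that description, the client-side half is presumably little more than a conditional field on the request body. A rough sketch (the flag name and field value come from the article; everything else here is an assumption):&lt;br&gt;
&lt;/p&gt;

```typescript
// Illustrative only: how a feature flag like ANTI_DISTILLATION_CC
// might gate an extra field on outgoing API requests.
interface ApiRequest {
  model: string;
  messages: string[];
  anti_distillation?: string[];
}

function buildRequest(messages: string[], antiDistillationEnabled: boolean): ApiRequest {
  const req: ApiRequest = { model: "claude-sonnet", messages };
  if (antiDistillationEnabled) {
    // Per the article, the server responds by injecting decoy
    // tool definitions into the system prompt.
    req.anti_distillation = ["fake_tools"];
  }
  return req;
}
```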

&lt;p&gt;There's a second mechanism in &lt;code&gt;betas.ts&lt;/code&gt; (lines 279-298): server-side connector-text summarization. When enabled, the API buffers the assistant's reasoning between tool calls, returns only summaries, and cryptographically signs them. Competitors recording traffic get the summaries, not the full reasoning chain.&lt;/p&gt;

&lt;p&gt;As Alex Kim &lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;notes in his analysis&lt;/a&gt;: "Anyone serious about distilling from Claude Code traffic would find the workarounds in about an hour of reading the source. The real protection is probably legal, not technical."&lt;/p&gt;

&lt;h3&gt;
  
  
  Frustration Detection via Regex
&lt;/h3&gt;

&lt;p&gt;Found in &lt;code&gt;userPromptKeywords.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;wtf&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;wth&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;ffs&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;omfg&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nf"&gt;shit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ty&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;tiest&lt;/span&gt;&lt;span class="p"&gt;)?&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;dumbass&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;horrible&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;awful&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="nf"&gt;piss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ed&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;ing&lt;/span&gt;&lt;span class="p"&gt;)?&lt;/span&gt; &lt;span class="nx"&gt;off&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;piece&lt;/span&gt; &lt;span class="k"&gt;of &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;shit&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;crap&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;junk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;what&lt;/span&gt; &lt;span class="nf"&gt;the &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fuck&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;hell&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="nx"&gt;fucking&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;broken&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;useless&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;terrible&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;awful&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;horrible&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;fuck&lt;/span&gt; &lt;span class="nx"&gt;you&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="nf"&gt;screw &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;you&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;so&lt;/span&gt; &lt;span class="nx"&gt;frustrating&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt; &lt;span class="nx"&gt;sucks&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nx"&gt;damn&lt;/span&gt; &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A multi-billion-dollar AI company is detecting user frustration with a regex. The Hacker News thread lost it. To be fair though, it's faster, cheaper, and more predictable than running an LLM inference every time to check if the user is angry at the tool.&lt;/p&gt;
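
&lt;p&gt;A trimmed-down version of the same idea, for anyone curious how cheap this check really is (simplified from the leaked pattern, not a verbatim copy):&lt;br&gt;
&lt;/p&gt;

```typescript
// Simplified frustration detector in the spirit of userPromptKeywords.ts.
// The real pattern is much longer; this keeps a few representative branches.
const FRUSTRATION = /\b(wtf|ffs|so frustrating|this sucks|piece of (shit|crap|junk))\b/i;

function seemsFrustrated(prompt: string): boolean {
  return FRUSTRATION.test(prompt);
}
```

&lt;p&gt;One regex test per prompt, microseconds of CPU, zero extra tokens. That's the trade-off the Hacker News crowd was laughing at, and it's not obviously wrong.&lt;/p&gt;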

&lt;h3&gt;
  
  
  250,000 Wasted API Calls Per Day
&lt;/h3&gt;

&lt;p&gt;The most candid internal admission in the entire codebase. From &lt;code&gt;autoCompact.ts&lt;/code&gt; (lines 68-70):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures 
(up to 3,272) in a single session, wasting ~250K API calls/day globally."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix was three lines: &lt;code&gt;MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3&lt;/code&gt;. After 3 consecutive compaction failures, it just stops trying. Sometimes good engineering is knowing when to give up.&lt;/p&gt;
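
&lt;p&gt;The described fix is a textbook give-up counter. Something like this (my reconstruction of the idea, not the actual diff):&lt;br&gt;
&lt;/p&gt;

```typescript
// Sketch of the consecutive-failure cutoff described above.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

let consecutiveFailures = 0;

function shouldAttemptCompaction(): boolean {
  // Stop trying once the cap is hit; no backoff, just give up.
  if (consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
    return false;
  }
  return true;
}

function recordCompactionResult(succeeded: boolean): void {
  if (succeeded) {
    consecutiveFailures = 0; // any success resets the counter
  } else {
    consecutiveFailures += 1;
  }
}
```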




&lt;h2&gt;
  
  
  6. The "Capybara" Model Confirmed
&lt;/h2&gt;

&lt;p&gt;The leak didn't expose Claude's model weights, but it did expose multiple references to Anthropic's next major model family. Internal codenames: &lt;strong&gt;Capybara&lt;/strong&gt; (also referred to as &lt;strong&gt;Mythos&lt;/strong&gt; in a separate leaked document from the prior week).&lt;/p&gt;

&lt;p&gt;The beta flags in the source reference specific API version strings for Capybara, suggesting it's well beyond concept stage. Security researcher Roy Paz from LayerX Security, who reviewed the code for Fortune, indicated it will likely ship in fast and slow variants with a significantly larger context window than anything currently on the market.&lt;/p&gt;

&lt;p&gt;These references also confirmed the existence of &lt;code&gt;undercover.ts&lt;/code&gt;, a module that actively instructs Claude Code never to mention internal codenames like "Capybara" or "Tengu" when used in external repositories. There's a hard-coded &lt;code&gt;NO force-OFF&lt;/code&gt; rule: you can force Undercover Mode on, but you cannot force it off. In external builds, the function gets dead-code-eliminated entirely.&lt;/p&gt;

&lt;p&gt;The implication raised in the &lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt;: AI-authored commits from Anthropic employees in open source repos will have no indication an AI wrote them. The tool actively conceals its own involvement.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Alternative Theory: Was This Anthropic's PR Play?
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;I'm not saying I believe this. I'm saying the circumstantial evidence is strange enough that it deserves to be stated clearly.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Anthropic is the self-proclaimed "safety-first AI lab." They're racing for developer mindshare against OpenAI (better brand) and Google (better distribution). Claude Code is their breakout product. They're preparing for an IPO. And they'd just made themselves unpopular with the developer community ten days earlier by sending legal threats to OpenCode for using their internal APIs.&lt;/p&gt;

&lt;p&gt;So let's look at what this "leak" actually did for Anthropic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhibit A: The April Fools' Timing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The leak occurred on March 31, the day before April 1st. The Buddy/companion system had a planned rollout window of April 1-7 coded directly into the source. The "leak" gave developers a sneak peek at what was about to launch anyway. Was this a controlled preview dressed up as an accident?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhibit B: The Bun Bug Nobody Fixed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anthropic acquired Bun. They own the runtime. The bug causing source maps to ship in production was filed 20 days before the leak and was still open. If you own the runtime and its bug tracker, and that bug causes your own code to leak... why hadn't anyone internally marked it as critical?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhibit C: The Undercover Mode Irony&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code has an entire subsystem called Undercover Mode, purpose-built to prevent internal codenames from leaking through AI-generated content. They built AI-powered leak prevention into the product. Then humans accidentally shipped the entire source code. The gap between their AI safety engineering and their human release engineering is either tragic or theatrical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhibit D: The OpenCode Reputation Reversal&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ten days before the leak, Anthropic sent cease-and-desist letters to OpenCode, a popular third-party tool. The developer community was furious. The narrative was "Anthropic is acting like a gatekeeping megacorp."&lt;/p&gt;

&lt;p&gt;Then a "leak" happens that shows Anthropic's impressive engineering to the world, makes them look like the underdog, generates three days of breathless coverage about KAIROS, BUDDY, and ULTRAPLAN, and completely reverses developer sentiment. Within 48 hours, developers went from "Anthropic sucks" to "holy shit look what Anthropic is building."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhibit E: The Permanent Mirror Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anthropic filed DMCA takedowns. GitHub complied immediately. But the decentralized mirror at Gitlawb, with a public message saying "Will never be taken down," has been live since day one. Anthropic has a legal team, deep pockets, and relationships. A serious legal effort could make life difficult for every mirror operator. They chose not to go that hard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhibit F: The "Second Leak in a Week" Pattern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This wasn't Anthropic's first incident that week. A draft blog post about the Capybara/Mythos model had "accidentally" been publicly accessible just days before, as Fortune reported on Thursday. Two high-profile "leaks" in five days, both generating enormous excitement about Anthropic's upcoming roadmap, both very conveniently timed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Counter-Arguments (Why It's Probably Just Incompetence)
&lt;/h3&gt;

&lt;p&gt;To be fair:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategic roadmap exposure is genuinely damaging.&lt;/strong&gt; Cursor, Copilot, and Windsurf now know exactly what Anthropic has already built and what's nearly ready to ship. That's real competitive intelligence permanently in the public domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The IPO narrative cuts both ways.&lt;/strong&gt; "We shipped our source code to npm" is not a line you want in your S-1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The axios RAT timing.&lt;/strong&gt; Nobody would engineer a PR stunt to overlap with an active malware attack on npm. That part made a bad news day significantly worse for anyone who updated Claude Code that morning, and there's no upside to being associated with a supply chain attack.&lt;/p&gt;

&lt;p&gt;The most likely answer is plain human error. A misconfigured &lt;code&gt;.npmignore&lt;/code&gt;. A known Bun bug nobody had marked as critical. A public R2 bucket that should have been private. Three configuration failures that compounded into a disaster.&lt;/p&gt;

&lt;p&gt;The PR outcome though? Undeniably good. The strategic damage? Real but survivable. The timing? Genuinely strange.&lt;/p&gt;

&lt;p&gt;Draw your own conclusions.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Why DMCA Won't Fix This
&lt;/h2&gt;

&lt;p&gt;DMCA takedowns work on centralized platforms. GitHub complied within hours. But the code spread to places that are harder to reach.&lt;/p&gt;

&lt;p&gt;Gitlawb, with its explicit "Will never be taken down" message, operates outside the DMCA's practical reach. The Python port that appeared the same day was &lt;a href="https://decrypt.co/362917/anthropic-accidentally-leaked-claude-code-source-internet-keeping-forever" rel="noopener noreferrer"&gt;declared DMCA-proof&lt;/a&gt; by The Pragmatic Engineer's Gergely Orosz, who noted the rewrite is a new creative work that violates no copyright. There's also the AI copyright question: Anthropic's own CEO has implied that significant portions of Claude Code were written by Claude. The DC Circuit upheld in March 2025 that AI-generated work doesn't carry automatic copyright. If Anthropic's copyright claim over Claude-authored code is legally murky, the entire takedown strategy weakens.&lt;/p&gt;

&lt;p&gt;And then there are torrents. Content once on the internet at scale doesn't come back.&lt;/p&gt;

&lt;p&gt;The practical reality: 512,000 lines of Claude Code are permanently in the wild, regardless of what any court decides.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. What This Means For You
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you're using Claude Code:&lt;/strong&gt; Update immediately past v2.1.88 and use the native installer going forward (&lt;code&gt;curl -fsSL https://claude.ai/install.sh | bash&lt;/code&gt;). If you updated via npm between 00:21 and 03:29 UTC on March 31, do the axios/RAT check above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're building AI coding tools:&lt;/strong&gt; The leaked source is now the most detailed public documentation of how to build a production-grade AI agent harness that exists. The three-layer memory architecture, the permission system, the tool plugin design, the multi-agent coordination patterns. It's all there, already analyzed by thousands of developers. The bar for what "production-grade" means just got documented in detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're at Anthropic:&lt;/strong&gt; The code is out. KAIROS, ULTRAPLAN, and BUDDY are already built. Ship them. The community already knows they're coming. Turn the leak into a launch.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Lessons for Every Dev Team
&lt;/h2&gt;

&lt;p&gt;This incident is a clear example of how release pipeline failures compound. Regardless of your opinion on Anthropic, every team should run through this checklist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Audit your .npmignore / package.json "files" field&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; .npmignore
&lt;span class="c"&gt;# Do you explicitly exclude *.map, dist/*.map, *.d.ts.map?&lt;/span&gt;

&lt;span class="c"&gt;# 2. Check if source maps ship in your production build&lt;/span&gt;
&lt;span class="nb"&gt;ls &lt;/span&gt;dist/ | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;map$"&lt;/span&gt;
&lt;span class="c"&gt;# If you see anything: your bundler config needs review&lt;/span&gt;

&lt;span class="c"&gt;# 3. Audit your cloud storage permissions&lt;/span&gt;
&lt;span class="c"&gt;# Are any buckets referenced in your build artifacts publicly accessible?&lt;/span&gt;

&lt;span class="c"&gt;# 4. Check your build toolchain for known bugs&lt;/span&gt;
&lt;span class="c"&gt;# If you're on Bun, check issue #28001 status&lt;/span&gt;

&lt;span class="c"&gt;# 5. Review your npm publish workflow&lt;/span&gt;
npm pack &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;span class="c"&gt;# Review EVERY file that would be published before actually publishing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The line that came out of the Hacker News thread: &lt;strong&gt;"Your .npmignore is load-bearing. Treat it like a security boundary."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Here's what we know for certain: a misconfigured &lt;code&gt;.npmignore&lt;/code&gt; and a public cloud storage bucket exposed 512,000 lines of Claude Code, the code spread instantly and is now permanently in the wild, the leak revealed a technically impressive product with a compelling feature roadmap, and Anthropic's brand among developers bounced back remarkably fast.&lt;/p&gt;

&lt;p&gt;What we'll probably never know: whether anyone inside Anthropic saw the Bun bug and made a judgment call, whether the April Fools' timing of the BUDDY rollout was coincidence, and whether Anthropic's relative restraint on DMCA enforcement is legal strategy or resource allocation.&lt;/p&gt;

&lt;p&gt;What's not in question is that the engineering inside Claude Code is genuinely impressive. The memory architecture, the anti-distillation mechanisms, the multi-agent coordination, the DRM-at-the-HTTP-layer attestation. This is a serious piece of software doing things that are actually hard.&lt;/p&gt;

&lt;p&gt;Accident or not, the world now knows what Anthropic is capable of building.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;And maybe that was the point.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Alex Kim's technical deep-dive&lt;/td&gt;
&lt;td&gt;&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;alex000kim.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VentureBeat — Full breakdown + axios RAT warning&lt;/td&gt;
&lt;td&gt;&lt;a href="https://venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know" rel="noopener noreferrer"&gt;venturebeat.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The Register — Anthropic's official statement&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.theregister.com/2026/03/31/anthropic_claude_code_source_code/" rel="noopener noreferrer"&gt;theregister.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fortune — Strategic analysis + Capybara confirmation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://fortune.com/2026/03/31/anthropic-source-code-claude-code-data-leak-second-security-lapse-days-after-accidentally-revealing-mythos/" rel="noopener noreferrer"&gt;fortune.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decrypt — DMCA analysis + permanent mirror situation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://decrypt.co/362917/anthropic-accidentally-leaked-claude-code-source-internet-keeping-forever" rel="noopener noreferrer"&gt;decrypt.co&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CNBC — Revenue figures + company response&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.cnbc.com/2026/03/31/anthropic-leak-claude-code-internal-source.html" rel="noopener noreferrer"&gt;cnbc.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Axios — Feature flag breakdown + roadmap analysis&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.axios.com/2026/03/31/anthropic-leaked-source-code-ai" rel="noopener noreferrer"&gt;axios.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DEV.to (Gabriel Anhaia) — Architecture walkthrough&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/gabrielanhaia/claude-codes-entire-source-code-was-just-leaked-via-npm-source-maps-heres-whats-inside-cjo"&gt;dev.to&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kuberwastaken/claude-code GitHub&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Kuberwastaken/claude-code" rel="noopener noreferrer"&gt;github.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hacker News thread&lt;/td&gt;
&lt;td&gt;&lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;news.ycombinator.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bun bug #28001&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/oven-sh/bun/issues/28001" rel="noopener noreferrer"&gt;github.com/oven-sh/bun&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CyberSecurityNews — Supply chain attack details&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cybersecuritynews.com/claude-code-source-code-leaked/" rel="noopener noreferrer"&gt;cybersecuritynews.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;If this was useful, drop a reaction. If you spot anything I got wrong, leave it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Using Claude Code with Any LLM: Why a Gateway Changes Everything</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Fri, 13 Mar 2026 03:30:00 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/using-claude-code-with-any-llm-why-a-gateway-changes-everything-4a0c</link>
      <guid>https://forem.com/varshithvhegde/using-claude-code-with-any-llm-why-a-gateway-changes-everything-4a0c</guid>
      <description>&lt;p&gt;I've been using Claude Code for a while now, and if you're a developer who has added it to your daily workflow, you probably know the feeling. It's genuinely good. It reads your codebase, runs commands, modifies files, and helps implement features right from your terminal without you having to context-switch constantly.&lt;/p&gt;

&lt;p&gt;But at some point, most developers hit the same wall I did: what if I want to use a different model?&lt;/p&gt;

&lt;p&gt;What if GPT-4o handles your specific codebase better? What if Gemini's larger context window is exactly what you need for that massive legacy project? What if you're spending more on API calls than you should be, and you know some of those simpler tasks could run on a cheaper model just fine?&lt;/p&gt;

&lt;p&gt;Out of the box, Claude Code only talks to Anthropic. That's just how it works. And while Anthropic's models are genuinely strong, being locked into a single provider means you're trading flexibility for convenience. This guide is about getting both.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Friction Points
&lt;/h2&gt;

&lt;p&gt;Before jumping into the solution, it helps to be specific about what problems we're actually solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model flexibility.&lt;/strong&gt; Different models have different strengths. Claude Sonnet is excellent for most coding tasks, but you can't know it's the best tool for every job unless you can test alternatives, and without a gateway, experimenting means switching tools entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost management.&lt;/strong&gt; Claude Code burns through tokens quickly during an active session. Complex architectural work and boilerplate generation are not the same job, and pricing them identically doesn't make much sense. Routing simpler requests to a more affordable model can cut costs significantly without affecting output quality where it matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance and data routing.&lt;/strong&gt; If you work in fintech, healthcare, or any regulated industry, you've likely dealt with requirements around where your data goes. Routing all API traffic through your own infrastructure before it reaches any external provider is often non-negotiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability.&lt;/strong&gt; This one gets overlooked a lot. How many tokens does a typical Claude Code session consume? What's your actual cost per feature shipped? Without request logging, you're genuinely guessing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Bifrost
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqakfazgxeydkto0p4xn6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqakfazgxeydkto0p4xn6.png" alt="Bifrost" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt; is an open-source LLM gateway built by &lt;a href="https://www.getmaxim.ai" rel="noopener noreferrer"&gt;Maxim AI&lt;/a&gt; to route, manage, and optimize requests between your application and multiple model providers. It's Apache 2.0 licensed, self-hostable, and supports 20+ providers including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Mistral, Cohere, Groq, and more.&lt;/p&gt;

&lt;p&gt;A few things that make it stand out technically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance that doesn't get in the way.&lt;/strong&gt; At 5,000 requests per second, Bifrost adds less than 15 microseconds of internal overhead per request. At production scale, that's essentially nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero-config startup.&lt;/strong&gt; A single &lt;code&gt;npx&lt;/code&gt; command launches the gateway, and everything else is configurable through a web UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built-in fallbacks and load balancing.&lt;/strong&gt; If a provider fails or rate-limits you, Bifrost automatically routes to a backup. Traffic can also be distributed across multiple keys or providers using weighted rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic caching.&lt;/strong&gt; Repeated or semantically similar queries can be served from cache, which reduces both latency and cost for workflows with a lot of repetitive prompting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full observability out of the box.&lt;/strong&gt; Prometheus metrics, request tracing, token usage, latency, and a built-in web dashboard are all included.&lt;/p&gt;

&lt;p&gt;The architecture is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code  --&amp;gt;  Bifrost (localhost:8080)  --&amp;gt;  Any LLM Provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code uses an environment variable called &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; to know where to send API requests. Normally it points to &lt;code&gt;https://api.anthropic.com&lt;/code&gt;. You point it at Bifrost instead. Bifrost accepts requests in Anthropic's Messages API format, translates them to whichever provider you've configured, and translates the response back. Claude Code never knows the difference.&lt;/p&gt;

&lt;p&gt;No code changes. No patching. One environment variable.&lt;/p&gt;
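&lt;p&gt;You can verify this translation layer before involving Claude Code at all. Here's a hedged sketch of a direct request to the gateway in Anthropic's Messages format, pointed at a non-Anthropic model (the endpoint path follows the base URL above; the &lt;code&gt;x-api-key&lt;/code&gt; value is arbitrary because Bifrost holds the real provider keys):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/anthropic/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: dummy-key" \
  -d '{
    "model": "openai/gpt-4o",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the response comes back shaped like an Anthropic message even though GPT-4o produced it, the gateway is doing its job.&lt;/p&gt;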




&lt;h2&gt;
  
  
  What We'll Cover
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Setting up and configuring Bifrost with multiple LLM providers&lt;/li&gt;
&lt;li&gt;Integrating Claude Code with the gateway&lt;/li&gt;
&lt;li&gt;Running Claude Code with any model&lt;/li&gt;
&lt;li&gt;Configuring routing rules, fallbacks, and budgets&lt;/li&gt;
&lt;li&gt;Integrating MCP tools&lt;/li&gt;
&lt;li&gt;Using built-in observability and monitoring&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part 1: Setting Up Bifrost
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Bifrost
&lt;/h3&gt;

&lt;p&gt;Create a project folder, open it in your editor, and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost &lt;span class="nt"&gt;-app-dir&lt;/span&gt; ./my-bifrost-data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-app-dir&lt;/code&gt; flag tells Bifrost where to store all its data. Bifrost will start listening on port 8080.&lt;/p&gt;

&lt;p&gt;If you prefer Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull maximhq/bifrost
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/data:/app/data maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-v&lt;/code&gt; flag mounts a volume so your configuration persists across container restarts.&lt;/p&gt;
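&lt;p&gt;If you plan to keep the container running long-term, a minimal &lt;code&gt;docker-compose.yml&lt;/code&gt; keeps those flags in one place. This is a sketch: the image name and mount path come from the commands above, and the environment variables are the ones set in Step 3 below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  bifrost:
    image: maximhq/bifrost
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data   # persists configuration across restarts
    environment:
      # passed through from your shell; see Step 3
      - OPENAI_API_KEY
      - ANTHROPIC_API_KEY
      - GEMINI_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;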

&lt;h3&gt;
  
  
  Step 2: Create Your Config File
&lt;/h3&gt;

&lt;p&gt;Inside your &lt;code&gt;./my-bifrost-data&lt;/code&gt; folder, create a &lt;code&gt;config.json&lt;/code&gt; file. This defines which providers Bifrost can route to, enables request logging, and sets up database persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.getbifrost.ai/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"client"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enable_logging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"disable_content_logging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drop_excess_requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"initial_pool_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow_direct_keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-primary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"env.OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic-primary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"env.ANTHROPIC_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gemini"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemini-primary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"env.GEMINI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"config_store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sqlite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./config.db"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"logs_store"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sqlite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./logs.db"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;"value": "env.OPENAI_API_KEY"&lt;/code&gt; syntax tells Bifrost to read actual keys from environment variables rather than storing them in the file. Your secrets stay out of version control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Set Your API Keys
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-openai-api-key"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-anthropic-api-key"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-gemini-api-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Start the Gateway
&lt;/h3&gt;

&lt;p&gt;Stop any previously running Bifrost instance so it picks up the new config file on startup, then start it again with the same app directory flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost &lt;span class="nt"&gt;-app-dir&lt;/span&gt; ./my-bifrost-data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8080&lt;/code&gt; in your browser. You'll see the Bifrost dashboard where all configuration and monitoring lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: Connecting Claude Code to Bifrost
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Point Claude Code at Bifrost
&lt;/h3&gt;

&lt;p&gt;Set these two environment variables in the same terminal session where you'll run Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/anthropic"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"dummy-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;dummy-key&lt;/code&gt; part is a bit counterintuitive at first. Claude Code requires this variable to be set before it will run, but Bifrost handles actual authentication to providers using the keys you configured earlier. You can put any non-empty string here.&lt;/p&gt;
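&lt;p&gt;If you bounce between terminals a lot, a small launcher script saves retyping the exports. This is a sketch; the script name and default model are arbitrary choices, not part of either tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# claude-gw: run Claude Code through the local Bifrost gateway
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"
export ANTHROPIC_API_KEY="dummy-key"  # any non-empty string works
exec claude --model "${1:-openai/gpt-4o}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then &lt;code&gt;./claude-gw gemini/gemini-2.5-pro&lt;/code&gt; picks a different model for that session.&lt;/p&gt;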

&lt;h3&gt;
  
  
  Step 3: Run Claude Code with Any Model
&lt;/h3&gt;

&lt;p&gt;Start Claude Code and specify whichever model you want to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;--model&lt;/span&gt; openai/gpt-4o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To route to other providers, use the provider prefix pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;openai/gpt-4o&lt;/span&gt;
&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;
&lt;span class="s"&gt;gemini/gemini-2.5-pro&lt;/span&gt;
&lt;span class="s"&gt;groq/llama-3.1-70b-versatile&lt;/span&gt;
&lt;span class="s"&gt;mistral/mistral-large-latest&lt;/span&gt;
&lt;span class="s"&gt;anthropic/claude-sonnet-4-20250514&lt;/span&gt;
&lt;span class="s"&gt;ollama/llama3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run a quick sanity check by asking something simple like "Hello there" to confirm requests are flowing through correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: Routing Rules, Fallbacks, and Budgets
&lt;/h2&gt;

&lt;p&gt;Once Claude Code is connected, you can start using Bifrost's routing features to get more control over how requests are handled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Weighted Routing Across Providers
&lt;/h3&gt;

&lt;p&gt;Virtual Keys in Bifrost let you define routing logic that applies automatically. Navigate to &lt;strong&gt;Governance &amp;gt; Virtual Keys&lt;/strong&gt;, create a key, and configure your routing weights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dev-routing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"budget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_budget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"budget_duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"monthly"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-20250514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This routes 70% of requests to GPT-4o and 30% to Claude Sonnet, with a hard monthly cap of $100. Once the budget is exhausted, Bifrost stops routing automatically. For teams, this replaces a lot of manual cost monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automatic Fallbacks
&lt;/h3&gt;

&lt;p&gt;When a provider goes down or you hit a rate limit, Bifrost works down a fallback list until a request succeeds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai/gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-20250514"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemini-2.5-pro"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your coding session continues without any manual intervention when a provider has issues.&lt;/p&gt;
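&lt;p&gt;To see where that JSON lives in practice, here's a sketch of a direct gateway request carrying the fallback list in its body. Treat the field placement as an assumption modeled on the example above rather than a guarantee:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/anthropic/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: dummy-key" \
  -d '{
    "model": "openai/gpt-4o",
    "max_tokens": 256,
    "fallbacks": [
      { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
      { "provider": "gemini", "model": "gemini-2.5-pro" }
    ],
    "messages": [{"role": "user", "content": "ping"}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;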




&lt;h2&gt;
  
  
  Part 4: MCP Tool Integration
&lt;/h2&gt;

&lt;p&gt;If you're using Model Context Protocol servers for filesystem access, web search, database queries, or custom integrations, Bifrost supports those too. Configure them once in Bifrost, and they become available to any model routing through it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Add MCP Configuration to Bifrost
&lt;/h3&gt;

&lt;p&gt;Update your &lt;code&gt;config.json&lt;/code&gt; to include MCP server definitions. Here's an example with filesystem access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.getbifrost.ai/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"client"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enable_logging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"disable_content_logging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drop_excess_requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"initial_pool_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow_direct_keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"client_configs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"connection_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"stdio_config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/tmp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tools_to_execute"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tools_to_auto_execute"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"read_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"list_directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"create_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"delete_file"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool_manager_config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"max_agent_depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tool_execution_timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300000000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"code_mode_binding_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Bifrost and navigate to the MCP catalog page in the web UI to confirm the filesystem server shows as connected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Add Bifrost as an MCP Server in Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="nt"&gt;--transport&lt;/span&gt; http bifrost http://localhost:8080/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Verify with a Real Task
&lt;/h3&gt;

&lt;p&gt;Restart Claude Code and try a task that exercises the MCP tools. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a simple calculator program in Python.

It should support addition, subtraction, multiplication, and division.
The user should input two numbers and an operation, and the program should print the result.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then follow up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Analyze this repository and create a README.md explaining how the project works.
Include the project architecture and instructions for running it locally.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the MCP integration is working, Claude Code will read your files, create new ones, and interact with your filesystem through Bifrost's tool injection.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 5: Observability and Monitoring
&lt;/h2&gt;

&lt;p&gt;This is the part that surprised me most when I first set it up.&lt;/p&gt;

&lt;p&gt;Every request that passes through Bifrost is logged with full detail: the input prompt, the response, which model handled it, latency, and cost. The web interface at &lt;code&gt;http://localhost:8080/logs&lt;/code&gt; provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time streaming of requests and responses&lt;/li&gt;
&lt;li&gt;Token usage tracking per request&lt;/li&gt;
&lt;li&gt;Latency measurements&lt;/li&gt;
&lt;li&gt;Filtering by provider, model, or conversation content&lt;/li&gt;
&lt;li&gt;Full request and response inspection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For individual developers, it's useful for understanding your actual usage patterns. For teams, it becomes a proper audit trail. You can see which models are being used most, where the expensive requests are coming from, and whether your routing rules are actually behaving as expected.&lt;/p&gt;

&lt;p&gt;Bifrost also exposes Prometheus metrics for teams that want to integrate this data into existing monitoring pipelines.&lt;/p&gt;
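&lt;p&gt;If you already run Prometheus, pointing it at Bifrost is a one-job scrape config. A minimal sketch, assuming the conventional &lt;code&gt;/metrics&lt;/code&gt; path on the same port as the web UI (check the Bifrost docs for the exact endpoint):&lt;/p&gt;

```yaml
# Hypothetical scrape job for a local Bifrost instance.
# Verify the metrics path and port against the Bifrost documentation.
scrape_configs:
  - job_name: "bifrost"
    static_configs:
      - targets: ["localhost:8080"]
```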




&lt;h2&gt;
  
  
  Is This Worth Setting Up?
&lt;/h2&gt;

&lt;p&gt;If you're a solo developer who uses Claude Code occasionally and doesn't have any compliance or cost concerns, the default setup is probably fine.&lt;/p&gt;

&lt;p&gt;But if any of the following are true, a gateway is worth the time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to test how different models perform on your specific workload&lt;/li&gt;
&lt;li&gt;You're managing API costs across a team&lt;/li&gt;
&lt;li&gt;Your organization has requirements around data routing or infrastructure control&lt;/li&gt;
&lt;li&gt;You want actual visibility into your AI usage rather than end-of-month billing surprises&lt;/li&gt;
&lt;li&gt;You use MCP tools and want them available across multiple model providers without reconfiguring each time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bifrost being open source and self-hosted means your prompts and responses stay on your own infrastructure. For teams working on proprietary codebases, that's a meaningful difference from routing everything directly to a third-party API.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Get started:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;getmax.im/bifrost&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://git.new/bifrost" rel="noopener noreferrer"&gt;git.new/bifrost&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://www.getmaxim.ai/bifrost/resources/claude-code" rel="noopener noreferrer"&gt;getmax.im/bifrostdocs&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>cli</category>
      <category>llm</category>
      <category>tooling</category>
    </item>
    <item>
      <title>ContractCompass: Your AI Contract Analyst That Actually Speaks Human</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Sun, 08 Feb 2026 12:34:11 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/contractcompass-your-ai-contract-analyst-that-actually-speaks-human-nfo</link>
      <guid>https://forem.com/varshithvhegde/contractcompass-your-ai-contract-analyst-that-actually-speaks-human-nfo</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/algolia"&gt;Algolia Agent Studio Challenge&lt;/a&gt;: Consumer-Facing Conversational Experiences&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F833ehy8i626n22zc4pcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F833ehy8i626n22zc4pcn.png" alt="FrontPage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ContractCompass&lt;/strong&gt; is an AI-powered contract analysis tool that turns legal jargon into plain English through natural conversation. Think of it as having a friendly lawyer friend who can review your contract over coffee, except this friend never gets tired, works 24/7, and doesn't charge $400/hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Most people sign contracts they don't fully understand. Employment agreements, rental leases, SaaS terms—they're all written in dense legal language that assumes you went to law school. By the time you realize that "perpetual, irrevocable, worldwide license" means the company owns your weekend projects forever, you've already signed away your rights.&lt;/p&gt;

&lt;p&gt;According to research, over 90% of people don't read terms and conditions before accepting them. It's not laziness. These documents are genuinely incomprehensible to the average person. A typical employment contract might be 15 pages of legal clauses that take hours to parse, assuming you even know what to look for.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;ContractCompass solves this through dialogue-based AI interaction. Instead of drowning you in legal analysis reports, it lets you have a natural conversation about your contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What are the red flags here?"&lt;/li&gt;
&lt;li&gt;"Can you explain this termination clause like I'm five?"&lt;/li&gt;
&lt;li&gt;"Is this non-compete actually enforceable?"&lt;/li&gt;
&lt;li&gt;"What should I negotiate before signing?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI agent responds in real-time with contextual answers grounded in a curated database of contract clauses, powered by &lt;strong&gt;Algolia Agent Studio's&lt;/strong&gt; semantic search capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucwnfeemcsvsdrsati7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucwnfeemcsvsdrsati7u.png" alt="ChatInterface Initial"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Capabilities
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Conversational AI Interface&lt;/strong&gt; - Chat naturally with the agent. No forms, no checkboxes, just questions and answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intelligent Risk Detection&lt;/strong&gt; - Every clause gets analyzed and scored on a three-tier system (Low, Medium, High risk) with visual indicators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plain English Translations&lt;/strong&gt; - Legal jargon becomes "here's what this actually means for you."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Industry Comparisons&lt;/strong&gt; - The agent explains whether clauses are standard practice or unusual outliers worth negotiating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rich Visual Analysis&lt;/strong&gt; - For deep dives, the agent generates structured analysis cards with prevalence bars, red flag lists, and detailed reasoning.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://contractcompass.varshithvhegde.in/" rel="noopener noreferrer"&gt;https://contractcompass.varshithvhegde.in/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No login required. No credit card. Just upload a contract (or try one of the built-in samples) and start asking questions.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/cWGjpM0eIMc"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Feels to Use
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Upload is effortless&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drag and drop a PDF, paste text, or click one of the sample contracts. I've included four pre-loaded examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Friendly Startup Offer&lt;/strong&gt; (Low Risk) - A well-balanced employment agreement with fair terms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red Flag Employment Contract&lt;/strong&gt; (High Risk) - Includes unilateral salary cuts, 24-month lock-in, overbroad IP assignment, 3-year non-compete, and $500K liquidated damages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predatory Rental Agreement&lt;/strong&gt; (High Risk) - Non-refundable deposits, tenant pays for ALL repairs, no-notice landlord entry, uncapped rent increases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasonable SaaS Agreement&lt;/strong&gt; (Low Risk) - Standard business terms with mutual protections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3k1d4mwn40h4qziqk0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3k1d4mwn40h4qziqk0y.png" alt="Upload"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The interface splits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your contract appears on the left for reference, chat on the right. You can always scroll back to check what clause the AI is talking about.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fde48nk9tg4pmqhr0n3j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fde48nk9tg4pmqhr0n3j4.png" alt="Interface"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Suggested prompts guide you&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Six smart buttons help you get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Risk Analysis&lt;/li&gt;
&lt;li&gt;Find red flags&lt;/li&gt;
&lt;li&gt;Explain in plain English&lt;/li&gt;
&lt;li&gt;What should I negotiate?&lt;/li&gt;
&lt;li&gt;Compare to standards&lt;/li&gt;
&lt;li&gt;Is this enforceable?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Streaming responses feel natural&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI types back to you in real-time, token by token, like a real conversation. No waiting for a complete response to load. You see the analysis unfold naturally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulhh7i9pelxwngfz5zs4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulhh7i9pelxwngfz5zs4.png" alt="Suggested Prompts"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Used Algolia Agent Studio
&lt;/h2&gt;

&lt;p&gt;Algolia Agent Studio is the intelligence engine that makes ContractCompass possible. Here's how it powers the entire conversational experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Index: A Knowledge Base of Contract Clauses
&lt;/h3&gt;

&lt;p&gt;I created an Algolia index called &lt;code&gt;contract_clauses&lt;/code&gt; containing &lt;strong&gt;50+ curated contract clauses&lt;/strong&gt; across four contract types (employment, rental, SaaS, freelance). Each record includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;clause_text&lt;/strong&gt; - The full text of the clause&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;clause_type&lt;/strong&gt; - Category (termination, compensation, non-compete, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;contract_type&lt;/strong&gt; - Employment, rental, SaaS, or freelance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;industry&lt;/strong&gt; - Tech, real estate, or general&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;prevalence_score&lt;/strong&gt; - A 0-1 score indicating how common this clause is&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;risk_level&lt;/strong&gt; - Low, medium, or high&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;plain_english&lt;/strong&gt; - Simple explanation for non-lawyers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;red_flags&lt;/strong&gt; - List of concerning aspects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;standard_version&lt;/strong&gt; - What a fair version would look like&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;legal_implications&lt;/strong&gt; - Real-world impact of accepting the clause&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkqc0yvfy8e811pvjjz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkqc0yvfy8e811pvjjz2.png" alt="Algolia index"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, a "predatory non-compete" clause record looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"objectID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"emp-nc-003"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"clause_text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Employee agrees not to work for any competing business..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"clause_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"non_compete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contract_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"employment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"industry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tech"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prevalence_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"risk_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plain_english"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You can't work for competitors for 3 years across all of North America"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"red_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Unreasonably broad geographic scope"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Excessive duration"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"standard_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Typically 6-12 months within 50 miles of office"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"legal_implications"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"May prevent you from working in your field"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
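&lt;p&gt;For anyone rebuilding this in TypeScript, the record shape above can be captured as an interface. This is a sketch based on the fields listed earlier, not the app's actual type definitions:&lt;/p&gt;

```typescript
// Sketch of the contract_clauses record shape; field names mirror the
// index description above. Not ContractCompass's actual types.
type RiskLevel = "low" | "medium" | "high";

interface ClauseRecord {
  objectID: string;
  clause_text: string;
  clause_type: string;
  contract_type: "employment" | "rental" | "saas" | "freelance";
  industry: string;
  prevalence_score: number; // 0-1: how common the clause is
  risk_level: RiskLevel;
  plain_english: string;
  red_flags: string[];
  standard_version: string;
  legal_implications: string;
}

// Basic sanity check before indexing a record.
function isValidClause(r: ClauseRecord): boolean {
  if (r.prevalence_score >= 0) {
    if (r.prevalence_score > 1) { return false; }
    return r.objectID.length > 0;
  }
  return false;
}
```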

&lt;h3&gt;
  
  
  How Retrieval Powers the Conversation
&lt;/h3&gt;

&lt;p&gt;When a user uploads a contract and starts asking questions, here's what happens behind the scenes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Semantic Clause Matching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Algolia agent retrieves semantically similar clauses from the index to provide context-aware responses. For example, if someone asks "Is this non-compete fair?", the agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifies the non-compete clause in the uploaded contract&lt;/li&gt;
&lt;li&gt;Searches the index for similar non-compete clauses&lt;/li&gt;
&lt;li&gt;Compares the uploaded clause against standard versions&lt;/li&gt;
&lt;li&gt;Explains whether it's typical or unusually restrictive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Contract Type Detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent automatically identifies the type of contract (employment, rental, SaaS, etc.) based on the language and clauses present, then adjusts its analysis accordingly. An employment contract gets compared against employment standards, not rental standards.&lt;/p&gt;
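&lt;p&gt;In ContractCompass the agent infers the contract type itself via semantic search, but the idea can be sketched as a simple keyword tally. Everything below (keyword lists, scoring) is illustrative, not the app's actual logic:&lt;/p&gt;

```typescript
// Greatly simplified illustration of contract-type detection:
// count how many type-specific keywords appear and pick the best match.
const TYPE_KEYWORDS: { [contractType: string]: string[] } = {
  employment: ["employee", "salary", "termination", "non-compete"],
  rental: ["tenant", "landlord", "lease", "deposit"],
  saas: ["subscription", "service level", "uptime", "license"],
};

function detectContractType(text: string): string {
  const lower = text.toLowerCase();
  let best = "unknown";
  let bestScore = 0;
  for (const kind of Object.keys(TYPE_KEYWORDS)) {
    let score = 0;
    for (const kw of TYPE_KEYWORDS[kind]) {
      if (lower.includes(kw)) { score = score + 1; }
    }
    if (score > bestScore) { bestScore = score; best = kind; }
  }
  return best;
}
```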

&lt;p&gt;&lt;strong&gt;3. Prevalence-Based Risk Assessment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using the prevalence scores from the indexed data, the agent can say things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This termination clause is standard. About 95% of tech employment contracts include similar terms"&lt;/li&gt;
&lt;li&gt;"This security deposit policy is unusual. Only 15% of rental agreements make deposits non-refundable"&lt;/li&gt;
&lt;/ul&gt;
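&lt;p&gt;The mapping from a prevalence score to that kind of sentence can be sketched in a few lines. The thresholds below are illustrative, not the app's actual cutoffs:&lt;/p&gt;

```typescript
// Sketch: turn an indexed prevalence_score (0-1) into the kind of
// comparative statement quoted above. Thresholds are illustrative.
function prevalenceMessage(clauseType: string, score: number): string {
  const pct = Math.round(score * 100);
  if (score >= 0.8) {
    return "This " + clauseType + " clause is standard. About " + pct + "% of similar contracts include it.";
  }
  if (score >= 0.4) {
    return "This " + clauseType + " clause is fairly common (" + pct + "% of similar contracts).";
  }
  return "This " + clauseType + " clause is unusual. Only " + pct + "% of similar contracts include it.";
}
```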

&lt;p&gt;&lt;strong&gt;4. Standard Version Recommendations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a clause is problematic, the agent doesn't just say "this is bad." It shows what a fair version would look like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The current non-compete restricts you for 3 years across North America. A standard tech industry non-compete is typically 6-12 months within 50 miles of the office."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Making the Agent Conversational
&lt;/h3&gt;

&lt;p&gt;The key to making ContractCompass feel natural was teaching the agent to think like a helpful friend, not a legal robot. I crafted prompts that guide it to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speak like a human&lt;/strong&gt; - Use simple language. Avoid legal jargon unless explaining it. Be conversational but professional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be honest about risks&lt;/strong&gt; - If a clause is predatory, say so clearly. Don't sugarcoat problematic terms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ground everything in data&lt;/strong&gt; - Always search the contract_clauses index for similar examples. Compare the user's clause against standard versions and explain how it differs from typical industry practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provide actionable advice&lt;/strong&gt; - Don't just identify problems. Suggest what to negotiate and how to approach it.&lt;/p&gt;

&lt;p&gt;This approach ensures every response is both friendly and useful, backed by real contract data rather than generic advice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4j4w7017pmxu4o8eqhr7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4j4w7017pmxu4o8eqhr7.png" alt="Contract Agent Response"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Fast Retrieval Matters
&lt;/h2&gt;

&lt;p&gt;Algolia's speed and semantic search capabilities are critical to making ContractCompass feel like a real conversation rather than a clunky Q&amp;amp;A bot.&lt;/p&gt;
&lt;h3&gt;
  
  
  Speed Creates Natural Dialogue
&lt;/h3&gt;

&lt;p&gt;When someone asks "What are the red flags in this contract?", they expect an answer within seconds, not minutes. Algolia's sub-50ms search latency means the agent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve relevant clause examples instantly&lt;/li&gt;
&lt;li&gt;Stream responses token-by-token without lag&lt;/li&gt;
&lt;li&gt;Handle follow-up questions in the same conversation thread&lt;/li&gt;
&lt;li&gt;Maintain context across multiple queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If retrieval took 5-10 seconds per query, users would lose patience. The conversation would feel broken. Fast retrieval makes the experience feel fluid and natural.&lt;/p&gt;
&lt;h3&gt;
  
  
  Contextual Retrieval Enables Nuanced Analysis
&lt;/h3&gt;

&lt;p&gt;Algolia's semantic search doesn't just match keywords. It understands meaning. This is crucial for contract analysis because:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legal language varies widely&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A non-compete clause might say "Employee shall not engage in competitive activities" or "You agree not to work for rival companies." These are semantically similar but textually different. Algolia's vector-based search matches them both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users ask in natural language&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Someone might ask "Can they really fire me for any reason?" which should match clauses about "at-will employment" or "termination without cause." Semantic search bridges this gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A liability cap of $100K might be reasonable in a $10K/year SaaS contract but predatory in a $500K enterprise agreement. By retrieving similar contracts in the same industry and price range, the agent provides context-aware analysis.&lt;/p&gt;
&lt;h3&gt;
  
  
  Retrieval Grounds Responses in Real Data
&lt;/h3&gt;

&lt;p&gt;One of the biggest risks with AI agents is hallucination: making up plausible-sounding but incorrect information. By grounding every response in retrieved data from the curated index, ContractCompass avoids this problem.&lt;/p&gt;

&lt;p&gt;When the agent says "This non-compete is unusually restrictive," it's not guessing. It's comparing the uploaded clause against the prevalence scores and standard versions in the index. When it explains what a fair clause looks like, it's showing you actual examples from the database.&lt;/p&gt;

&lt;p&gt;This retrieval-augmented generation (RAG) approach makes the agent both reliable and trustworthy.&lt;/p&gt;
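&lt;p&gt;The grounding step can be sketched as simple prompt assembly: the retrieved clause records become the only context the model is allowed to answer from. In ContractCompass this happens inside Agent Studio; the function below is just an illustration of the idea:&lt;/p&gt;

```typescript
// Illustrative RAG prompt assembly. The retrieved clauses are numbered
// and prepended to the user's question as the only permitted context.
interface RetrievedClause {
  clause_text: string;
  plain_english: string;
  risk_level: string;
}

function buildGroundedPrompt(question: string, hits: RetrievedClause[]): string {
  const context = hits
    .map((h, i) => "[" + (i + 1) + "] (" + h.risk_level + " risk) " + h.clause_text + " -- " + h.plain_english)
    .join("\n");
  return "Answer using ONLY the clauses below.\n\nClauses:\n" + context + "\n\nQuestion: " + question;
}
```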
&lt;h3&gt;
  
  
  The Impact on User Experience
&lt;/h3&gt;

&lt;p&gt;From a user perspective, fast contextual retrieval translates to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confidence in the analysis&lt;/strong&gt; - "This isn't just an AI's opinion, it's based on real contract data"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Immediate answers&lt;/strong&gt; - "I can get my questions answered in real-time without waiting"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversational flow&lt;/strong&gt; - "It feels like talking to a human expert who knows contract law"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actionable insights&lt;/strong&gt; - "I now know exactly what to negotiate before signing"&lt;/p&gt;

&lt;p&gt;Without Algolia's speed and semantic capabilities, ContractCompass would be a generic chatbot that gives vague, unhelpful advice. With them, it's a genuinely useful tool that empowers people to understand and negotiate their contracts.&lt;/p&gt;


&lt;h2&gt;
  
  
  Technical Architecture
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Frontend (React + TypeScript)
&lt;/h3&gt;

&lt;p&gt;The interface is built with React 18 and TypeScript for type safety. I chose a modern stack that prioritizes performance and developer experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UI Library:&lt;/strong&gt; Tailwind CSS + shadcn/ui components for a clean, professional look&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Management:&lt;/strong&gt; React hooks for local state (no complex state library needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Markdown Rendering:&lt;/strong&gt; react-markdown for rich text in chat responses&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  AI Agent (Algolia Agent Studio)
&lt;/h3&gt;

&lt;p&gt;The chat interface calls Algolia Agent Studio directly from the frontend. This direct integration means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time streaming responses that appear token-by-token&lt;/li&gt;
&lt;li&gt;No backend proxy needed for chat, which reduces latency&lt;/li&gt;
&lt;li&gt;Full conversation history sent with each request for contextual follow-ups&lt;/li&gt;
&lt;/ul&gt;
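&lt;p&gt;Consuming a streamed response is standard fetch plumbing. The sketch below uses the ReadableStream API (available in browsers and Node 18+) and deliberately omits the Agent Studio endpoint and payload, which are specific to its API:&lt;/p&gt;

```typescript
// Sketch: read an HTTP response body chunk-by-chunk so the UI can
// render tokens as they arrive instead of waiting for the full reply.
async function streamToCallback(response: Response, onChunk: (text: string) => void) {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) { break; }
    // Decode each chunk incrementally and hand it to the renderer.
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```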
&lt;h3&gt;
  
  
  Search Index (Algolia)
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;contract_clauses&lt;/code&gt; index contains the 50+ curated clauses described earlier. Each clause is enriched with metadata (prevalence scores, risk levels, plain English explanations) that the agent uses to provide contextual analysis.&lt;/p&gt;
&lt;h3&gt;
  
  
  PDF Processing
&lt;/h3&gt;

&lt;p&gt;When users upload PDFs, the text extraction happens server-side using Google Gemini 2.5 Flash. The flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User uploads PDF via drag-and-drop&lt;/li&gt;
&lt;li&gt;PDF converts to base64 on the client&lt;/li&gt;
&lt;li&gt;Base64 data sent to serverless function&lt;/li&gt;
&lt;li&gt;Function calls Gemini API for text extraction&lt;/li&gt;
&lt;li&gt;Extracted text returns to the frontend&lt;/li&gt;
&lt;li&gt;Text loads into chat interface for analysis&lt;/li&gt;
&lt;/ol&gt;
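&lt;p&gt;The client-side half of that flow (steps 2 and 3) can be sketched as follows. The &lt;code&gt;/api/extract-pdf&lt;/code&gt; endpoint name and payload shape are hypothetical, used here only for illustration:&lt;/p&gt;

```typescript
// Convert uploaded PDF bytes to base64 (step 2 above).
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (const b of bytes) { binary = binary + String.fromCharCode(b); }
  return btoa(binary);
}

// POST the base64 data to the extraction function (step 3 above).
// "/api/extract-pdf" and "pdf_base64" are illustrative names, not
// the app's actual endpoint or schema.
async function extractPdfText(bytes: Uint8Array) {
  const res = await fetch("/api/extract-pdf", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pdf_base64: bytesToBase64(bytes) }),
  });
  const data = await res.json();
  return data.text as string;
}
```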
&lt;h3&gt;
  
  
  Backend (Serverless Functions)
&lt;/h3&gt;

&lt;p&gt;Four serverless functions handle specific tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;extract-pdf&lt;/strong&gt; - PDF text extraction using Gemini&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;analyze-contract&lt;/strong&gt; - Clause parsing and analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;search-clauses&lt;/strong&gt; - Direct Algolia index queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;seed-algolia&lt;/strong&gt; - Index population with curated data&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Design Decisions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Split-Screen Layout
&lt;/h3&gt;

&lt;p&gt;I chose a split-screen design (contract on left, chat on right) because users need to reference the original text while discussing it. It feels more collaborative, like reviewing a document with someone. Mobile users get a stacked layout that still works well.&lt;/p&gt;
&lt;h3&gt;
  
  
  Color-Coded Risk Levels
&lt;/h3&gt;

&lt;p&gt;Risk levels use universal color psychology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Green&lt;/strong&gt; - Safe, standard terms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amber&lt;/strong&gt; - Caution, worth discussing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red&lt;/strong&gt; - Danger, likely problematic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These colors are consistent across risk badges, prevalence bars, and analysis cards. You can glance at a clause and immediately understand its risk level.&lt;/p&gt;
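&lt;p&gt;Keeping that consistency is easiest with a single shared mapping used by every component. A sketch with illustrative Tailwind-style class names (not the app's actual styles):&lt;/p&gt;

```typescript
// One shared risk-to-color mapping keeps badges, prevalence bars,
// and analysis cards consistent. Class names are illustrative.
type Risk = "low" | "medium" | "high";

const RISK_COLORS: { [r in Risk]: string } = {
  low: "text-green-600",
  medium: "text-amber-500",
  high: "text-red-600",
};

function riskColor(risk: Risk): string {
  return RISK_COLORS[risk];
}
```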
&lt;h3&gt;
  
  
  Suggested Prompts
&lt;/h3&gt;

&lt;p&gt;Not everyone knows what questions to ask about a contract. The six suggested prompts serve as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Onboarding&lt;/strong&gt; - Showing users what's possible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; - Common questions answered with one click&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt; - Revealing features users might not know about&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Streaming Responses
&lt;/h3&gt;

&lt;p&gt;Token-by-token streaming makes the AI feel more human and less like a loading bar. It also provides immediate feedback that the system is working. Users don't stare at a blank screen wondering if anything is happening.&lt;/p&gt;


&lt;h2&gt;
  
  
  Challenges and Learnings
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Challenge 1: Balancing Legal Accuracy with Accessibility
&lt;/h3&gt;

&lt;p&gt;Legal language exists for precision. Simplifying it risks losing important nuances. I solved this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Providing both the original clause text and plain English side-by-side&lt;/li&gt;
&lt;li&gt;Including detailed "legal implications" sections for those who want depth&lt;/li&gt;
&lt;li&gt;Being honest about limitations (the disclaimer reminds users this isn't legal advice)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Challenge 2: Handling Diverse Contract Formats
&lt;/h3&gt;

&lt;p&gt;Contracts vary wildly in structure. Some are 2 pages, others are 50. Some use headers, others are wall-to-wall text. The PDF extraction with Gemini handles this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preserving structure where possible&lt;/li&gt;
&lt;li&gt;Extracting text even from scanned/image PDFs&lt;/li&gt;
&lt;li&gt;Cleaning up formatting artifacts&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Challenge 3: Preventing AI Hallucination
&lt;/h3&gt;

&lt;p&gt;Early versions sometimes invented red flags that didn't exist. The solution was retrieval-augmented generation. Every analysis is now grounded in retrieved clause data from the index. The agent can only reference what it finds in the search results.&lt;/p&gt;
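&lt;p&gt;A minimal sketch of that grounding idea, with a simple stand-in for the Algolia retrieval call (the function names here are hypothetical):&lt;/p&gt;

```python
# Illustrative RAG flow: the model only sees clauses actually retrieved
# from the index, so it has nothing to invent red flags from.
def retrieve_clauses(query, index):
    # Stand-in for the real semantic search over the clause index.
    return [c for c in index if query.lower() in c["text"].lower()]

def build_grounded_prompt(question, retrieved):
    context = "\n".join(f"- {c['text']}" for c in retrieved)
    return (
        "Answer using ONLY the clauses below. "
        "If the answer is not in them, say so.\n"
        f"Clauses:\n{context}\nQuestion: {question}"
    )

index = [{"text": "Either party may terminate with 30 days notice."}]
prompt = build_grounded_prompt(
    "Can I terminate early?", retrieve_clauses("terminate", index)
)
print(prompt)
```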
&lt;h3&gt;
  
  
  Challenge 4: Making Risk Scores Meaningful
&lt;/h3&gt;

&lt;p&gt;A simple "high risk" label isn't actionable. I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prevalence scores&lt;/strong&gt; - "Only 20% of contracts include this"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard versions&lt;/strong&gt; - "Here's what fair looks like"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific red flags&lt;/strong&gt; - "This clause is concerning because..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These additions turn a vague warning into specific, actionable information.&lt;/p&gt;
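&lt;p&gt;One way to carry that richer output around is a structured record per clause. The field names below are illustrative, not the project's actual schema:&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical shape of one analyzed clause: a bare risk label plus the
# context that makes it actionable (prevalence, a fair version, red flags).
@dataclass
class ClauseAnalysis:
    risk_level: str               # "green" / "amber" / "red"
    prevalence: int               # % of comparable contracts with this term
    standard_version: str         # what a fair version of the clause says
    red_flags: list = field(default_factory=list)

    def summary(self) -> str:
        return (f"{self.risk_level.upper()} risk. "
                f"Only {self.prevalence}% of contracts include this. "
                f"Flags: {len(self.red_flags)}")

clause = ClauseAnalysis("red", 20, "Mutual termination with 30 days notice",
                        ["Unilateral termination without notice"])
print(clause.summary())
```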


&lt;h3&gt;
  
  
  Export Analysis as PDF
&lt;/h3&gt;

&lt;p&gt;Let users download a full risk report they can share with lawyers or keep for their records. Make it official and presentable.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building ContractCompass taught me that the best AI tools don't feel like AI tools. They feel like helpful conversations with knowledgeable friends. The key is combining:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fast, semantic search&lt;/strong&gt; that finds the right information instantly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thoughtful prompting&lt;/strong&gt; that guides the AI to be helpful, not robotic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real data&lt;/strong&gt; that grounds responses in facts, not hallucinations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear design&lt;/strong&gt; that makes complex information accessible&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Algolia Agent Studio made the first part possible. The rest was about understanding what people actually need when facing a contract: clarity, confidence, and actionable advice.&lt;/p&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;ContractCompass demonstrates how conversational AI powered by fast, semantic search can democratize access to legal understanding. By combining Algolia Agent Studio's retrieval capabilities with a thoughtfully designed user experience, it transforms contract analysis from an intimidating expert task into an accessible conversation.&lt;/p&gt;

&lt;p&gt;The key insight: people don't need to become lawyers to understand their contracts. They just need the right questions answered in language they can understand, backed by real data about what's standard and what's not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it yourself:&lt;/strong&gt; &lt;a href="https://contractcompass.varshithvhegde.in/" rel="noopener noreferrer"&gt;https://contractcompass.varshithvhegde.in/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feok2trmgk5oi4zv24klm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feok2trmgk5oi4zv24klm.png" alt="Landing Page"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Built With
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Powered by:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.algolia.com/doc/guides/ai/agent-studio/" rel="noopener noreferrer"&gt;Algolia Agent Studio&lt;/a&gt; - Conversational AI with semantic search&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://react.dev" rel="noopener noreferrer"&gt;React&lt;/a&gt; + &lt;a href="https://www.typescriptlang.org/" rel="noopener noreferrer"&gt;TypeScript&lt;/a&gt; - Frontend framework&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tailwindcss.com/" rel="noopener noreferrer"&gt;Tailwind CSS&lt;/a&gt; + &lt;a href="https://ui.shadcn.com/" rel="noopener noreferrer"&gt;shadcn/ui&lt;/a&gt; - UI components&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://deepmind.google/technologies/gemini/" rel="noopener noreferrer"&gt;Google Gemini&lt;/a&gt; - PDF text extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Varshithvhegde" rel="noopener noreferrer"&gt;
        Varshithvhegde
      &lt;/a&gt; / &lt;a href="https://github.com/Varshithvhegde/contract-compass" rel="noopener noreferrer"&gt;
        contract-compass
      &lt;/a&gt;
    &lt;/h2&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;ContractCompass 🧭&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;AI-Powered Contract Analysis for Non-Lawyers&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Chat with AI to understand your contract. Identify risks, get plain-English explanations, and learn what to negotiate — powered by &lt;strong&gt;Algolia Agent Studio&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;📋 Table of Contents&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#overview" rel="noopener noreferrer"&gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#live-demo" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#key-features" rel="noopener noreferrer"&gt;Key Features&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#architecture" rel="noopener noreferrer"&gt;Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#technology-stack" rel="noopener noreferrer"&gt;Technology Stack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#algolia-agent-studio-integration" rel="noopener noreferrer"&gt;Algolia Agent Studio Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#how-it-works" rel="noopener noreferrer"&gt;How It Works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#contract-types-supported" rel="noopener noreferrer"&gt;Contract Types Supported&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#sample-contracts" rel="noopener noreferrer"&gt;Sample Contracts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#risk-assessment-system" rel="noopener noreferrer"&gt;Risk Assessment System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#conversational-ai-capabilities" rel="noopener noreferrer"&gt;Conversational AI Capabilities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#structured-risk-analysis" rel="noopener noreferrer"&gt;Structured Risk Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#pdf-extraction" rel="noopener noreferrer"&gt;PDF Extraction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#algolia-search-index" rel="noopener noreferrer"&gt;Algolia Search Index&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#uiux-design" rel="noopener noreferrer"&gt;UI/UX Design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#edge-functions" rel="noopener noreferrer"&gt;Edge Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#security" rel="noopener noreferrer"&gt;Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Varshithvhegde/contract-compass#getting-started" rel="noopener noreferrer"&gt;Getting Started&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Overview&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;ContractCompass&lt;/strong&gt; is an intelligent contract analysis tool designed to help everyday people — not lawyers — understand legal documents before they sign. Users upload or paste a contract, then have a real-time conversation with an AI agent that identifies risky clauses, explains legal jargon in plain English, and compares terms against industry standards.&lt;/p&gt;

&lt;p&gt;The AI agent is powered by &lt;strong&gt;Algolia Agent Studio&lt;/strong&gt;, which provides semantic search and retrieval of similar contract clauses…&lt;/p&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Varshithvhegde/contract-compass" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;








&lt;p&gt;&lt;em&gt;ContractCompass is not a substitute for professional legal advice. Always consult a qualified attorney for legal matters.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>algoliachallenge</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Why Your AI Gateway Needs MCP Integration in 2026</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Mon, 02 Feb 2026 10:19:30 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/why-your-ai-gateway-needs-mcp-integration-in-2026-3dcf</link>
      <guid>https://forem.com/varshithvhegde/why-your-ai-gateway-needs-mcp-integration-in-2026-3dcf</guid>
      <description>&lt;p&gt;You know that feeling when you've spent three hours debugging why your AI agent can't access your database for the third time this week? &lt;/p&gt;

&lt;p&gt;I was there last month. Five different tool integrations, each with its own authentication flow, error handling, and connection management. Want to add Slack notifications? Write another integration. Need file system access? Another one. Every integration was basically the same boilerplate with different endpoints.&lt;/p&gt;

&lt;p&gt;Then I found the Model Context Protocol and Bifrost. It sounded too good to be true: one gateway, one protocol, unlimited tools. But it actually works, and it's probably the most practical shift in AI infrastructure you'll deal with this year.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's an AI Gateway and Why Should You Care?
&lt;/h2&gt;

&lt;p&gt;Think of an AI gateway as the central hub between your apps and multiple AI providers. Instead of writing separate code for OpenAI, Anthropic, Google, and others, you connect once to the gateway, and it handles the rest.&lt;/p&gt;

&lt;p&gt;The benefits are immediate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic failover&lt;/strong&gt;: If one AI provider goes down, requests switch to another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt;: Distribute requests across multiple API keys to avoid rate limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt;: Reduce costs and improve response times&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified monitoring&lt;/strong&gt;: One place to track all your AI interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bifrost is an AI gateway built in Go that adds only 11 microseconds of latency while handling 5,000 requests per second. When you're running production AI systems, those microseconds matter.&lt;/p&gt;
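&lt;p&gt;To make the failover benefit concrete, here's a toy sketch of the pattern a gateway applies on your behalf. This is the idea, not Bifrost's actual implementation:&lt;/p&gt;

```python
# Illustrative failover: try providers in order and return the first
# successful response, instead of hard-coding one provider in your app.
def call_with_failover(prompt, providers):
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # provider down or rate-limited
            errors.append(exc)
    raise RuntimeError(f"All providers failed: {errors}")

def flaky(prompt):
    # Stand-in for a provider that is currently down.
    raise TimeoutError("primary provider is down")

def backup(prompt):
    # Stand-in for a healthy secondary provider.
    return f"answer to: {prompt}"

print(call_with_failover("hello", [flaky, backup]))  # answer to: hello
```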

&lt;h2&gt;
  
  
  The Model Context Protocol: USB-C for AI
&lt;/h2&gt;

&lt;p&gt;Anthropic introduced MCP in November 2024. Within a year, it became the industry standard. OpenAI adopted it in March 2025. Google DeepMind followed. By December 2025, it was donated to the Linux Foundation with backing from major tech companies.&lt;/p&gt;

&lt;p&gt;Here's why it matters: Before MCP, connecting an AI model to a new tool meant writing custom integration code. Every. Single. Time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI needs to search files? Custom code.&lt;/li&gt;
&lt;li&gt;Access a database? More custom code.&lt;/li&gt;
&lt;li&gt;Connect to Slack? Yet another integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This created what Anthropic called the "N×M problem": N models needing M different integrations, so the number of integrations grows multiplicatively with every model and tool you add.&lt;/p&gt;

&lt;p&gt;MCP solved this with a standardized protocol. Write an MCP server once for a tool, and any MCP-compatible AI client can use it. It's like USB-C for AI systems: one standard connection instead of a different cable for every device.&lt;/p&gt;
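&lt;p&gt;The arithmetic behind the N×M problem is easy to check:&lt;/p&gt;

```python
# Without MCP, every model needs its own integration per tool (N x M).
# With MCP, each model and each tool implements the protocol once (N + M).
def integrations_without_mcp(models, tools):
    return models * tools

def integrations_with_mcp(models, tools):
    return models + tools

print(integrations_without_mcp(4, 25))  # 100 custom integrations
print(integrations_with_mcp(4, 25))    # 29 protocol implementations
```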

&lt;h2&gt;
  
  
  The Problem with Direct MCP Connections
&lt;/h2&gt;

&lt;p&gt;When you connect AI models directly to MCP servers, you run into scaling problems. Every request from the AI includes all available tool definitions in its context window. Connect to five MCP servers with 100 total tools, and every single request carries those 100 tool definitions even for simple queries that don't need tools.&lt;/p&gt;

&lt;p&gt;This creates three issues:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Wasted tokens&lt;/strong&gt;: Most of your context budget goes to tool catalogs instead of actual work. A six-turn conversation with 100 tools re-sends all 100 definitions on every turn, so you pay the full catalog cost six times over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Security gaps&lt;/strong&gt;: Tools can execute without validation or approval. No audit trail, no safety checks before destructive operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Coordination overhead&lt;/strong&gt;: Each tool call requires a separate round trip to the AI model.&lt;/p&gt;
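&lt;p&gt;A back-of-the-envelope calculation shows how fast the catalog cost grows. The 100 tokens per definition here is an assumed average, not a measured figure:&lt;/p&gt;

```python
# Rough cost of shipping the full tool catalog with every request.
TOKENS_PER_DEFINITION = 100  # assumed average size of one tool definition

def catalog_tokens(num_tools, turns):
    # Every turn re-sends every definition, so cost scales with both.
    return num_tools * TOKENS_PER_DEFINITION * turns

print(catalog_tokens(100, 6))  # 60000 tokens spent on definitions alone
```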

&lt;h2&gt;
  
  
  How Bifrost Solves This
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rervwbsltoiwyp1fdlf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rervwbsltoiwyp1fdlf.png" alt="Bifrost" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bifrost integrates MCP natively into the gateway itself. You get both AI provider management and tool orchestration through a single interface.&lt;/p&gt;

&lt;p&gt;It supports four connection types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In-process tools&lt;/strong&gt;: Run directly in Bifrost's memory with zero network overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local MCP servers via STDIO&lt;/strong&gt;: For filesystem operations or database queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP connections&lt;/strong&gt;: For remote microservices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-Sent Events&lt;/strong&gt;: For real-time data streams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The killer feature is &lt;strong&gt;Code Mode&lt;/strong&gt;. Instead of including hundreds of tool definitions in every request, Bifrost exposes just four meta-tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;listToolFiles()&lt;/code&gt; - Discover available servers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;readToolFile(fileName)&lt;/code&gt; - Get tool signatures&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getToolDocs(server, tool)&lt;/code&gt; - Get detailed documentation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;executeToolCode(code)&lt;/code&gt; - Run Starlark (Python-like) code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI writes Starlark code that orchestrates tools inside a sandboxed environment, and tool definitions load only when needed. This reduces token usage by 50%+ when using multiple MCP servers (3+). With 8-10 MCP servers (150+ tools), you avoid wasting context on massive tool catalogs.&lt;/p&gt;
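&lt;p&gt;Here's a toy simulation of that lazy-loading flow, with the four meta-tools as snake_case Python stand-ins and a made-up catalog. It only illustrates the shape of the interaction, not Bifrost's internals:&lt;/p&gt;

```python
# Hypothetical catalog: in Bifrost this would come from connected MCP
# servers; the servers and tools below are invented for illustration.
CATALOG = {
    "weather": {"get_forecast": "get_forecast(city) - 3-day forecast"},
    "jokes": {"get_joke": "get_joke() - random programming joke"},
}

def list_tool_files():
    # Discover available servers without loading any definitions.
    return sorted(CATALOG)

def read_tool_file(server):
    # Fetch just the tool names for one server.
    return sorted(CATALOG[server])

def get_tool_docs(server, tool):
    # Full documentation loads only for the one tool the task needs.
    return CATALOG[server][tool]

servers = list_tool_files()
docs = get_tool_docs("jokes", "get_joke")
print(servers, "-", docs)
```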

&lt;h2&gt;
  
  
  Getting Started: A Real Example
&lt;/h2&gt;

&lt;p&gt;Let me show you how this works in practice. I'll walk through building a simple MCP server and connecting it to Bifrost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Start Bifrost
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Bifrost starts with zero configuration and opens at &lt;code&gt;localhost:8080&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Build a Simple MCP Server
&lt;/h3&gt;

&lt;p&gt;I created a Flask server with three tools: getting programming jokes, inspirational quotes, and basic calculations. Here's the core:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask_cors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CORS&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;CORS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;jokes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Why do programmers prefer dark mode? Because light attracts bugs!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Why did the developer go broke? Because he used up all his cache!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/sse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_message&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;
    &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;initialize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jsonrpc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;protocolVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-11-05&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;capabilities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serverInfo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tools/list&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jsonrpc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_joke&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Returns a random programming joke&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Performs basic arithmetic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multiply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                            &lt;span class="p"&gt;}&lt;/span&gt;
                        &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tools/call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;get_joke&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jokes&lt;/span&gt;&lt;span class="p"&gt;)}]}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;calculate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;operation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jsonrpc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with: &lt;code&gt;python mcp_server.py&lt;/code&gt;&lt;/p&gt;
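&lt;p&gt;Before wiring the server into Bifrost, you can smoke-test it by hand with a raw JSON-RPC request. This is an illustrative sketch: the endpoint path (&lt;code&gt;/&lt;/code&gt;) is an assumption, so point it at whatever route your &lt;code&gt;@app.route&lt;/code&gt; decorator actually declares.&lt;/p&gt;

```python
import json

# JSON-RPC 2.0 envelope for calling the "calculate" tool defined above.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "calculate",
        "arguments": {"operation": "add", "a": 2, "b": 3},
    },
}

body = json.dumps(payload)
print(body)

# With the server running, send it (the "/" path is an assumption):
# requests.post("http://localhost:5000/", json=payload).json()
```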

&lt;h3&gt;
  
  
  Step 3: Configure Model Providers and Connect to Bifrost
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Setting Up Model Providers
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs16176xtk78e6xwxr1rp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs16176xtk78e6xwxr1rp.png" alt="Bifrost UI" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the Bifrost UI at &lt;code&gt;localhost:8080&lt;/code&gt;, navigate to &lt;strong&gt;Model Providers&lt;/strong&gt; in the left sidebar. You'll see a comprehensive list of supported providers including OpenAI, Anthropic, Google, AWS Bedrock, Azure, and many others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fext19hjby77arx632h7w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fext19hjby77arx632h7w.png" alt="Model provider UI" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;OpenAI&lt;/strong&gt; from the list, then click &lt;strong&gt;"+ Add new key"&lt;/strong&gt; in the top-right corner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsd2kdus8yg57cwp04ow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmsd2kdus8yg57cwp04ow.png" alt="Model Providers" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fill in the key configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: Give it a descriptive name like "Production Key"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt;: Enter your actual API key (e.g., &lt;code&gt;sk-proj-...&lt;/code&gt;) or use an environment variable like &lt;code&gt;env.OPENAI_KEY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: Click to select which models this key can access (e.g., &lt;code&gt;gpt-4o&lt;/code&gt;, &lt;code&gt;gpt-4o-mini&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weight&lt;/strong&gt;: Set to &lt;code&gt;1&lt;/code&gt; for load balancing (higher weights receive proportionally more traffic)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use for Batch APIs&lt;/strong&gt;: Toggle this on if you want to use this key for batch operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Save&lt;/strong&gt; to add the key. You'll see it appear in your configured keys list with its weight and enabled status.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; For production setups, add multiple API keys for the same provider. Bifrost automatically distributes requests across them to avoid rate limits. You can also add keys from different providers (e.g., OpenAI and Google) for automatic failover.&lt;/p&gt;
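&lt;p&gt;To see what weighted distribution means in practice, here's a toy sketch of weighted key selection. It shows the general technique, not Bifrost's actual internals: a key with weight 3 should receive roughly three times the traffic of a key with weight 1. The key names and weights are illustrative only.&lt;/p&gt;

```python
import random

# Hypothetical key pool -- names and weights are placeholders.
keys = [
    {"name": "primary", "weight": 3},
    {"name": "backup", "weight": 1},
]

def pick_key(pool):
    # random.choices draws one item using per-item weights
    weights = [k["weight"] for k in pool]
    return random.choices(pool, weights=weights, k=1)[0]

counts = {"primary": 0, "backup": 0}
for _ in range(10_000):
    counts[pick_key(keys)["name"]] += 1
print(counts)  # "primary" lands near 7500, "backup" near 2500
```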

&lt;h4&gt;
  
  
  Connecting Your MCP Server
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz87yvrikvn4z0bzazli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz87yvrikvn4z0bzazli.png" alt="MCP server" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now go to &lt;strong&gt;MCP Gateway&lt;/strong&gt; in the left sidebar and click &lt;strong&gt;"New MCP Server"&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: &lt;code&gt;localmcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection Type&lt;/strong&gt;: &lt;code&gt;HTTP (Streamable)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection URL&lt;/strong&gt;: &lt;code&gt;http://localhost:5000/sse&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ping Available for Health Check&lt;/strong&gt;: Enable this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bifrost immediately connects, discovers your tools, and shows them in "Available Tools."&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Use It
&lt;/h3&gt;

&lt;p&gt;Here's a Python client that ties everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;BIFROST_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;👤 You: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Send to AI via Bifrost
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BIFROST_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;assistant_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Handle tool calls
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔧 AI is using &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tools...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="c1"&gt;# Bifrost executes the tool on your MCP server
&lt;/span&gt;            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BIFROST_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/mcp/tool/execute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Get final response
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BIFROST_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;assistant_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🤖 AI: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;assistant_msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;

&lt;span class="c1"&gt;# Try it
&lt;/span&gt;&lt;span class="nf"&gt;ask_ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me a programming joke&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;ask_ai&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 25 times 4?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffoxx8438zteefyrx54dx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffoxx8438zteefyrx54dx.png" alt="Agent output" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What Just Happened?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Your script sends "What is 25 times 4?" to Bifrost&lt;/li&gt;
&lt;li&gt;Bifrost adds your MCP tools to the AI's context&lt;/li&gt;
&lt;li&gt;GPT-4o decides to use the &lt;code&gt;calculate&lt;/code&gt; tool&lt;/li&gt;
&lt;li&gt;Your script calls Bifrost's tool execution endpoint&lt;/li&gt;
&lt;li&gt;Bifrost sends a JSON-RPC request to your Flask server&lt;/li&gt;
&lt;li&gt;Your server calculates 25 × 4 = 100 and returns it&lt;/li&gt;
&lt;li&gt;The result goes back to GPT-4o&lt;/li&gt;
&lt;li&gt;GPT-4o responds: "25 times 4 equals 100"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The beautiful part? Clean separation of concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your client doesn't know MCP protocol details&lt;/li&gt;
&lt;li&gt;Bifrost handles all MCP communication&lt;/li&gt;
&lt;li&gt;The AI doesn't know your server implementation&lt;/li&gt;
&lt;li&gt;Your MCP server doesn't know which AI is calling it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the power of standardization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Matters
&lt;/h2&gt;

&lt;p&gt;In April 2025, researchers identified MCP security issues: prompt injection, permission combinations that could exfiltrate data, and lookalike tools.&lt;/p&gt;

&lt;p&gt;Bifrost addresses this with a "suggest, don't execute" model by default. When an AI proposes a tool call, nothing runs automatically. Your code reviews and approves each execution. You get full audit trails for compliance.&lt;/p&gt;

&lt;p&gt;You can configure Agent Mode for specific tools. Safe operations like reading files can auto-execute, while destructive operations require approval.&lt;/p&gt;
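&lt;p&gt;The approval side of that model is easy to sketch in client code. The tool names and safe-list below are hypothetical, and in Bifrost itself Agent Mode policies are configured through its UI rather than written like this; this is just the shape of the pattern.&lt;/p&gt;

```python
# "Suggest, don't execute": auto-approve known-safe tools, gate the rest.
AUTO_APPROVED = {"get_joke", "calculate"}  # read-only / harmless tools

def review_tool_call(tool_call, ask=lambda name: False):
    """Return True if the proposed tool call may run.

    Safe tools pass automatically; anything else is deferred to `ask`
    (e.g. a human prompt or a policy check), denying by default.
    """
    name = tool_call["function"]["name"]
    return name in AUTO_APPROVED or ask(name)

safe = {"function": {"name": "calculate"}}
risky = {"function": {"name": "delete_repo"}}  # hypothetical destructive tool
print(review_tool_call(safe))   # True: auto-approved
print(review_tool_call(risky))  # False: denied unless explicitly approved
```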

&lt;p&gt;For scenarios with many MCP servers (3+), you can enable Code Mode to reduce token usage. With Code Mode on, Bifrost exposes its four meta-tools instead of injecting every tool definition into the model's context directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;If you're building AI systems without MCP integration in 2026, you're solving yesterday's problems. The standardization is here. The ecosystem is mature. The question isn't whether to adopt MCP, but how quickly.&lt;/p&gt;

&lt;p&gt;Bifrost makes adoption straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setup takes less than a minute&lt;/li&gt;
&lt;li&gt;Web UI makes configuration visual&lt;/li&gt;
&lt;li&gt;Open-source means you can examine and customize&lt;/li&gt;
&lt;li&gt;Native support for multiple connection types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is infrastructure that matters. Not because it's flashy, but because it solves real problems every organization faces when building AI systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Get started with Bifrost:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://git.new/bifrost" rel="noopener noreferrer"&gt;https://git.new/bifrost&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Documentation: &lt;a href="https://docs.getbifrost.ai" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Quick Start: &lt;a href="https://docs.getbifrost.ai/quickstart/gateway/setting-up" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai/quickstart/gateway/setting-up&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Code Mode: &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai/mcp/code-mode&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent Mode: &lt;a href="https://docs.getbifrost.ai/mcp/agent-mode" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai/mcp/agent-mode&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Overview: &lt;a href="https://docs.getbifrost.ai/mcp/overview" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai/mcp/overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Top 5 LLM Gateways in 2026: A Deep-Dive Comparison for Production Teams</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Thu, 22 Jan 2026 01:41:56 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/top-5-llm-gateways-in-2026-a-deep-dive-comparison-for-production-teams-34d2</link>
      <guid>https://forem.com/varshithvhegde/top-5-llm-gateways-in-2026-a-deep-dive-comparison-for-production-teams-34d2</guid>
      <description>&lt;p&gt;I spent the last few weeks researching LLM gateway solutions for production teams. Here's what I found after testing five different options, talking to engineering teams running them at scale, and breaking things in my staging environment.&lt;/p&gt;

&lt;p&gt;I didn't test every edge case. I focused on REST APIs with streaming responses, didn't test batch processing extensively, and my traffic patterns might be different from yours. But here's what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Production Teams Need LLM Gateways
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkpcwz6l0dqag77jkxgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkpcwz6l0dqag77jkxgw.png" alt="LLM Gateway" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what happened when we didn't use one:&lt;/p&gt;

&lt;p&gt;Our application relied solely on OpenAI. When OpenAI had an outage last month, our entire product went down with it, leaving customers stuck waiting for support.&lt;/p&gt;

&lt;p&gt;Then there's cost. We were using GPT-4 for simple tasks that Claude Haiku could handle for one-tenth the price. One weekend of refactoring our routing logic saved us $3,000 per month.&lt;/p&gt;

&lt;p&gt;But managing multiple providers yourself creates its own problems. You end up writing custom code for each API, normalizing their different error formats, managing API keys, building retry logic from scratch, and spending hours debugging why Anthropic's rate limit response looks different from OpenAI's.&lt;/p&gt;

&lt;p&gt;LLM gateways solve this. One API for all providers. Automatic fallbacks. Cost tracking that works. And your application won't crash because one provider is having issues.&lt;/p&gt;
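&lt;p&gt;For contrast, here's roughly what the hand-rolled version of "automatic fallbacks" looks like, the boilerplate a gateway absorbs for you. The provider list and payload shape are placeholders, not working endpoints.&lt;/p&gt;

```python
# Placeholder provider endpoints -- illustrative, not drop-in URLs.
PROVIDERS = [
    ("openai", "https://api.openai.com/v1/chat/completions"),
    ("anthropic", "https://api.anthropic.com/v1/messages"),
]

def chat_with_fallback(payload, providers, post):
    """Try each provider in order; return (name, json) from the first success."""
    last_error = None
    for name, url in providers:
        try:
            resp = post(url, json=payload, timeout=30)
            resp.raise_for_status()
            return name, resp.json()
        except Exception as exc:  # timeout, 429, 5xx, parse error...
            last_error = exc      # fall through to the next provider
    raise RuntimeError(f"all providers failed: {last_error}")

# Usage with a real HTTP client would look like:
# chat_with_fallback(payload, PROVIDERS, requests.post)
```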

&lt;p&gt;Here are the five gateways that impressed me.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Bifrost (by Maxim AI)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vz8eu5bal1yc64hwzqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vz8eu5bal1yc64hwzqn.png" alt="Bifrost" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it is&lt;/strong&gt;: A high-performance LLM gateway built in Go. It's designed for speed and reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Customer-facing applications where latency matters. Real-time chat, high-traffic APIs, anything where users will notice if responses are slow.&lt;/p&gt;

&lt;p&gt;The performance numbers caught my attention first. In our synthetic load tests, Bifrost added about 11 microseconds of latency at 5,000 requests per second. When I ran the same test with LiteLLM (which is Python-based), it added around 50 microseconds.&lt;/p&gt;

&lt;p&gt;What really sold me was the P99 latency test. At 1,000 concurrent users, LiteLLM's slowest responses hit 28 seconds. Bifrost stayed under 50 milliseconds. If you're building a chatbot, that's the difference between users staying on your application and immediately leaving.&lt;/p&gt;

&lt;p&gt;Now, I didn't test this with burst traffic or serverless deployments - our setup is traditional Kubernetes. Your results might differ depending on your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it different&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Smart load balancing that actually works. Bifrost was the first gateway I found that automatically routes requests based on real-time performance. It monitors which providers are healthy, routes around failures, and prevents you from hitting rate limits. Most gateways claim to do this, but Bifrost's implementation is noticeably better.&lt;/p&gt;
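&lt;p&gt;As an illustration (this is not Bifrost's actual algorithm), latency-aware routing boils down to picking the healthy provider with the best recent numbers:&lt;/p&gt;

```python
# Illustrative sketch of latency-aware provider routing: prefer the
# healthy provider with the lowest recent average latency.
# Provider names and stats below are made-up example data.
def pick_provider(stats):
    healthy = [(name, s) for name, s in stats.items() if s["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy providers")
    name, _ = min(healthy, key=lambda item: item[1]["avg_latency_ms"])
    return name

stats = {
    "openai":    {"healthy": True,  "avg_latency_ms": 420},
    "anthropic": {"healthy": True,  "avg_latency_ms": 380},
    "groq":      {"healthy": False, "avg_latency_ms": 90},  # down, skipped
}
```

&lt;p&gt;The hard part in production is keeping those stats fresh per provider and per key, which is exactly what the gateway does continuously.&lt;/p&gt;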

&lt;p&gt;It also has cluster mode built in, so you can run multiple instances without complicated setup. And here's what surprised me - it includes SSO, audit logs, team budgets, and role-based access control without adding latency. Most gateways make you choose between features and speed. Bifrost somehow does both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In 30 seconds you have a gateway running with a web UI. Since it uses OpenAI's API format, integrating it is just changing your base URL. I had our staging environment switched over in under 10 minutes.&lt;/p&gt;
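&lt;p&gt;To illustrate why this is essentially a one-line change: the request body stays identical whether you call the provider directly or go through an OpenAI-compatible gateway - only the base URL moves. The localhost port below is an assumption for a local instance, not a documented default:&lt;/p&gt;

```python
# Sketch: with an OpenAI-compatible gateway, only the base URL changes.
# The localhost:8080 address is an assumption for a locally running gateway.
def chat_request(base_url, model, content):
    return {
        "url": base_url + "/v1/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": content}],
        },
    }

direct = chat_request("https://api.openai.com", "gpt-4", "Hello")
via_gateway = chat_request("http://localhost:8080", "gpt-4", "Hello")
# The payloads are identical; only the URL differs.
```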

&lt;p&gt;Bifrost covers all the major providers - OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, Azure OpenAI, Cohere, Mistral, Groq, Together AI, and Replicate. Plus they added support for any OpenAI-compatible endpoint, which means you can actually use custom or self-hosted models too.&lt;/p&gt;

&lt;p&gt;For most production use cases, you're using one of these major providers anyway. LiteLLM does have broader coverage and a more mature open-source community - it has been around longer, with more contributors. If that ecosystem and maximum provider choice matter more to you than raw performance, LiteLLM is a solid pick. But for our needs, Bifrost's speed and provider coverage were enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why we chose it&lt;/strong&gt;: For our use case (high-scale, customer-facing chat), the 11 microsecond overhead was too good to pass up. The enterprise features were a bonus we didn't expect at this performance level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing&lt;/strong&gt;: Open-source and free to self-host. Enterprise support is available.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. LiteLLM
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat7humnt3ehpd4ilipql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat7humnt3ehpd4ilipql.png" alt="LiteLLM" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is probably the most popular open-source LLM gateway. Python-based, with both an SDK and proxy server.&lt;/p&gt;

&lt;p&gt;If you're in a Python environment or need access to niche models, this is the default choice. The provider coverage is unmatched - over 100 providers including all the major ones (OpenAI, Anthropic, Google, Azure, AWS) plus specialized options like HuggingFace, Ollama, Replicate, Anyscale, and Perplexity.&lt;/p&gt;

&lt;p&gt;For Python developers, setup is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Switch to Claude without changing code
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-4-sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configuration uses YAML. The documentation is thorough, and there's a strong community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it breaks down&lt;/strong&gt;: Performance at scale. LiteLLM is written in Python using FastAPI. At low to moderate traffic, it performs well. But in our load tests, the limitations showed clearly.&lt;/p&gt;

&lt;p&gt;At 500 requests per second, P99 latency hit 28 seconds. At 1,000 requests per second, it crashed - ran out of memory and started failing requests. The Python GIL and async overhead become real bottlenecks when handling thousands of concurrent requests.&lt;/p&gt;

&lt;p&gt;I saw this in our staging environment. At 200 requests per second, everything ran smoothly. When I simulated higher traffic (around 2,000 requests per second), LiteLLM started timing out. Memory usage increased to over 8GB, and we got cascading failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development and testing environments&lt;/li&gt;
&lt;li&gt;Prototyping and trying different models&lt;/li&gt;
&lt;li&gt;Internal tools with moderate traffic (under 500 RPS)&lt;/li&gt;
&lt;li&gt;When you need access to 100+ providers&lt;/li&gt;
&lt;li&gt;Python-first teams where ecosystem fit matters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to avoid it&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer-facing applications at scale&lt;/li&gt;
&lt;li&gt;Real-time features where every millisecond counts&lt;/li&gt;
&lt;li&gt;Production workloads requiring 99.9%+ uptime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ecosystem is mature with active development, but if you're planning to handle thousands of requests per second in production, you'll likely hit performance issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing&lt;/strong&gt;: Fully open-source and free. You pay for hosting it yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Portkey
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjqckdxeyshj78dshmap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjqckdxeyshj78dshmap.png" alt="PortKey" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Portkey is more than just a gateway - it's a full AI control plane with routing, observability, guardrails, and governance.&lt;/p&gt;

&lt;p&gt;The observability depth is what sets it apart. Every request gets full traces showing you which user made the call, which models were tried, why they failed, which fallback was used, how long each step took, and the exact cost. This isn't just logging - it's distributed tracing for AI.&lt;/p&gt;

&lt;p&gt;When our staging environment started using too many tokens, Portkey's traces showed us exactly which user and which prompt were causing it. That level of detail is valuable when debugging production issues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;portkey_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Portkey&lt;/span&gt;

&lt;span class="n"&gt;portkey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Portkey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-portkey-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;virtual_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-provider-virtual-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;portkey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Enterprise features&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PII detection, content filtering, prompt injection detection&lt;/li&gt;
&lt;li&gt;SOC 2, HIPAA, GDPR compliance with full audit trails&lt;/li&gt;
&lt;li&gt;SSO/SAML, team permissions, role-based access&lt;/li&gt;
&lt;li&gt;Data residency controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;According to their team, they handle over 10 billion requests monthly with 99.9999% uptime. I couldn't independently verify this, but the platform felt stable during our testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tradeoff&lt;/strong&gt;: I measured latency overhead of 20-40 milliseconds when using advanced features like guardrails and detailed tracing. For a small team that just needs basic routing, Portkey is probably more than necessary. The learning curve is also steeper than simpler gateways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why we didn't choose it&lt;/strong&gt;: For our use case, the added latency and complexity weren't worth the governance features we didn't need yet. But I talked to a healthcare company using Portkey specifically for PII detection. Every LLM request gets scanned for protected health information, logged with full audit trails, and only routed to HIPAA-compliant providers. For them, the compliance features justified the cost.&lt;/p&gt;

&lt;p&gt;If you're in a regulated industry or managing AI across multiple teams with governance requirements, Portkey's observability is among the best available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing&lt;/strong&gt;: Free tier for development | Starts at $49/month | Enterprise custom pricing&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Kong AI Gateway
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwj4rop15l4h17f123cm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwj4rop15l4h17f123cm.png" alt="Kong AI" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kong's API Gateway with AI-specific features added. If you're already using Kong, this is worth looking at.&lt;/p&gt;

&lt;p&gt;Kong brings more than a decade of API gateway experience to LLM routing - authentication, rate limiting, security, and observability at large scale. All the infrastructure pieces that matter when running production workloads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install AI Proxy plugin&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8001/services/ai-service/plugins &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"name=ai-proxy"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"config.route_type=llm/v1/chat"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"config.auth.header_name=Authorization"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"config.model.provider=openai"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"config.model.name=gpt-4"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AI-specific capabilities&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unified API across OpenAI, Anthropic, AWS Bedrock, Azure AI, Google Vertex&lt;/li&gt;
&lt;li&gt;RAG pipelines built in&lt;/li&gt;
&lt;li&gt;PII removal across 12 languages&lt;/li&gt;
&lt;li&gt;Content filtering and safety controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where this makes sense&lt;/strong&gt;: You're already using Kong for API management. That's the primary reason to choose this. The integration with existing Kong infrastructure is seamless, and you get unified observability across all your APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it doesn't&lt;/strong&gt;: If you're not already on Kong, the learning curve is significant. It's built for large enterprises, not small teams needing quick deployment. We evaluated this briefly but decided it was more complexity than we needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing&lt;/strong&gt;: Available through Kong Konnect (managed) or self-hosted | Enterprise custom pricing&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Helicone AI Gateway
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwluintjiw362u6ff363.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwluintjiw362u6ff363.png" alt="HeliconeAI" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Started as an observability platform, recently launched a Rust-based gateway. Lightweight and fast.&lt;/p&gt;

&lt;p&gt;Built in Rust, Helicone achieves around 8ms P50 latency with sub-5ms overhead even under load, based on what their team shared with me. The gateway ships as a single 15MB binary that runs anywhere.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run with npx&lt;/span&gt;
npx @helicone/ai-gateway

&lt;span class="c"&gt;# Or with Docker&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8787:8787 helicone/ai-gateway

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The observability is their core strength - request-level tracing, user tracking, cost forecasting, performance analytics, and real-time alerts. It's as comprehensive as Portkey's but with less complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexible deployment&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud-hosted (managed service)&lt;/li&gt;
&lt;li&gt;Self-hosted (full control)&lt;/li&gt;
&lt;li&gt;Hybrid (self-host gateway, use cloud observability)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The consideration&lt;/strong&gt;: The gateway is newer (launched mid-2024). Core routing is solid, but some advanced enterprise features are still developing. For most teams this isn't a problem, but large enterprises might want to validate specific requirements first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gateway: Open-source and free to self-host&lt;/li&gt;
&lt;li&gt;Observability: Starts free, then $20/month for 100,000 requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The separation is smart - you can self-host for free and only pay for observability if you want it.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Choose
&lt;/h2&gt;

&lt;p&gt;After evaluating these gateways, here's what I learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Bifrost if&lt;/strong&gt;: Performance is critical. You're handling 5,000+ requests per second, serving customer-facing features, or building real-time applications where latency matters. The 11 microsecond overhead is hard to beat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose LiteLLM if&lt;/strong&gt;: You're in a Python environment with moderate traffic (under 500 RPS). The provider coverage is unmatched - over 100 providers, including specialized ones. Great for development, prototyping, and internal tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Portkey if&lt;/strong&gt;: You're in a regulated industry needing compliance controls (HIPAA, SOC 2) or managing AI across multiple teams. The observability and governance features are excellent, but you'll pay for it in latency (20-40ms overhead).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Kong if&lt;/strong&gt;: You're already using Kong for API management. Otherwise, the learning curve probably isn't worth it unless you're a large enterprise needing infrastructure-level control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Helicone if&lt;/strong&gt;: You want performance and observability without enterprise complexity. Good for teams with data residency requirements who want self-hosted infrastructure with cloud monitoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  Questions?
&lt;/h2&gt;

&lt;p&gt;Have you deployed LLM gateways in production? What did you choose and why? What surprised you?&lt;/p&gt;

&lt;p&gt;Still evaluating options? I can help with specific questions about performance, integration, or cost modeling at your scale. Leave a comment below.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Daily Echo - Your Life in Motion 🎥</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Sat, 03 Jan 2026 16:23:33 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/daily-echo-your-life-in-motion-2938</link>
      <guid>https://forem.com/varshithvhegde/daily-echo-your-life-in-motion-2938</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mux-2025-12-03"&gt;DEV's Worldwide Show and Tell Challenge Presented by Mux&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Daily Echo is a private video journaling app where you record 1-minute daily video diaries. It's like having a conversation with your future self. The app helps you track your mood, reflect on your experiences, and create a visual archive of your life that you can revisit anytime.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Pitch Video
&lt;/h2&gt;

&lt;p&gt;

&lt;iframe src="https://player.mux.com/01jDdJgPd01TNb2NfuOkBkPc027sNGvuzokCpE7g01Qcb4c" width="710" height="399"&gt;
&lt;/iframe&gt;



&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live App&lt;/strong&gt;: &lt;a href="https://dailyecho.varshithvhegde.in/" rel="noopener noreferrer"&gt;https://dailyecho.varshithvhegde.in/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/Varshithvhegde/dailyecho" rel="noopener noreferrer"&gt;https://github.com/Varshithvhegde/dailyecho&lt;/a&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Varshithvhegde" rel="noopener noreferrer"&gt;
        Varshithvhegde
      &lt;/a&gt; / &lt;a href="https://github.com/Varshithvhegde/dailyecho" rel="noopener noreferrer"&gt;
        dailyecho
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      DailyEcho - A beautiful, private video journaling app that lets you record daily video diary entries, track your mood over time, and relive your memories through immersive story modes and interactive visual walls.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;🎥 Daily Echo&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;A beautiful, private video journaling app that lets you record daily video diary entries, track your mood over time, and relive your memories through immersive story modes and interactive visual walls.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;✨ Features&lt;/h2&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;🎬 Immersive Story Modes (New!)&lt;/h3&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Stories&lt;/strong&gt; - Watch your entries in a sequential, story-like format similar to social media.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Curated Playlists&lt;/strong&gt; - Choose from "Recent Moments", "Moments of Joy" (happy/excited/grateful), or "Flashback" (random picks from the past).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smooth Navigation&lt;/strong&gt; - Interactive progress bars, auto-advance, and gesture/keyboard support.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;🧱 Echo Wall (Mosaic Mode)&lt;/h3&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Living Visual History&lt;/strong&gt; - A dynamic masonry grid of your life in motion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Living Video Tiles&lt;/strong&gt; - Each tile plays a Mux-generated animated GIF preview simultaneously for a "Harry Potter" newspaper effect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Previews&lt;/strong&gt; - Retro CRT scanline overlays and cinematic hover effects.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;📹 Video Recording &amp;amp; Playback&lt;/h3&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mux-powered streaming&lt;/strong&gt; - Professional-grade video processing and playback with adaptive streaming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mux GIFs&lt;/strong&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Varshithvhegde/dailyecho" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Testing Credentials:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email: &lt;code&gt;test@gmail.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;devtest&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Detailed Explanation
&lt;/h2&gt;

&lt;p&gt;

&lt;iframe src="https://player.mux.com/CCj4qM26bpO6r6Zlx37CFqh01dDZNAaYmt9FaJXDPkEY" width="710" height="399"&gt;
&lt;/iframe&gt;



&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story Behind It
&lt;/h2&gt;

&lt;p&gt;I have a terrible memory. Seriously. Ask me what I did last Tuesday and I'll draw a blank. But I've always been fascinated by the idea of looking back at my life, especially when the end of the year rolls around and everyone's doing their "year in review" thing.&lt;/p&gt;

&lt;p&gt;I wanted to create something that would help me remember the small moments - not just the big events, but the everyday stuff. What was I thinking about on a random Wednesday in March? How did I feel when that thing happened at work? What was going through my mind during that phase of my life?&lt;/p&gt;

&lt;p&gt;The idea was simple: record a 1-minute video every day. Just sit down, talk to the camera like you're talking to a friend, and capture whatever's on your mind. But I didn't want it to feel like a chore. I wanted it to be something I'd actually look forward to doing.&lt;/p&gt;

&lt;p&gt;So I built Daily Echo with features that make revisiting your memories feel magical. The Echo Wall shows all your entries as living video tiles playing simultaneously (like those moving newspapers in Harry Potter). Memory Stories let you watch your entries in sequence, almost like watching a documentary about your own life. And the Time Capsule feature shows you what you were up to exactly one month or one year ago.&lt;/p&gt;

&lt;p&gt;It's been incredibly powerful for me personally. There's something about being able to go back and watch yourself from months ago, seeing how you've grown or changed, or just remembering moments you'd completely forgotten.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;p&gt;Daily Echo is built with React 18, TypeScript, and Vite on the frontend, with Tailwind CSS and shadcn/ui for the design. The backend runs on Supabase, handling PostgreSQL database, authentication, and edge functions.&lt;/p&gt;

&lt;p&gt;What makes the app special technically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Living Video Previews Everywhere&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every entry card in the timeline shows an animated GIF preview that plays automatically. When you hover over the Echo Wall (our mosaic view), you see all your memories playing at once. It creates this incredible "living history" effect that static thumbnails just can't match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. AI-Powered Insights&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using OpenAI's GPT-4o-mini, the app automatically analyzes your video transcripts to generate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two-sentence summaries of each entry&lt;/li&gt;
&lt;li&gt;Emotional sentiment detection&lt;/li&gt;
&lt;li&gt;Personalized daily advice based on what you talked about&lt;/li&gt;
&lt;li&gt;Mood tracking over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Immersive Story Modes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can watch your entries in different ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recent Moments&lt;/strong&gt;: Your latest recordings in sequence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moments of Joy&lt;/strong&gt;: Auto-curated playlist of happy entries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flashback&lt;/strong&gt;: Random picks from your past&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each mode has interactive progress bars, auto-advance, and keyboard controls for a cinematic experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Gamification That Actually Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Achievement badges like "Zen Master" (recorded before 6 AM), "Night Owl" (recorded after 10 PM), and "Weekend Warrior" (weekend recordings) make the habit more engaging. You can track your recording streaks and see your mood variety over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use of Mux
&lt;/h3&gt;

&lt;p&gt;Mux is the heart and soul of Daily Echo. Here's how I'm using it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Professional Video Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you record a video, it goes through Mux's direct upload API. No dealing with complicated encoding pipelines or storage headaches. Mux handles everything: transcoding, optimization, and adaptive streaming. The result? Your videos play smoothly on any device, any connection speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Automatic Transcription&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was a game-changer. By enabling Mux's transcription feature during upload, I get accurate text transcripts of every video entry automatically. These transcripts power the AI analysis, search functionality, and accessibility features. I didn't have to integrate a separate transcription service or worry about accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Animated GIF Previews&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of static thumbnails, every entry shows a living preview using Mux's GIF generation API. You can watch all your memories playing simultaneously in the Echo Wall view. It's like having a magical photo album where every picture moves. Mux generates these GIFs automatically from your video without any extra work on my end.&lt;/p&gt;
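&lt;p&gt;For reference, Mux serves these previews from its image service at &lt;code&gt;image.mux.com/{PLAYBACK_ID}/animated.gif&lt;/code&gt;. The helper below sketches the URL construction with a placeholder playback ID; the width parameter is one of several options Mux supports:&lt;/p&gt;

```python
# Sketch of assembling a Mux animated GIF preview URL.
# image.mux.com/{playback_id}/animated.gif is Mux's image service endpoint;
# "abc123" below is a placeholder playback ID, not a real asset.
def gif_preview_url(playback_id, width=320):
    return ("https://image.mux.com/" + playback_id
            + "/animated.gif?width=" + str(width))

tile_url = gif_preview_url("abc123")
```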

&lt;p&gt;&lt;strong&gt;4. Reliable Streaming with Mux Player&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The integrated Mux Player component handles playback with built-in caption support. It just works - no buffering issues, no format compatibility problems, no manual quality switching needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Webhook Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mux's webhook system notifies my Supabase edge function when videos are ready, when transcripts are available, and if anything goes wrong. This lets me update the UI in real-time and handle the entire video lifecycle automatically.&lt;/p&gt;
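&lt;p&gt;A simplified sketch of that dispatch - the event types are real Mux webhook events, while the return values are placeholders for the app's actual database updates:&lt;/p&gt;

```python
# Hedged sketch of routing Mux webhook events to app-side actions.
# "video.asset.ready" etc. are Mux event types; the returned strings
# stand in for the real Supabase updates.
def handle_mux_webhook(event):
    event_type = event.get("type", "")
    if event_type == "video.asset.ready":
        return "mark entry playable"
    if event_type == "video.asset.track.ready":
        return "attach transcript"
    if event_type == "video.asset.errored":
        return "flag upload error"
    return "ignore"
```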

&lt;p&gt;The developer experience with Mux has been fantastic. The documentation is clear, the API is intuitive, and features like automatic transcription and GIF generation saved me weeks of development time. Instead of building video infrastructure, I could focus on making the journaling experience special.&lt;/p&gt;

&lt;p&gt;What really impressed me: I initially thought I'd need separate services for video hosting, transcription, and preview generation. Mux does all of this out of the box, and it scales effortlessly. When a user records their 100th video, it works just as smoothly as their first.&lt;/p&gt;




&lt;p&gt;I hope Daily Echo inspires others to start capturing their daily thoughts. Life moves fast, and our memories fade faster. Having a video archive of your own life is like having a superpower - you can literally go back in time and remember who you were and what mattered to you at any moment.&lt;/p&gt;

&lt;p&gt;Give it a try with the test credentials above, and maybe start your own daily echo habit!&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>muxchallenge</category>
      <category>showandtell</category>
      <category>video</category>
    </item>
    <item>
      <title>My 2025 wrap</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Wed, 31 Dec 2025 15:16:32 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/my-2025-wrap-ek0</link>
      <guid>https://forem.com/varshithvhegde/my-2025-wrap-ek0</guid>
      <description>&lt;p&gt;2025 was a rollercoaster for me. Looking back, I can clearly divide it into two distinct halves. One that tested me, and another that transformed me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Half: Climbing the Corporate Ladder
&lt;/h2&gt;

&lt;p&gt;January started strong. I got promoted at work, which was amazing! Though it was mostly a position bump, I was actually leading a project as an Associate Engineer. The best part? We went from 5 days in the office to just 2 days a week. But honestly, I still chose to work from the office most days because I was so invested in the project.&lt;/p&gt;

&lt;p&gt;Then came February, the "love month," and let's just say things didn't go as planned. I hit one of the lowest points in my life.&lt;/p&gt;

&lt;p&gt;But you know what kept me going? I dove into spirituality and writing poems. These became my anchors during the tough times.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turning Point: When Everything Changed
&lt;/h2&gt;

&lt;p&gt;And then came the moment that changed everything.&lt;/p&gt;

&lt;p&gt;I had this massive challenge at work. Our tool was taking 8 minutes just to load an MF4 (ASAM MDF 4) file. EIGHT MINUTES. And that's before any computation! We were using the asammdf package in Python, which is good, but painfully slow.&lt;/p&gt;

&lt;p&gt;I became obsessed with solving this. I researched everything. Tried JIT compilation, which improved computation time but not the loading. Then I had this wild idea: what if I rewrote the entire package in Rust?&lt;/p&gt;

&lt;p&gt;This wasn't just a work task anymore. This was MY mission. I worked my regular job during the day and coded this project at night. I went deep into understanding how MDF files work at the byte level, implementing a custom package specifically for our project. Countless all-nighters, endless debugging sessions, but when it finally worked? Pure magic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 seconds.&lt;/strong&gt; That's all it took to load a 4GB file AND do the computation (which I also replaced with Rust). Only the UI remained in Python.&lt;/p&gt;

&lt;p&gt;This earned me so much respect at work. Due to NDA, I can't share the code or methods (corporate life, you know), but the fact that I pulled this off still makes me proud. This was my turning point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reigniting the Developer Within
&lt;/h2&gt;

&lt;p&gt;Huge shoutout to Jess and Ben for their weekly "What was your win this week?" posts. Reading those comments and seeing everyone's achievements? That reignited my inner developer. I wanted to be part of that energy again.&lt;/p&gt;

&lt;p&gt;I restarted my dev.to journey, but I was rusty. I couldn't figure out what to write about.&lt;/p&gt;

&lt;p&gt;Then I discovered &lt;strong&gt;DEV Challenges&lt;/strong&gt;, and wow, what a gold mine! I could build, showcase, learn, and enjoy all at once. Every weekend became about tackling a new challenge. This was exactly what I needed to grow and fall in love with coding all over again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovering New Heights (Literally!)
&lt;/h2&gt;

&lt;p&gt;In the second half of the year, I found another passion. &lt;strong&gt;Trekking&lt;/strong&gt;. I climbed Asia's 1st and 2nd largest monolithic rocks! Out of 9 total treks, 8 happened in the second half alone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2vgqmpc6omlrsica7sb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2vgqmpc6omlrsica7sb.jpg" alt="Trekking with friends"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also did my first solo travel to &lt;strong&gt;Hampi&lt;/strong&gt;. I know it's not far, but for me, it was a huge achievement. Plus, I met some amazing people along the way!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5r7r2awbzwofsheovsc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5r7r2awbzwofsheovsc.jpg" alt="Hampi with friends"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Projects That Keep On Giving
&lt;/h2&gt;

&lt;p&gt;Some of my older projects surprised me this year:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;a href="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;&lt;/a&gt;
      &lt;a href="https://github.com/Varshithvhegde" rel="noopener noreferrer"&gt;
        Varshithvhegde
      &lt;/a&gt; / &lt;a href="https://github.com/Varshithvhegde/FreeShare" rel="noopener noreferrer"&gt;
        FreeShare
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      FreeShare is a free online file sharing platform designed to simplify the process of sharing files without the need for any sign-up or verification.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;FreeShare: File Sharing Platform&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/Varshithvhegde/FreeShare/./public/assets/landingPage-8e480441-9785-40ca-9f26-d2b48cecc688"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FVarshithvhegde%2FFreeShare%2F.%2Fpublic%2Fassets%2FlandingPage-8e480441-9785-40ca-9f26-d2b48cecc688" alt="thumbnail"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🗂️ Description&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;FreeShare is a file sharing platform built with React, Firebase, and Cloud Functions. This project allows users to share files easily and efficiently. It's designed for individuals and teams who need a simple and secure way to share files.&lt;/p&gt;
&lt;p&gt;The platform provides a user-friendly interface for uploading, sharing, and managing files. With FreeShare, you can share files with others by generating a unique link, and recipients can access the files without needing to create an account.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;✨ Key Features&lt;/h2&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;&lt;strong&gt;File Sharing&lt;/strong&gt;&lt;/h3&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Upload and share files with others via a unique link&lt;/li&gt;
&lt;li&gt;Supports various file types, including documents, images, and videos&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;&lt;strong&gt;Security and Authentication&lt;/strong&gt;&lt;/h3&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Secure file storage using Firebase Storage&lt;/li&gt;
&lt;li&gt;Authentication and authorization using Firebase Authentication&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;&lt;strong&gt;User Interface&lt;/strong&gt;&lt;/h3&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Responsive and user-friendly interface built with React and Material-UI&lt;/li&gt;
&lt;li&gt;Easy navigation and file management&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🗂️ Folder Structure&lt;/h2&gt;

&lt;/div&gt;

  &lt;div class="js-render-enrichment-target"&gt;
    &lt;div class="render-plaintext-hidden"&gt;
      &lt;pre&gt;graph TD
src--&amp;gt;components
src--&amp;gt;App.test.js;
src--&amp;gt;index.js;
src--&amp;gt;reportWebVitals.js;
src--&amp;gt;setupTests.js;
public--&amp;gt;index.html;
public--&amp;gt;manifest.json;
public--&amp;gt;robots.txt;&lt;/pre&gt;…&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Varshithvhegde/FreeShare" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
FreeShare is a free online file-sharing platform that needs no sign-up or verification. I built it 2 years ago in college when I was just starting out. I honestly thought it was dead, but when I checked Firebase recently... 10K total users! People are still using it, and my blog post about it still gets views. Sure, it has flaws, but hey, we all start somewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;a href="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;&lt;/a&gt;
      &lt;a href="https://github.com/Varshithvhegde" rel="noopener noreferrer"&gt;
        Varshithvhegde
      &lt;/a&gt; / &lt;a href="https://github.com/Varshithvhegde/notepage" rel="noopener noreferrer"&gt;
        notepage
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      NotePage is a web application that allows you to easily share code, text, or any content using a unique link.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;&lt;a href="https://notepage.vercel.app" rel="nofollow noopener noreferrer"&gt;NotePage&lt;/a&gt;&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://private-user-images.githubusercontent.com/80502833/277675360-a94e6729-3305-4380-94ea-7f2ac01c81be.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzQ2NDE5NjUsIm5iZiI6MTc3NDY0MTY2NSwicGF0aCI6Ii84MDUwMjgzMy8yNzc2NzUzNjAtYTk0ZTY3MjktMzMwNS00MzgwLTk0ZWEtN2YyYWMwMWM4MWJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAzMjclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMzI3VDIwMDEwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAyMmQ4MTYxNDA5NDRjMTIxYmJmZWVkODc3Nzg3MjZlOWU3Y2VmMzEwNTE0YmIyNTNiZDdiMDI2NTFjMWQ5MzkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Rb2RNP2TOLX327786kGMDNbcapWPnv2JhV4tZY3BbfA"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fprivate-user-images.githubusercontent.com%2F80502833%2F277675360-a94e6729-3305-4380-94ea-7f2ac01c81be.png%3Fjwt%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzQ2NDE5NjUsIm5iZiI6MTc3NDY0MTY2NSwicGF0aCI6Ii84MDUwMjgzMy8yNzc2NzUzNjAtYTk0ZTY3MjktMzMwNS00MzgwLTk0ZWEtN2YyYWMwMWM4MWJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAzMjclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMzI3VDIwMDEwNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAyMmQ4MTYxNDA5NDRjMTIxYmJmZWVkODc3Nzg3MjZlOWU3Y2VmMzEwNTE0YmIyNTNiZDdiMDI2NTFjMWQ5MzkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Rb2RNP2TOLX327786kGMDNbcapWPnv2JhV4tZY3BbfA" alt="frame_generic_light"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NotePage&lt;/strong&gt; is a web application that allows you to easily share code, text, or any content using a unique link. You can create new note pages by simply visiting &lt;code&gt;https://notepage.vercel.app&lt;/code&gt;.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Features&lt;/h3&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Custom Pages&lt;/strong&gt;: Create your own custom pages to share content with others. Just use &lt;code&gt;https://notepage.vercel.app/&amp;lt;your-page-name&amp;gt;&lt;/code&gt; and start sharing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Password Protection&lt;/strong&gt;: Optionally protect your pages with a password, ensuring that only authorized users can access your content.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Real-time Collaboration&lt;/strong&gt;: Collaborate with others in real-time. When multiple users access the same link, any changes made by one user are instantly visible to others, without requiring a page refresh.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Shareable Links&lt;/strong&gt;: Share your pages with others by sending them the unique link.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Tech Stack&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;NotePage is built using the following technologies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Angular&lt;/strong&gt;: A powerful and popular front-end framework.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Firebase&lt;/strong&gt;: A real-time cloud database, authentication, and hosting platform.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Angular Material&lt;/strong&gt;: A UI component library…&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Varshithvhegde/notepage" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This was just a side project I built while learning Angular. The UI wasn't great, and it's full of bugs. But here's the thing: my office has a strict NO ChatGPT policy (we can only use a company AI), and copying/sharing text was difficult. When a friend needed to share something, I suggested notepage, and it blew up within the office! I tried improving the UI, but people love the old version, so I kept it. Now I'm almost hitting free tier limits, but I'll keep it free because projects like these taught me so much and drove me to write for the DEV community.&lt;/p&gt;
&lt;h2&gt;
  
  
  The DEV Community Love
&lt;/h2&gt;

&lt;p&gt;And then came the end of the year. Oh wow.&lt;/p&gt;

&lt;p&gt;I finally reached &lt;strong&gt;10K followers&lt;/strong&gt;! I remember celebrating my first 1K like it was yesterday (it was actually 2 years ago).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/varshithvhegde/achieving-1k-followers-on-devto-my-journey-to-success-201n"&gt;Achieving 1K Followers on dev.to: My Journey to Success&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I was also a &lt;strong&gt;Top Weekly Author twice&lt;/strong&gt;! My DEV profile is practically part of my resume now. Whenever I'm in an interview, I proudly show it and talk about my journey. Why not? I've worked hard for this.&lt;/p&gt;

&lt;p&gt;I participated in 5 DEV Challenges and &lt;strong&gt;won 2 of them&lt;/strong&gt;. These challenges helped me grow immensely. A huge thank you to the entire DEV Team for creating such an amazing initiative!&lt;/p&gt;

&lt;h2&gt;
  
  
  Shoutouts
&lt;/h2&gt;

&lt;p&gt;Some amazing devs/writers whose content I absolutely loved this year:&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__3226798"&gt;
    &lt;a href="/axrisi" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3226798%2F0c0a8594-658c-4146-a639-8068ede85f67.jpg" alt="axrisi image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/axrisi"&gt;Nikoloz Turazashvili (@axrisi)&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/axrisi"&gt;Founder &amp;amp; CTO at Vexrail (www. vexrail.com), Axrisi (www.axrisi.com). Opened Chicos restaurant in Tbilisi, Georgia.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;br&gt;


&lt;div class="ltag__user ltag__user__id__941720"&gt;
    &lt;a href="/dumebii" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F941720%2Ff316bf93-ef0b-4bc5-aee2-5e062255d5f0.jpg" alt="dumebii image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/dumebii"&gt;Dumebi Okolo&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/dumebii"&gt;Confident technical writer with frontend developer skills, marketing skills and developer relations skills. 
I am also a very fun person to hang around with. &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;br&gt;


&lt;div class="ltag__user ltag__user__id__965723"&gt;
    &lt;a href="/arindam_1729" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2Fe0982512-4de1-4154-b3c3-1869d19e9ecc.png" alt="arindam_1729 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/arindam_1729"&gt;Arindam Majumder &lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/arindam_1729"&gt;Developer Advocate | Technical Writer | 600k+ Reads | Mail for Collabs&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;br&gt;


&lt;div class="ltag__user ltag__user__id__889475"&gt;
    &lt;a href="/divyasinghdev" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F889475%2F1165be61-6903-4b59-af67-c262acfb1c94.webp" alt="divyasinghdev image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/divyasinghdev"&gt;Divya&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/divyasinghdev"&gt;A curious lifelong learner, currently a full-time Masters student persuing Computer Science stream. Enthusiastic about development.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;




&lt;h2&gt;
  
  
  Looking Ahead to 2026
&lt;/h2&gt;

&lt;p&gt;So yeah, the second half of 2025 was my redemption arc. I really loved this year, and my resolutions are crystal clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get all my friends to participate in DEV Challenges&lt;/li&gt;
&lt;li&gt;Level up my skills in AI and Agentic development&lt;/li&gt;
&lt;li&gt;Travel more (maybe even an international trip!)&lt;/li&gt;
&lt;li&gt;Blog about my journey more consistently&lt;/li&gt;
&lt;li&gt;Start learning rock climbing (need to get in shape first 😅)&lt;/li&gt;
&lt;li&gt;Finally get my driver's license 😭 (It was my 2025 resolution too, but I still haven't done it!)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;That's all for this year, folks!&lt;/p&gt;

&lt;p&gt;Thank you everyone for the support and love. Here's to an even better 2026!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Happy New Year to ALL!&lt;/strong&gt; 🎉&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>career</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built a Form Backend in a Weekend Because Paying $20/Month for Contact Forms is Stupid</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Tue, 30 Dec 2025 15:59:39 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/i-built-a-form-backend-in-a-weekend-because-paying-20month-for-contact-forms-is-stupid-1o34</link>
      <guid>https://forem.com/varshithvhegde/i-built-a-form-backend-in-a-weekend-because-paying-20month-for-contact-forms-is-stupid-1o34</guid>
      <description>&lt;p&gt;So here's the thing - I was helping my friend set up his portfolio site last weekend. Everything was going smooth. Nice design, fast site, Vercel hosting on the free tier. Perfect.&lt;/p&gt;

&lt;p&gt;Then he goes "I need a contact form."&lt;/p&gt;

&lt;p&gt;Cool, I say. Just use one of those form backend services. Easy.&lt;/p&gt;

&lt;p&gt;He checks the pricing. "$20 a month?! Just to save some text?"&lt;/p&gt;

&lt;p&gt;And honestly? He's right. When did we all just accept this?&lt;/p&gt;

&lt;h2&gt;
  
  
  This got me thinking
&lt;/h2&gt;

&lt;p&gt;We're paying Netflix money for what's basically a database insert and an email. That's it. Store some text, send a notification. &lt;/p&gt;

&lt;p&gt;I spent more time being annoyed about this than I'd like to admit. Then I figured - you know what, I can probably build this myself. How hard can it be?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbxv3sc4hq6fgdistm06.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbxv3sc4hq6fgdistm06.png" alt="FormRelay" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://formrelay.varshithvhegde.in" rel="noopener noreferrer"&gt;FormRelay&lt;/a&gt; is pretty straightforward. You point your HTML form at it, it saves the data, sends you an email, and shows everything in a dashboard. That's the whole thing.&lt;/p&gt;

&lt;p&gt;The difference? You host it yourself. Your Supabase database. Your Vercel deployment. Your data.&lt;/p&gt;

&lt;p&gt;And the best part? Supabase's free tier gives you 50k monthly active users. Vercel's hobby plan is free. Resend gives you 3k emails free per month.&lt;/p&gt;

&lt;p&gt;So yeah. $0/month vs $20/month. You do the math.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But isn't self-hosting complicated?"
&lt;/h2&gt;

&lt;p&gt;This is what everyone says. And look, 10 years ago? Sure. You needed to know server management, deal with security updates, all that stuff.&lt;/p&gt;

&lt;p&gt;But now? With Vercel and Supabase?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fork the repo&lt;/li&gt;
&lt;li&gt;Click deploy&lt;/li&gt;
&lt;li&gt;Copy-paste some environment variables&lt;/li&gt;
&lt;li&gt;You're done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Took me longer to write the README than it takes to deploy this thing.&lt;/p&gt;

&lt;p&gt;Compare that to creating yet another account, entering your credit card, dealing with their dashboard, hitting some arbitrary limit, and then having to migrate everything when they raise prices next year.&lt;/p&gt;

&lt;p&gt;Which one sounds more complicated?&lt;/p&gt;

&lt;h2&gt;
  
  
  Here's where I might lose some of you
&lt;/h2&gt;

&lt;p&gt;I think we've gotten too comfortable renting everything.&lt;/p&gt;

&lt;p&gt;The whole indie web thing used to be about actually owning your stuff. Yeah, it was messier. Yeah, you had to learn things. But your website was &lt;em&gt;yours&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Now? We just subscribe to everything. And sure, time is money, I get that. Not everyone wants to deal with infrastructure.&lt;/p&gt;

&lt;p&gt;But there's this huge gap between "run your own server rack" and "pay someone $300/year to store contact form entries."&lt;/p&gt;

&lt;p&gt;That's the gap I'm trying to fill here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Random things I learned
&lt;/h2&gt;

&lt;p&gt;Next.js 15 is actually good now. After all the App Router drama, it finally feels right. Server actions just work. No more fighting with it.&lt;/p&gt;

&lt;p&gt;Supabase is wild. Real-time updates, auth, and actual good documentation? Sign me up.&lt;/p&gt;

&lt;p&gt;The hardest part wasn't the code. It was making the setup instructions clear enough that anyone could follow them. Spent way too long on that.&lt;/p&gt;

&lt;h2&gt;
  
  
  About the email thing
&lt;/h2&gt;

&lt;p&gt;I used Resend because my domain didn't come with email when I bought it, and I wasn't planning to buy email hosting separately. Resend's free tier (3k emails/month) was perfect for this.&lt;/p&gt;

&lt;p&gt;But here's the thing: if you already have email with your domain, you can swap Resend for plain SMTP. It's actually simpler in some ways. Just plug in your SMTP credentials and you're good to go.&lt;/p&gt;
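&lt;p&gt;&lt;em&gt;As a sketch of what that swap amounts to: SMTP transport options are just a few fields derived from env vars. This is a hypothetical helper, not FormRelay's actual code — the env var names are assumptions:&lt;/em&gt;&lt;/p&gt;

```javascript
// Hypothetical helper: build SMTP transport options (the shape
// nodemailer's createTransport expects) from environment variables.
// Env var names are assumptions, not FormRelay's actual config.
function smtpOptionsFromEnv(env) {
  const port = Number(env.SMTP_PORT ?? 587);
  return {
    host: env.SMTP_HOST,
    port,
    secure: port === 465, // implicit TLS on 465; STARTTLS otherwise
    auth: { user: env.SMTP_USER, pass: env.SMTP_PASS },
  };
}
```

&lt;p&gt;&lt;em&gt;With options like these, sending becomes one transporter call instead of a Resend API call — same email, no third-party quota.&lt;/em&gt;&lt;/p&gt;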

&lt;p&gt;So even that "dependency" isn't really a dependency. Use what you've got.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech stuff if you care
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Next.js 15 (app router)&lt;/li&gt;
&lt;li&gt;Supabase (postgres + realtime + auth)&lt;/li&gt;
&lt;li&gt;Tailwind CSS&lt;/li&gt;
&lt;li&gt;Radix UI&lt;/li&gt;
&lt;li&gt;Resend for emails (or just use SMTP if you have it)&lt;/li&gt;
&lt;li&gt;Lucide icons&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing fancy. Just stuff that works and doesn't break.&lt;/p&gt;

&lt;h2&gt;
  
  
  You can use it
&lt;/h2&gt;

&lt;p&gt;Whole thing's on GitHub: &lt;a href="https://github.com/Varshithvhegde/formrelay" rel="noopener noreferrer"&gt;github.com/Varshithvhegde/formrelay&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live demo: &lt;a href="https://formrelay.varshithvhegde.in" rel="noopener noreferrer"&gt;formrelay.varshithvhegde.in&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's MIT licensed. Do whatever you want with it. If you find bugs, let me know. If you want to add features, PRs are open.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual point
&lt;/h2&gt;

&lt;p&gt;This isn't really about forms or saving money.&lt;/p&gt;

&lt;p&gt;It's about remembering that we can actually build stuff ourselves. We don't need a SaaS product for every little thing. The tools are there. The platforms are free. We know how to code.&lt;/p&gt;

&lt;p&gt;A lot of problems that cost $20/month are actually just weekend projects in disguise.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Quick note: Yes, I know SaaS companies provide value. Support, maintenance, features, etc. But for something as basic as form handling? Come on.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>saas</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>If you are creating any Multi Agent or AI apps you need to check this out!!!</title>
      <dc:creator>Varshith V Hegde</dc:creator>
      <pubDate>Sun, 21 Dec 2025 15:47:12 +0000</pubDate>
      <link>https://forem.com/varshithvhegde/if-you-are-creating-any-multi-agent-or-ai-apps-you-need-to-check-this-out-gd6</link>
      <guid>https://forem.com/varshithvhegde/if-you-are-creating-any-multi-agent-or-ai-apps-you-need-to-check-this-out-gd6</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/varshithvhegde/bifrost-the-llm-gateway-thats-40x-faster-than-litellm-1763" class="crayons-story__hidden-navigation-link"&gt;Bifrost: The LLM Gateway That's 40x Faster Than LiteLLM&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/varshithvhegde" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F885064%2F4ab304f4-a3f3-409c-8217-9ce130e57c18.jpeg" alt="varshithvhegde profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/varshithvhegde" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Varshith V Hegde
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Varshith V Hegde
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-3105139" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/varshithvhegde" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F885064%2F4ab304f4-a3f3-409c-8217-9ce130e57c18.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Varshith V Hegde&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/varshithvhegde/bifrost-the-llm-gateway-thats-40x-faster-than-litellm-1763" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Dec 18 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/varshithvhegde/bifrost-the-llm-gateway-thats-40x-faster-than-litellm-1763" id="article-link-3105139"&gt;
          Bifrost: The LLM Gateway That's 40x Faster Than LiteLLM
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/varshithvhegde/bifrost-the-llm-gateway-thats-40x-faster-than-litellm-1763" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;49&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/varshithvhegde/bifrost-the-llm-gateway-thats-40x-faster-than-litellm-1763#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              2&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
