<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arindam Majumder </title>
    <description>The latest articles on Forem by Arindam Majumder  (@arindam_1729).</description>
    <link>https://forem.com/arindam_1729</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2F8c3a1bb4-eb47-4302-a280-09eedb8bc785.png</url>
      <title>Forem: Arindam Majumder </title>
      <link>https://forem.com/arindam_1729</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/arindam_1729"/>
    <language>en</language>
    <item>
      <title>Claude’s new Advisor Strategy is pretty interesting</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Wed, 15 Apr 2026 20:07:17 +0000</pubDate>
      <link>https://forem.com/arindam_1729/claudes-new-advisor-strategy-is-pretty-interesting-9nb</link>
      <guid>https://forem.com/arindam_1729/claudes-new-advisor-strategy-is-pretty-interesting-9nb</guid>
      <description>&lt;p&gt;A lot of people building AI agents run into the same problem sooner or later.&lt;/p&gt;

&lt;p&gt;If you run the entire agent on a powerful model, it works well but the costs grow quickly. If you run everything on a cheaper model, the system stays fast and affordable but it sometimes makes weak decisions, especially when planning complex tasks or choosing tools.&lt;/p&gt;

&lt;p&gt;Anthropic recently introduced something called Advisor Strategy that tries to solve this in a simple way.&lt;/p&gt;

&lt;p&gt;Instead of using one model for everything, the agent runs on a smaller executor model like Sonnet or Haiku. That model handles the normal workflow such as calling tools, executing steps, and moving the task forward. When the agent reaches something more complex, it can consult a stronger model like Opus for guidance. The advisor reads the full context, suggests what to do next, and the executor continues the workflow.&lt;/p&gt;

&lt;p&gt;So most of the work stays cheap and fast, but the agent can still get strong reasoning when it actually needs it. It feels a lot like how a junior engineer works most of the time but occasionally asks a senior engineer for advice.&lt;/p&gt;
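&lt;p&gt;The routing idea is easy to sketch in Python. To be clear, everything below is illustrative: the model names, the escalation heuristic, and the stubbed &lt;code&gt;call_model&lt;/code&gt; are placeholders I made up, not Anthropic's actual API.&lt;/p&gt;

```python
# Hypothetical sketch of the advisor pattern: a cheap executor model
# handles routine steps and escalates hard decisions to a stronger
# advisor model. Model calls are stubbed; a real agent would make
# API calls here (e.g. Haiku as executor, Opus as advisor).

EXECUTOR = "small-executor-model"  # placeholder for Sonnet/Haiku
ADVISOR = "large-advisor-model"    # placeholder for Opus

def needs_advice(step: dict) -> bool:
    """Escalation heuristic (invented): consult the advisor only for
    planning steps or unusually complex work."""
    return step.get("type") == "plan" or step.get("complexity", 0) > 7

def call_model(model: str, prompt: str) -> str:
    # Stub: a real implementation would call the provider's API here.
    return f"[{model}] {prompt}"

def run_step(step: dict) -> str:
    if needs_advice(step):
        # The advisor reads the context and suggests what to do next;
        # the executor then carries the workflow forward.
        guidance = call_model(ADVISOR, f"Advise on: {step['task']}")
        return call_model(EXECUTOR, f"{step['task']} (guidance: {guidance})")
    return call_model(EXECUTOR, step["task"])
```

&lt;p&gt;The key design point is that the expensive model never drives the loop; it is only consulted at the few decision points where strong reasoning actually pays for itself.&lt;/p&gt;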

&lt;p&gt;I found this architecture interesting because it pushes agent systems toward multi-model setups instead of relying on a single model for everything, which seems like a direction many frameworks will probably move toward.&lt;/p&gt;

&lt;p&gt;I made a short video breaking down how the Advisor Strategy works and how developers can implement it in their own agents.&lt;/p&gt;


</description>
    </item>
    <item>
      <title>Production-Aware AI: Giving LLMs Real Debugging Context</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Thu, 09 Apr 2026 05:22:32 +0000</pubDate>
      <link>https://forem.com/studio1hq/production-aware-ai-giving-llms-real-debugging-context-187g</link>
      <guid>https://forem.com/studio1hq/production-aware-ai-giving-llms-real-debugging-context-187g</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Large language models struggle with production debugging because they do not have visibility into how code actually executes at runtime.&lt;/li&gt;
&lt;li&gt;Inputs such as logs, stack traces, and metrics provide incomplete signals, which often cause confident but incorrect conclusions about root causes.&lt;/li&gt;
&lt;li&gt;When AI reasoning is grounded in function-level runtime data collected from production systems, debugging becomes accurate, explainable, and reliable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Large language models are increasingly used by developers to understand code, analyze failures, and assist during incident response. In controlled environments, they are effective at explaining logic and suggesting fixes. In production systems, however, their usefulness often drops sharply.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://lokalise.com/blog/blog-the-developer-delay-report/" rel="noopener noreferrer"&gt;recent survey of developers&lt;/a&gt; found that a quarter of developers spend more time debugging than writing code each week. The same survey reported that bugs and tooling failures cost teams nearly 20 working days per year in lost productivity. These numbers reflect a reality most engineering teams already experience. &lt;/p&gt;

&lt;p&gt;Production debugging takes time because failures depend on runtime factors such as traffic patterns, concurrency, queue depth, and system state that are absent in non-production environments. Most AI systems do not observe these execution conditions. They analyze code structure and reported symptoms, rather than the runtime behavior that caused the failure.&lt;/p&gt;

&lt;p&gt;In this article, we will discuss why production context is critical for AI debugging, what production-aware AI really means, and how runtime intelligence enables more accurate and trustworthy debugging outcomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Production Issues Cannot Be Understood from Code Alone
&lt;/h2&gt;

&lt;p&gt;Code defines control flow and data handling, but production behavior is determined by runtime conditions such as traffic volume, concurrency, and system state.&lt;/p&gt;

&lt;p&gt;In production, requests arrive concurrently and compete for shared resources. As traffic increases, queues begin to accumulate work, caches evolve, and external dependencies respond with variable latency or partial failures. Together, these factors influence execution order, timing, and resource contention in ways that are not visible when reading code or running isolated tests.&lt;/p&gt;

&lt;p&gt;Many production failures arise only when specific runtime conditions are met. Race conditions appear under concurrent access. Performance regressions surface under sustained or uneven load. Retry mechanisms can magnify transient upstream failures into system-wide impact. In each case, the logic itself may be correct, while the observed failure is a result of how that logic behaves under real execution pressure.&lt;/p&gt;
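&lt;p&gt;The retry point in particular can be made concrete with a toy calculation (the function and numbers below are illustrative, not taken from any real system): when every failed call is retried, a degraded dependency receives the most traffic exactly when it can least handle it.&lt;/p&gt;

```python
# Toy model of retry amplification: if each failed attempt is retried
# up to max_retries times, the effective load on a dependency grows
# with its failure rate.

def effective_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second the dependency actually receives."""
    load = 0.0
    attempt_rate = base_rps
    for _ in range(max_retries + 1):  # initial attempt + retries
        load += attempt_rate
        attempt_rate *= failure_rate  # only failed attempts are retried
    return load

# Healthy dependency: almost no amplification.
print(effective_load(100, 0.01, 3))  # ~101 rps
# Dependency failing 90% of calls: nearly 3.5x the traffic.
print(effective_load(100, 0.9, 3))   # ~343.9 rps
```

&lt;p&gt;Nothing in the code is logically wrong, yet under real execution pressure the retry policy turns a partial failure into a traffic spike.&lt;/p&gt;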

&lt;p&gt;This leads to a common outcome during incident response. The code appears correct because the failure is not caused by a logical error. The root cause exists in how the code executes under real production conditions, not in how it reads in isolation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47gqpvmdldj288p0zzox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47gqpvmdldj288p0zzox.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How LLMs Debug Today: Strengths and Structural Limits
&lt;/h2&gt;

&lt;p&gt;Large language models assist debugging by analyzing text. They infer intent, recognize common patterns, and map symptoms to known classes of problems. This makes them effective for code review, error explanation, and reasoning about familiar failure modes.&lt;/p&gt;

&lt;p&gt;However, their understanding is entirely constrained by the inputs they receive. Without access to runtime execution data, their conclusions are based on probability rather than evidence.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;What LLMs Do Well&lt;/th&gt;
&lt;th&gt;Structural Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code understanding&lt;/td&gt;
&lt;td&gt;Explain logic, control flow, and common anti-patterns&lt;/td&gt;
&lt;td&gt;Cannot observe how code executes under real load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input analysis&lt;/td&gt;
&lt;td&gt;Reason over logs, stack traces, and snippets&lt;/td&gt;
&lt;td&gt;Inputs represent symptoms, not full execution context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pattern matching&lt;/td&gt;
&lt;td&gt;Identify known bug patterns and typical fixes&lt;/td&gt;
&lt;td&gt;Fails when failures are novel or environment specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Root cause analysis&lt;/td&gt;
&lt;td&gt;Propose plausible explanations&lt;/td&gt;
&lt;td&gt;Cannot validate causality without runtime signals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision making&lt;/td&gt;
&lt;td&gt;Rank likely fixes based on training data&lt;/td&gt;
&lt;td&gt;Relies on probabilistic inference when facts are missing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Without visibility into execution order, timing, frequency, and state, LLMs are forced to guess. The results may sound correct, but they are not grounded in how the system actually behaved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hallucinations Are Caused by Missing Runtime Evidence
&lt;/h2&gt;

&lt;p&gt;Hallucinations in AI-assisted debugging usually appear when the system does not have enough information about what actually happened during execution. This is common in production, where AI is asked to explain failures using logs, stack traces, or small pieces of code that describe symptoms but not runtime behavior.&lt;/p&gt;

&lt;p&gt;Recent research on AI reliability shows that incorrect answers increase when important contextual details are missing. In debugging scenarios, these details include execution order, timing, system state, and how frequently specific code paths were executed. Without this information, AI systems infer causes based on likelihood rather than evidence.&lt;/p&gt;

&lt;p&gt;The same pattern appears in &lt;a href="https://arxiv.org/pdf/2505.04441" rel="noopener noreferrer"&gt;studies on AI-driven debugging and code repair&lt;/a&gt;. When models are given execution traces or feedback from real runs, fault localization and fix accuracy improve. When this runtime information is absent, models often produce explanations and fixes that appear reasonable but fail to address the real cause of the issue.&lt;/p&gt;

&lt;p&gt;Prompt refinement does not address this limitation. Clearer prompts help structure responses, but they do not introduce new facts. If execution data is missing, the model still reasons without evidence about how the system behaved.&lt;/p&gt;

&lt;p&gt;In production debugging, hallucinations are therefore expected. They occur when AI systems are asked to explain failures they cannot observe, not because the reasoning process is flawed, but because the necessary runtime evidence is absent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Context in AI Debugging Workflows
&lt;/h2&gt;

&lt;p&gt;Most AI debugging workflows rely on the same signals engineers have used for years. These signals are useful, but they describe outcomes, not execution, which creates a gap between what failed and why it failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI usually receives today&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logs:&lt;/strong&gt; Logs capture messages emitted by code paths that were explicitly instrumented. They are selective, often incomplete, and rarely reflect execution order, frequency, or timing across concurrent requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack traces:&lt;/strong&gt; Stack traces show where an error surfaced, not how the system reached that state. They lack information about prior execution paths, state changes, and interactions with other components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics:&lt;/strong&gt; Metrics summarize system behavior at an aggregate level. They indicate that something is slow or failing, but they do not identify which functions caused the issue or how behavior changed over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What is missing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Function level execution behavior:&lt;/strong&gt; Which functions ran, how often they executed, and how long they took under real load conditions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime performance characteristics:&lt;/strong&gt; Execution timing, concurrency effects, retries, and resource contention that emerge only during live operation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection between user impact and code:&lt;/strong&gt; Clear linkage between affected endpoints or workflows and the exact functions responsible for the observed behavior.&lt;/li&gt;
&lt;/ul&gt;
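&lt;p&gt;To make "function-level execution behavior" concrete, here is a minimal sketch of the kind of data involved: call counts and cumulative duration per function. Real runtime sensors collect this continuously and without code changes; the decorator and the &lt;code&gt;lookup_user&lt;/code&gt; function below are only illustrations of the data's shape.&lt;/p&gt;

```python
import time
from collections import defaultdict

# Per-function runtime stats: how often each function ran and how
# long it took in total. This is the signal logs, stack traces, and
# aggregate metrics do not provide.
stats = defaultdict(lambda: {"calls": 0, "total_s": 0.0})

def sense(fn):
    """Record call count and cumulative wall time for fn."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            rec = stats[fn.__name__]
            rec["calls"] += 1
            rec["total_s"] += time.perf_counter() - start
    return wrapper

@sense
def lookup_user(user_id):  # hypothetical function under observation
    return {"id": user_id}

for i in range(3):
    lookup_user(i)

print(stats["lookup_user"]["calls"])  # 3
```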

&lt;p&gt;When AI reasons over incomplete signals, it cannot establish causality. Proposed fixes are derived from statistical patterns rather than observed execution, which often results in changes that compile or deploy successfully but do not resolve the underlying issue. Effective debugging requires visibility into execution behavior, not only error reports or surface-level symptoms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgh59kapx7jr4l0k42ond.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgh59kapx7jr4l0k42ond.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Production-Aware AI
&lt;/h2&gt;

&lt;p&gt;Consider a common production incident. An API endpoint becomes slow after a deployment. Logs show no errors. Metrics show increased latency. The code itself looks unchanged or correct. An AI system reviewing this information can suggest several possible causes, such as a database query, a cache miss, or an external dependency. Each suggestion sounds reasonable, but none is confirmed.&lt;/p&gt;

&lt;p&gt;This is where production awareness matters. A production-aware AI does not rely only on aggregated metrics or isolated log lines. It reasons using information about how the system actually executed under real traffic. It can see which functions ran more often than before, where execution time increased, and which code paths were exercised during the slowdown.&lt;/p&gt;

&lt;p&gt;Production-aware AI is defined by the context it uses. It grounds reasoning in runtime behavior rather than static structure. It focuses on how functions are executed, how often they ran, and how their performance changes over time, instead of relying only on what the code looks like or what developers expect it to do.&lt;/p&gt;

&lt;p&gt;This approach changes the quality of debugging. Instead of proposing likely explanations, the AI reasons from observed execution evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Function-Level Runtime Intelligence Changes AI Debugging
&lt;/h2&gt;

&lt;p&gt;Function-level runtime intelligence gives AI direct visibility into how software behaves while it is running. This visibility changes debugging from interpreting symptoms to analyzing execution.&lt;/p&gt;

&lt;p&gt;Instead of inferring behavior from secondary signals, AI can reason using execution facts collected in real time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Function-level data as the missing signal:&lt;/strong&gt; Function-level data shows which functions executed, how frequently they ran, and how long they took under real load. This information allows AI to identify abnormal behavior at the exact point where performance or correctness changed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linking endpoints to execution paths:&lt;/strong&gt; Runtime intelligence connects external symptoms to internal execution. When an HTTP endpoint slows down, or a queue backs up, AI can trace the issue to the specific functions involved, rather than reasoning only at the service or request level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal awareness across deployments:&lt;/strong&gt; By comparing runtime behavior before and after a deployment, AI can identify which functions changed execution characteristics. This makes regressions visible without relying on alerts or manual comparison.&lt;/li&gt;
&lt;/ul&gt;
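&lt;p&gt;The deployment-comparison idea reduces to diffing per-function stats across two versions. The sketch below is illustrative (the function names, stats shape, and 1.5x threshold are all made up), but it shows why regressions become visible without alerts or manual comparison.&lt;/p&gt;

```python
# Compare per-function mean latency before and after a deployment and
# flag functions whose latency grew by more than `threshold`x.

def regressions(before: dict, after: dict, threshold: float = 1.5) -> list:
    out = []
    for name, b in before.items():
        a = after.get(name)
        if a and b["mean_ms"] > 0 and a["mean_ms"] / b["mean_ms"] > threshold:
            out.append((name, b["mean_ms"], a["mean_ms"]))
    return out

before = {"parse_order": {"mean_ms": 2.0}, "charge_card": {"mean_ms": 40.0}}
after  = {"parse_order": {"mean_ms": 2.1}, "charge_card": {"mean_ms": 95.0}}
print(regressions(before, after))  # [('charge_card', 40.0, 95.0)]
```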

&lt;h2&gt;
  
  
  How Hud Enables Production-Aware AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1layxsapduf33orzdqxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1layxsapduf33orzdqxh.png" alt="Image3" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.hud.io/" rel="noopener noreferrer"&gt;Hud&lt;/a&gt; captures function-level execution behavior directly from production systems. Instead of relying on aggregated metrics, sampled traces, or predefined alert rules, it observes how individual functions execute under real traffic, including errors and performance changes. &lt;/p&gt;

&lt;p&gt;This execution data can be consumed directly by engineers and AI systems to reason about production behavior based on observed runtime evidence.&lt;/p&gt;

&lt;p&gt;Below are the core capabilities that allow Hud to provide production-aware runtime context for AI-assisted debugging.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime code sensing at the function level:&lt;/strong&gt; &lt;a href="https://docs.hud.io/docs/installation-guide" rel="noopener noreferrer"&gt;Hud acts as a runtime code sensor&lt;/a&gt;. You get continuous function-level execution data from production, without manual instrumentation or ongoing maintenance. This data reflects how code actually runs under real traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic detection of errors and slowdowns:&lt;/strong&gt; Hud automatically detects errors and performance degradations based on changes in runtime behavior, not static rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linking user impact to code:&lt;/strong&gt; When an endpoint slows down, or a queue backs up, Hud connects that business-level symptom directly to the functions responsible. You can see which parts of the code caused the impact, not just where it surfaced.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-deployment behavior comparison:&lt;/strong&gt; Hud automatically detects deployments and compares function behavior across versions. You can see what changed in production after a release and identify regressions without manual diffing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime context for AI debugging:&lt;/strong&gt; Hud provides a full forensic runtime context that you can use inside the IDE or pass to &lt;a href="https://docs.hud.io/docs/hud-mcp-server" rel="noopener noreferrer"&gt;AI agents through its MCP server&lt;/a&gt;. This allows AI to reason from execution evidence instead of guessing from partial signals.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/JoOhI6QF6Zs"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Without visibility into how code actually ran in production, AI systems reason over symptoms instead of causes, which leads to incorrect or incomplete fixes. Production systems demand runtime grounded reasoning, where function-level behavior, execution timing, and real traffic conditions are first-class inputs. &lt;/p&gt;

&lt;p&gt;When AI is given this level of visibility, hallucination decreases, and confidence aligns with correctness. Production-aware AI is therefore not an optimization, but a requirement for reliable debugging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.hud.io/docs/what-you-can-do-with-hud" rel="noopener noreferrer"&gt;Hud&lt;/a&gt; gives you function-level runtime visibility directly from production, with no configuration and no maintenance. Explore &lt;a href="https://www.hud.io/" rel="noopener noreferrer"&gt;how Hud works&lt;/a&gt;, &lt;a href="https://docs.hud.io/" rel="noopener noreferrer"&gt;read the documentation&lt;/a&gt;, or &lt;a href="https://www.hud.io/book-a-demo/" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; to see how production-aware debugging changes the way you and your AI systems understand failures.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>mcp</category>
      <category>llm</category>
    </item>
    <item>
      <title>I built a local dashboard to inspect Claude Code sessions, tokens, and costs</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Thu, 02 Apr 2026 07:58:55 +0000</pubDate>
      <link>https://forem.com/arindam_1729/i-built-a-local-dashboard-to-inspect-claude-code-sessions-tokens-and-costs-173m</link>
      <guid>https://forem.com/arindam_1729/i-built-a-local-dashboard-to-inspect-claude-code-sessions-tokens-and-costs-173m</guid>
      <description>&lt;p&gt;I’ve been using Claude Code heavily over the last few weeks and started wondering where my tokens were actually going.&lt;/p&gt;

&lt;p&gt;Claude stores everything locally in &lt;code&gt;~/.claude/&lt;/code&gt;, which is great, but the data mostly sits in JSON logs. If you want to understand session usage, token costs, tool calls, or activity patterns, you basically end up digging through raw files.&lt;/p&gt;
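&lt;p&gt;Here is roughly what that digging looks like. Note that the directory layout and the &lt;code&gt;usage&lt;/code&gt; field names below are assumptions for illustration, not Claude Code's actual log schema.&lt;/p&gt;

```python
import json
import pathlib

def total_tokens(session_dir: str) -> int:
    """Tally tokens across JSONL session logs (schema is assumed)."""
    total = 0
    for path in pathlib.Path(session_dir).glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines
            usage = event.get("usage", {})  # assumed field name
            total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total
```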

&lt;p&gt;So I built a small tool called cc-lens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3spc4nyf2nhk221or95m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3spc4nyf2nhk221or95m.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s a local-first dashboard that reads your Claude Code session files and turns them into something you can actually explore.&lt;/p&gt;

&lt;p&gt;It runs entirely on your machine. It doesn't have any cloud sync, sign-ups, or telemetry.&lt;/p&gt;

&lt;p&gt;Some things it shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Usage overview:&lt;/strong&gt; sessions, messages, tokens, estimated cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-project breakdown:&lt;/strong&gt; see which repos are burning the most tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full session replay:&lt;/strong&gt; inspect conversations turn-by-turn with token counts and tool calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost &amp;amp; cache analytics:&lt;/strong&gt; stacked charts by model and cache usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Activity heatmap:&lt;/strong&gt; GitHub-style view of when you’re using Claude the most&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory &amp;amp; plan explorer:&lt;/strong&gt; browse/edit Claude memory files and saved plans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export/import:&lt;/strong&gt; move dashboards across machines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can run it instantly with:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npx cc-lens&lt;/code&gt;&lt;br&gt;
(or clone the repo if you prefer).&lt;/p&gt;

&lt;p&gt;Here's the &lt;a href="https://github.com/Arindam200/cc-lens/" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;, if you want to try it out!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:37:31 +0000</pubDate>
      <link>https://forem.com/arindam_1729/-1p2l</link>
      <guid>https://forem.com/arindam_1729/-1p2l</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h" class="crayons-story__hidden-navigation-link"&gt;Running LLM Applications Across Providers with Bifrost&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/studio1hq"&gt;
            &lt;img alt="Studio1 logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9405%2Ff91309c4-f670-4501-9882-79e1e70e2e96.png" class="crayons-logo__image" width="500" height="500"&gt;
          &lt;/a&gt;

          &lt;a href="/arindam_1729" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2F8c3a1bb4-eb47-4302-a280-09eedb8bc785.png" alt="arindam_1729 profile" class="crayons-avatar__image" width="800" height="678"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/arindam_1729" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Arindam Majumder 
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Arindam Majumder 
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png" width="166" height="102"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-3363768" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/arindam_1729" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2F8c3a1bb4-eb47-4302-a280-09eedb8bc785.png" class="crayons-avatar__image" alt="" width="800" height="678"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Arindam Majumder &lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/studio1hq" class="crayons-story__secondary fw-medium"&gt;Studio1&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 17&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h" id="article-link-3363768"&gt;
          Running LLM Applications Across Providers with Bifrost
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/llm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;llm&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/proxy"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;proxy&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/litellm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;litellm&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/raised-hands-74b2099fd66a39f2d7eed9305ee0f4553df0eb7b4f11b01b6b1b499973048fe5.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;15&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/studio1hq/running-llm-applications-across-providers-with-bifrost-313h#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>proxy</category>
      <category>litellm</category>
    </item>
    <item>
      <title>Build a Semantic Movie Discovery App with Claude Code and Weaviate Agent Skills</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 27 Mar 2026 20:45:45 +0000</pubDate>
      <link>https://forem.com/studio1hq/build-a-semantic-movie-discovery-app-with-claude-code-and-weaviate-agent-skills-30gd</link>
      <guid>https://forem.com/studio1hq/build-a-semantic-movie-discovery-app-with-claude-code-and-weaviate-agent-skills-30gd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Agentic coding is becoming more versatile as new tools such as Model Context Protocol (MCP) servers and Agent Skills become more common. At the same time, many developers ask the same question when building AI applications: should they use MCP servers or Agent Skills? The important thing is understanding what each approach does well and choosing the one that fits your use case.&lt;/p&gt;

&lt;p&gt;In this post, we’ll explain what MCP servers and Agent Skills are and how they differ, including architecture diagrams and technical details. In the later sections, we’ll also walk through how to use &lt;a href="https://github.com/weaviate/agent-skills" rel="noopener noreferrer"&gt;Weaviate Agent Skills&lt;/a&gt; with &lt;a href="https://code.claude.com/docs/en/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; to build a “Semantic Movie Discovery” application with several useful features.&lt;/p&gt;

&lt;p&gt;Let’s get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding MCP
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (MCP) is an open standard introduced by Anthropic that enables Large Language Models (LLMs) to interact with external systems such as data sources, APIs and services. MCP provides a structured way for an &lt;a href="https://weaviate.io/agentic-ai" rel="noopener noreferrer"&gt;AI agent&lt;/a&gt; to connect to compliant tools through a single interface instead of requiring custom integrations for each service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfus3ya7jofj8kchzml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfus3ya7jofj8kchzml.png" alt="MCP Architecture " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Architecture
&lt;/h3&gt;

&lt;p&gt;The MCP system operates on a client–server model and consists of three main components.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host:&lt;/strong&gt; the application that runs the AI model and provides the environment where the agent operates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client:&lt;/strong&gt; the protocol connector inside the host that handles communication between the model and MCP servers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server:&lt;/strong&gt; an external service that exposes tools, resources, or prompts that the agent can access.&lt;/li&gt;
&lt;/ul&gt;
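
&lt;p&gt;For example, Claude Code lets a project register MCP servers through a &lt;code&gt;.mcp.json&lt;/code&gt; file at the repository root. The server name, command and environment variable below are placeholders for illustration, not a real package:&lt;/p&gt;

```json
{
  "mcpServers": {
    "my-tools": {
      "command": "node",
      "args": ["./my-mcp-server/index.js"],
      "env": { "MY_API_KEY": "your-key" }
    }
  }
}
```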

&lt;h3&gt;
  
  
  MCP and Agentic Coding
&lt;/h3&gt;

&lt;p&gt;Before MCP, each AI tool required custom integrations for every external service it wanted to connect to. MCP simplifies this process by introducing a shared protocol that multiple agents and tools can use.&lt;/p&gt;

&lt;p&gt;Developers can now expose capabilities through an MCP server once and allow any compatible agent to access them without building separate integrations for each system.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding Agent Skills&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt;, also introduced by Anthropic, provide developers with a simple way to extend AI coding agents without running MCP servers. An Agent Skill is a structured configuration file, usually written as markdown files with YAML metadata that defines capabilities, parameter schemas and natural-language instructions describing how the agent should use those capabilities.&lt;/p&gt;

&lt;p&gt;AI tools such as Claude Code read these files at session start and load the skills directly into the agent's working context without requiring an additional runtime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn13awyixqnmfnllmjlld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn13awyixqnmfnllmjlld.png" alt="Agent Skills with an AI tool (Claude Code)" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How Agent Skills Work
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When Claude Code detects a skill file in the project directory (typically under &lt;code&gt;.claude/skills/&lt;/code&gt;), it loads the manifest into the agent's context at the beginning of the session.&lt;/li&gt;
&lt;li&gt;The skill definition describes available capabilities, how to invoke them correctly and when to prefer one approach over another. Because the instructions are written in natural language alongside parameter schemas, the agent can reason about how to use the skill.&lt;/li&gt;
&lt;li&gt;Skills are portable across repositories. If a developer commits a skill file to a repository, any collaborator who clones the project and opens it in Claude Code automatically gains access to the same capabilities without additional setup.&lt;/li&gt;
&lt;/ul&gt;
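
&lt;p&gt;A minimal skill file makes this concrete. The structure below follows the SKILL.md convention (YAML frontmatter plus Markdown instructions); the skill name and instructions are illustrative, not an official Weaviate skill:&lt;/p&gt;

```markdown
---
name: movie-search
description: Search the Movie collection in Weaviate. Use when the user asks to find, rank or recommend movies.
---

# Movie Search

- Prefer semantic (near_text) search for descriptive queries.
- Fall back to keyword (BM25) search for exact titles.
- Limit results to 10 unless the user asks for more.
```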

&lt;p&gt;MCP and Agent Skills solve different problems in agent systems. MCP provides a standardized way for AI agents to connect to external tools, APIs, databases and services through a client–server architecture with structured schemas. Agent Skills extend the agent’s capabilities through configuration files that define workflows, instructions and parameter schemas without requiring a running server.&lt;/p&gt;

&lt;p&gt;In simple terms, &lt;strong&gt;MCP enables agents to access external systems, while Agent Skills define how agents perform tasks or workflows within their environment.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Weaviate Agent Skills
&lt;/h2&gt;

&lt;p&gt;Weaviate has released an official set of &lt;a href="https://github.com/weaviate/agent-skills" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; designed for use with Claude Code and other compatible agent-based development environments like Cursor, Antigravity, Windsurf and more. These skills provide structured access to Weaviate vector databases, allowing agents to perform common operations such as search, querying, schema inspection, data exploration and collection management.&lt;/p&gt;

&lt;p&gt;The repository includes ready-to-use skill definitions for tasks like semantic, hybrid and keyword search, along with natural language querying through the Query Agent. It also supports workflows such as creating collections, importing data and fetching filtered results, and ships cookbooks with end-to-end examples. This enables agents to build with Weaviate and perform multi-step retrieval and agentic tasks more effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgiqyrgy3vpbq0xxz5ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhgiqyrgy3vpbq0xxz5ej.png" alt="Weaviate Ecosystem Tools and Features" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Skills and Vector Databases
&lt;/h2&gt;

&lt;p&gt;AI coding agents face difficulties when working with vector databases. Vector database APIs provide extensive capabilities, including basic “key–value” retrieval, single-vector near-text searches, multimodal near-image searches, hybrid BM25-plus-vector search, generative modules and multi-tenant system support. Without structured guidance, even a capable coding agent may produce suboptimal queries: correct syntax but the wrong search strategy, missing parameters or failure to use powerful features like the Weaviate Query Agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://weaviate.io/blog/weaviate-agent-skills" rel="noopener noreferrer"&gt;Weaviate Agent Skills&lt;/a&gt; address this by providing correct usage patterns, parameter recommendations and decision logic, enabling coding agents to generate production-ready code from their initial attempts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Weaviate Agent Skills repository is organized into two main parts&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facdcuqk3n68wemqdz6hj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facdcuqk3n68wemqdz6hj.png" alt="Overview of Weaviate Agent Skills" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Skill&lt;/strong&gt; (skills/weaviate): Focused scripts for tasks such as schema inspection, data ingestion and vector search. Agents use these while writing application logic or backend code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cookbooks Skill&lt;/strong&gt; (skills/weaviate-cookbooks): End-to-end project examples that combine tools such as FastAPI, Next.js and Weaviate to demonstrate full application workflows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Weaviate Agent Skills work with several development environments, including Claude Code, Cursor, GitHub Copilot, VS Code and Gemini CLI. When connected to a Weaviate Cloud instance, agents can directly interact with database modules and perform search, data management and retrieval tasks.&lt;/p&gt;

&lt;p&gt;To evaluate how effective Weaviate Agent Skills really are, let’s build a small project and see how they accelerate RAG and agentic application development with Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Semantic Movie Discovery Application
&lt;/h2&gt;

&lt;p&gt;We will build a &lt;strong&gt;Movie Discovery App&lt;/strong&gt; that takes a natural-language description and returns the most semantically similar movies from a Weaviate collection. In the process, we will explore Weaviate capabilities such as multimodal storage, named vector search, generative AI (RAG) and the Query Agent in action with Claude Code, showing how these Agentic tools help you build applications faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Prerequisites&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;Python 3.10&lt;/a&gt; or higher&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.weaviate.io/weaviate/quickstart" rel="noopener noreferrer"&gt;Weaviate Cloud&lt;/a&gt; – Create a free cluster and obtain an API key.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.themoviedb.org/" rel="noopener noreferrer"&gt;TMDB API key&lt;/a&gt; – Used to fetch movie metadata&lt;/li&gt;
&lt;li&gt;OpenAI API key – Required for &lt;a href="https://weaviate.io/rag" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; features.&lt;/li&gt;
&lt;li&gt;Access to &lt;a href="https://code.claude.com/docs/en/quickstart" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nodejs.org/en/download" rel="noopener noreferrer"&gt;Node.js 18+&lt;/a&gt; and npm – Required to run the Next.js frontend&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Project Setup
&lt;/h3&gt;

&lt;p&gt;Create a &lt;strong&gt;movie-discovery-app&lt;/strong&gt; folder&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mkdir&lt;/span&gt; &lt;span class="n"&gt;movie&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;discovery&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create and activate a  &lt;strong&gt;Python virtual environment&lt;/strong&gt; in the folder&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;movie-discovery-app py &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate.bat 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Python dependencies&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;weaviate-client&lt;span class="o"&gt;==&lt;/span&gt;4.20.1 fastapi uvicorn[standard] openai weaviate-agents&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;1.3.0 requests python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Node.js dependencies for the frontend&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, create a &lt;code&gt;.env&lt;/code&gt; file at the project root. Add the following parameters to configure &lt;strong&gt;Weaviate Agent Skills with Claude Code&lt;/strong&gt;, along with your &lt;strong&gt;OpenAI API key&lt;/strong&gt; and &lt;strong&gt;TMDB API key&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WEAVIATE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;without&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;
&lt;span class="n"&gt;WEAVIATE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;span class="n"&gt;TMDB&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tmdb&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After signing up for Weaviate, click the &lt;strong&gt;Create Cluster&lt;/strong&gt; button to start a new cluster for your use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo4cx6bxr7o7xkbqyu1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo4cx6bxr7o7xkbqyu1j.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;“How to Connect”&lt;/strong&gt; to view the required Weaviate connection parameters.&lt;/p&gt;

&lt;p&gt;Now that everything is set up, we can connect Weaviate Cloud with &lt;strong&gt;Claude Code&lt;/strong&gt; by running &lt;code&gt;claude&lt;/code&gt; in your project terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9y0xh1tmthf9gp5hilm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9y0xh1tmthf9gp5hilm.png" alt="Claude Code screnshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the following prompt in your Claude terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Write and run &lt;span class="sb"&gt;`check_modules.py`&lt;/span&gt; that connects using &lt;span class="sb"&gt;`weaviate.connect_to_weaviate_cloud`&lt;/span&gt;with &lt;span class="sb"&gt;`skip_init_checks=True`&lt;/span&gt;, loads credentials from &lt;span class="sb"&gt;`.env`&lt;/span&gt; with &lt;span class="sb"&gt;`python-dotenv`&lt;/span&gt;,
and prints the full JSON list of enabled Weaviate modules.
Run it with &lt;span class="sb"&gt;`venv/Scripts/python check_modules.py`&lt;/span&gt;."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Create A Weaviate Collection and Import Sample Movie Data
&lt;/h3&gt;

&lt;p&gt;In this step, we create a Weaviate collection and import the movie dataset into Weaviate.  The dataset contains movie metadata sourced from the TMDB API. Each entry includes: &lt;em&gt;title, overview, release_date, poster_url, popularity, and other important movie fields&lt;/em&gt;. You can import a JSON or CSV dataset directly into Weaviate.&lt;/p&gt;

&lt;p&gt;Run this prompt to retrieve the dataset from the TMDB API and save it to a file named &lt;em&gt;movies.json&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Create a TMDB dataset JSON file, movies.json, that contains 100 movie metadata and poster URLs directly from the TMDB API. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
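
&lt;p&gt;A script generated from this prompt might look roughly like the sketch below. It uses TMDB's v3 &lt;code&gt;/movie/popular&lt;/code&gt; endpoint (20 results per page, so 5 pages yields 100 movies); treat the exact fields kept as an approximation of what Claude Code produces:&lt;/p&gt;

```python
# Sketch: fetch 100 popular movies from the TMDB API and write movies.json.
import json
import os
import urllib.request
from urllib.parse import urlencode

POSTER_BASE = "https://image.tmdb.org/t/p/w500"

def to_record(movie: dict) -> dict:
    # Keep only the fields the Movie collection needs.
    poster = movie.get("poster_path") or ""
    return {
        "title": movie.get("title", ""),
        "overview": movie.get("overview", ""),
        "release_date": movie.get("release_date", ""),
        "popularity": movie.get("popularity", 0.0),
        "vote_average": movie.get("vote_average", 0.0),
        "poster_url": POSTER_BASE + poster if poster else "",
    }

def fetch_popular(api_key: str, pages: int = 5) -> list:
    movies = []
    for page in range(1, pages + 1):
        query = urlencode({"api_key": api_key, "page": page})
        url = "https://api.themoviedb.org/3/movie/popular?" + query
        with urllib.request.urlopen(url) as resp:
            movies.extend(to_record(m) for m in json.load(resp)["results"])
    return movies

if __name__ == "__main__":
    data = fetch_popular(os.environ["TMDB_API_KEY"])
    with open("movies.json", "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```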



&lt;p&gt;Afterwards, the &lt;a href="https://github.com/weaviate/agent-skills/blob/main/skills/weaviate/references/import_data.md" rel="noopener noreferrer"&gt;Weaviate Import Skill&lt;/a&gt; creates a Weaviate collection and imports the data from &lt;em&gt;movies.json&lt;/em&gt; into the Weaviate database. Claude Code activates the skill when prompted with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Import &lt;span class="sb"&gt;`movie.json`&lt;/span&gt; into a new Weaviate collection called Movie
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbeb2l8quvgqtbfmbzt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbeb2l8quvgqtbfmbzt7.png" alt="Claude Code" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data is then imported:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuihrumms8ofngypte6vi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuihrumms8ofngypte6vi.png" alt="Terminal Output" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Building the FastAPI Backend and Next.js Frontend with Weaviate Cookbooks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/weaviate/agent-skills/blob/main/skills/weaviate-cookbooks/references/frontend_interface.md" rel="noopener noreferrer"&gt;Weaviate cookbooks&lt;/a&gt; enable the app to use a two-layer architecture: a FastAPI backend that exposes REST endpoints and a Next.js frontend that renders the UI. The backend connects directly to Weaviate Cloud and the Weaviate Query Agent. Weaviate cookbooks also include some frontend guidelines to communicate with the &lt;a href="https://github.com/weaviate/agent-skills/blob/main/skills/weaviate-cookbooks/references/frontend_interface.md" rel="noopener noreferrer"&gt;Weaviate backend&lt;/a&gt; over HTTP.&lt;/p&gt;

&lt;p&gt;The app is organized into two views accessed via a collapsible sidebar:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Search view&lt;/strong&gt;: performs semantic search and RAG using Weaviate named vectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat view&lt;/strong&gt;: handles multi-turn conversations through the Weaviate Query Agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our app includes the following features:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Layer&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Component&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Role&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;backend.py (FastAPI) - REST API on port 8000/docs&lt;/td&gt;
&lt;td&gt;Routes: GET /health, GET /search, POST /ai/explain, POST /ai/plan, POST /chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js + TypeScript (port 3000)&lt;/td&gt;
&lt;td&gt;Single-page app with sidebar navigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;SearchView.tsx&lt;/td&gt;
&lt;td&gt;Semantic search (near_text), AI explanations (single_prompt), Movie Night Planner (grouped_task)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;MovieCard.tsx&lt;/td&gt;
&lt;td&gt;Renders base64 poster inline, watchlist add/remove button&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;ChatView.tsx&lt;/td&gt;
&lt;td&gt;Multi-turn Query AI Agent chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;AppSidebar.tsx&lt;/td&gt;
&lt;td&gt;Navigation (Search/Chat), Weaviate logo + feature summary, watchlist manager with ‘.txt’ export&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use the following prompts with Claude Code to generate the backend and frontend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;/weaviate cookbooks 

Create &lt;span class="sb"&gt;`backend.py`&lt;/span&gt;: a FastAPI app with CORS enabled for localhost:3000.
Connect to Weaviate Cloud using credentials from .env with skip_init_checks=True.
The /search endpoint should return genre and vote_average alongside title, description, release_year, and poster.
Implement these routes:  
&lt;span class="p"&gt;
-&lt;/span&gt; GET  /health                  → {"status": "ok"}  
&lt;span class="p"&gt;-&lt;/span&gt; GET  /search?q=...&amp;amp;limit=3    → near_text on text_vector, return title/description/release_year/poster  
&lt;span class="p"&gt;-&lt;/span&gt; POST /ai/explain              → generate.near_text with single_prompt  
&lt;span class="p"&gt;-&lt;/span&gt; POST /ai/plan                 → generate.near_text with grouped_task  
&lt;span class="p"&gt;-&lt;/span&gt; POST /chat                    → QueryAgent.ask() with full message history

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
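
&lt;p&gt;Under the hood, these routes map onto a handful of weaviate-client v4 calls. The sketch below shows the expected shape; the collection and named-vector names follow this tutorial, and the exact parameters are an approximation of what Claude Code generates:&lt;/p&gt;

```python
# Sketch of the Weaviate calls behind /search, /ai/explain and /ai/plan.
# `client` is an already-connected weaviate-client v4 client instance.

def search_movies(client, q: str, limit: int = 3):
    movies = client.collections.get("Movie")
    # Semantic search against the named vector used for text.
    return movies.query.near_text(query=q, limit=limit, target_vector="text_vector")

def explain_movies(client, q: str, prompt: str):
    movies = client.collections.get("Movie")
    # RAG: one generated answer per retrieved object.
    return movies.generate.near_text(query=q, limit=3, single_prompt=prompt)

def plan_movie_night(client, q: str, task: str):
    movies = client.collections.get("Movie")
    # RAG: one generated answer over the whole result set.
    return movies.generate.near_text(query=q, limit=5, grouped_task=task)
```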



&lt;p&gt;&lt;strong&gt;Frontend Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Using Weaviate cookbooks frontend reference, create a Next.js TypeScript app in the frontend/ folder.
MovieCard.tsx should display a star rating (vote_average) and genre tag beneath the movie title. 

Components needed:  
&lt;span class="p"&gt;
-&lt;/span&gt; page.tsx        — SidebarProvider layout, view state (search | chat)  
&lt;span class="p"&gt;-&lt;/span&gt; SearchView.tsx  — search input, MovieCard grid, AI explain and plan buttons  
&lt;span class="p"&gt;-&lt;/span&gt; MovieCard.tsx   — poster image, title, year, description, watchlist button  
&lt;span class="p"&gt;-&lt;/span&gt; ChatView.tsx    — message bubbles, source citations, clear chat  
&lt;span class="p"&gt;-&lt;/span&gt; AppSidebar.tsx  — navigation, Weaviate logo + feature list, watchlist + exportBackend base URL from NEXT_PUBLIC_BACKEND_HOST env var (default localhost:8000)

Run backend and frontend servers with: uvicorn backend:app --reload --port 8000 and npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, Claude Code will automatically build the app by adding relevant files and start both servers. You can start using the application immediately.&lt;/p&gt;

&lt;p&gt;The FastAPI backend runs at &lt;code&gt;http://localhost:8000&lt;/code&gt; (interactive docs at &lt;code&gt;http://localhost:8000/docs&lt;/code&gt;), while the frontend app is available at &lt;code&gt;http://localhost:3000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also manually start both processes in separate terminals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 — Backend &lt;/span&gt;
uvicorn backend:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;span class="c"&gt;# Terminal 2 — Frontend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Congratulations! You’ve completed the project without needing to do much manual configuration or coding.&lt;/strong&gt; 🔥&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo
&lt;/h3&gt;

&lt;p&gt;So far, we have used Weaviate Agent Skills with Claude Code to build a Semantic Movie Discovery Application powered by an OpenAI API key, a TMDB API key, and Weaviate.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4udXaqI0PaQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Movie Discovery app we built includes the following features&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search:&lt;/strong&gt; Describe a mood or theme and retrieve matching movies using vector-based search (&lt;code&gt;near_text&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI explanations:&lt;/strong&gt; Generate per-movie summaries using RAG with &lt;code&gt;single_prompt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Movie Night Planner:&lt;/strong&gt; Create a viewing order, snack pairings and a theme summary using &lt;code&gt;grouped_task&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversational chat:&lt;/strong&gt; Ask questions about the movie collection through a chat interface powered by the Weaviate Query Agent, with source citations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watchlist:&lt;/strong&gt; Save movies during your session and export the list as a &lt;code&gt;.txt&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What’s Next?
&lt;/h3&gt;

&lt;p&gt;You could add image-based search to find visually similar movies and better match viewer preferences. You could also add hybrid search, which combines keyword (BM25) relevance with vector similarity for keyword-heavy queries.&lt;/p&gt;

&lt;p&gt;You can take your app even further by getting up to speed with Weaviate’s latest &lt;a href="https://weaviate.io/blog" rel="noopener noreferrer"&gt;releases&lt;/a&gt; and becoming familiar with features such as server-side batching, async replication improvements, Object TTL and many more.&lt;/p&gt;

&lt;p&gt;To explore further, check out the latest Weaviate &lt;a href="https://weaviate.io/blog" rel="noopener noreferrer"&gt;releases&lt;/a&gt; and join the discussion on the &lt;a href="https://forum.weaviate.io/" rel="noopener noreferrer"&gt;community forum&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
&lt;strong&gt;Weaviate Agent Skills in Action&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The following Weaviate modules, skills and agents were used in the application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text2vec-weaviate:&lt;/strong&gt; Responsible for text embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi2multivec-weaviate:&lt;/strong&gt; Responsible for embedding images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generative-openai:&lt;/strong&gt; Integrates GPT directly into the query workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Skill:&lt;/strong&gt; Creates a collection and imports data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Cookbooks Skill:&lt;/strong&gt; For defining the app’s logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate Query Agent:&lt;/strong&gt; A higher-level abstraction that accepts natural language queries, decides the best query method, executes queries, synthesizes results and returns answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weaviate Agent Skills help in shipping faster and more accurate RAG applications. Backend development tasks such as schema inspection, data ingestion and search operations are automated and optimized. Ultimately, this helps developers save valuable development time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both MCP servers and Agent Skills provide useful patterns for building AI-powered applications. MCP servers are well-suited for exposing external tools and services through a standardized interface, while Agent Skills focus on guiding coding agents with structured workflows and best practices.&lt;/p&gt;

&lt;p&gt;In this tutorial, we demonstrated how Weaviate Agent Skills can simplify development by helping Claude Code generate correct database queries, ingestion pipelines and search logic. By combining vector search, multimodal storage and generative capabilities, we built a semantic movie discovery application with minimal manual setup.&lt;/p&gt;

&lt;p&gt;As agentic development environments continue to evolve, tools like MCP servers and Agent Skills will likely be used together. The key is understanding where each approach fits and selecting the one that best supports your application architecture.&lt;/p&gt;

&lt;p&gt;Happy building.&lt;/p&gt;




&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/weaviate/agent-skills" rel="noopener noreferrer"&gt;Weaviate Agent Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Studio1HQ/movie-discovery-app" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt; for the Movie Discovery App&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>rag</category>
      <category>webdev</category>
    </item>
    <item>
      <title>We Cut Our MCP Token Spend in Half. Here's the Architecture</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:04:52 +0000</pubDate>
      <link>https://forem.com/studio1hq/we-cut-our-mcp-token-spend-in-half-heres-the-architecture-1jic</link>
      <guid>https://forem.com/studio1hq/we-cut-our-mcp-token-spend-in-half-heres-the-architecture-1jic</guid>
      <description>&lt;p&gt;When we started scaling our MCP workflows, token usage was something we barely tracked. The system worked well, responses were accurate, and adding more tools felt like the right next step. Over time, the cost began rising in ways that did not align with how much the system was actually used.&lt;/p&gt;

&lt;p&gt;At first, we assumed this was due to higher usage or more complex queries. The data showed something else: even simple requests were using more tokens than expected. This led us to ask a basic question: what exactly are we sending to the LLM on every call?&lt;/p&gt;

&lt;p&gt;A closer look made things clearer. The issue came from how the system was built. The way we handled context, tool definitions, and execution flow added extra tokens at every step.&lt;/p&gt;

&lt;p&gt;This article explains how we found the root cause and redesigned the architecture to fix it. The changes cut our MCP token usage by nearly half and gave us better control over how the system behaves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Token Usage in MCP Systems
&lt;/h2&gt;

&lt;p&gt;Once we started examining token usage, a clear pattern showed up. The LLM was receiving far more context than most requests actually needed. A large part of this came from tool definitions being sent repeatedly on every call.&lt;/p&gt;

&lt;p&gt;Each request included the full list of tools, even when only one or two were needed. On top of that, earlier outputs and intermediate results were passed back into the model. The context kept growing, even for simple queries.&lt;/p&gt;

&lt;p&gt;The execution flow added to the problem. The LLM would choose a tool, call it, process the result, and then repeat the same cycle if another step was needed. Each step added more tokens, and the same data often appeared many times across calls.&lt;/p&gt;
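&lt;p&gt;The shape of that growth is easy to sketch. The token counts below are hypothetical placeholders, not measurements from our system; they only show why resending every tool definition and replaying earlier results makes cost compound across steps:&lt;/p&gt;

```python
# Rough sketch of how per-call context compounds across an agent loop.
# All numbers are made up for illustration, not measured values.

TOOL_DEFINITIONS = 4000  # full tool list resent on every call
SYSTEM_PROMPT = 500

def tokens_for_loop(steps, result_tokens_per_step=800):
    """Total input tokens when every step resends all tool definitions
    plus every earlier intermediate result."""
    total = 0
    history = 0
    for _ in range(steps):
        total += SYSTEM_PROMPT + TOOL_DEFINITIONS + history
        history += result_tokens_per_step  # prior outputs fed back in
    return total

print(tokens_for_loop(1))
print(tokens_for_loop(6))
```

&lt;p&gt;A single call pays the fixed context once; a six-step loop pays it six times plus a growing replay of earlier results, which is the pattern described above.&lt;/p&gt;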

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraya207lc4ie4r2yqsd2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraya207lc4ie4r2yqsd2.png" alt="Image1" width="800" height="1422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setup worked at a small scale. As the number of tools increased, the cost grew quickly. More tools meant more context. More steps meant repeated processing. The system was doing extra work without adding real value. At this point, the cause was clear. Token usage came from how the system handled context and execution. The design itself was driving the overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Bifrost
&lt;/h2&gt;

&lt;p&gt;We started looking for a way to change how the system handled tool execution. The goal was simple. Reduce the amount of context sent to the LLM and avoid repeated processing across steps.&lt;/p&gt;

&lt;p&gt;During this process, we came across &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt;, an &lt;a href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;open source&lt;/a&gt; MCP gateway. It works between the application, the model, and the tools. It brings structure for how tools are discovered and executed, so the LLM receives only what is needed on each call.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhnphaglsh5ymggy61oe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhnphaglsh5ymggy61oe.png" alt="Image" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This changed how we thought about the system. Tool access became more controlled. Context stayed limited to what was required for each request. The overall flow of execution became easier to follow and reason about.&lt;/p&gt;

&lt;p&gt;These changes directly addressed the issues we were seeing. Tool definitions were sent only when required. Repeated decision loops were reduced. The system handled execution in a more controlled and predictable way.&lt;/p&gt;

&lt;p&gt;From here, the focus moved away from adjusting prompts and toward changing how the system runs end-to-end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural Changes with Bifrost Code Mode
&lt;/h2&gt;

&lt;p&gt;The main change came from how execution was handled inside Bifrost. &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt; is a Bifrost feature that changes how the LLM interacts with MCP tools. Earlier, the LLM handled both planning and step-by-step tool interaction. Each step required another call, and each call carried a growing context.&lt;/p&gt;

&lt;p&gt;Code Mode separates these responsibilities. The LLM focuses on planning. It generates executable code that defines the full workflow for a task. &lt;/p&gt;

&lt;p&gt;Code Mode works best when multiple MCP servers are involved, workflows have several steps, or tools need to share data. For simpler setups with one or two tools, Classic MCP works well.&lt;/p&gt;

&lt;p&gt;A mixed setup also works. Use Code Mode for heavier workflows like search or databases, and keep simple tools as direct calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcz78lp878cwfdmchwomm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcz78lp878cwfdmchwomm.png" alt="Image2" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The generated plan includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting the right tools&lt;/li&gt;
&lt;li&gt;Passing data between tools&lt;/li&gt;
&lt;li&gt;Defining how the final output is produced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system exposes a minimal interface to the LLM. It can list available tools, read tool details, and, when required, understand how each tool works. Tool definitions are accessed on demand, which keeps the initial context small.&lt;/p&gt;

&lt;p&gt;Once the plan is generated, execution moves to a runtime environment. The code runs in a sandbox and interacts directly with tools. All intermediate steps, tool responses, and data transformations stay within this layer.&lt;/p&gt;

&lt;p&gt;This removes the need for repeated LLM calls during execution. The workflow runs in one pass, guided by the generated code. The LLM is involved mainly at the planning stage and for producing the final response if required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawpurvuv48ogzbgr1rdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawpurvuv48ogzbgr1rdu.png" alt="Image" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The flow becomes more structured. A request comes in, relevant tools are identified, code is generated, and execution happens in a controlled environment. The system handles state and intermediate data outside the LLM.&lt;/p&gt;

&lt;p&gt;This approach improves clarity in how tasks are executed. The generated code can be inspected, debugged, and understood directly. Each request follows a defined path, which makes behavior easier to track and reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Bifrost CLI in Our Workflow
&lt;/h2&gt;

&lt;p&gt;Getting started required two commands. First, start the gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then launch the CLI from a separate terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP servers are registered once through the API. The key flag is &lt;code&gt;is_code_mode_client&lt;/code&gt;, which tells Bifrost to handle that server through Code Mode instead of sending its tool definitions on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/api/mcp/client &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "name": "youtube",
    "connection_type": "http",
    "connection_string": "http://localhost:3001/mcp",
    "tools_to_execute": ["*"],
    "is_code_mode_client": true
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once registered, the LLM discovers tools on demand using &lt;code&gt;listToolFiles&lt;/code&gt; and &lt;code&gt;readToolFile&lt;/code&gt;, then submits a full execution plan through &lt;code&gt;executeToolCode&lt;/code&gt;. A workflow that previously took six LLM turns now completes in three to four.&lt;/p&gt;

&lt;p&gt;Bifrost organizes tool definitions using two binding levels. Server-level (default) groups all tools from a server into one &lt;code&gt;.pyi&lt;/code&gt; file. Tool-level gives each tool its own file, which works better for servers with 30+ tools. Set it once in &lt;code&gt;config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool_manager_config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"code_mode_binding_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Debugging became simpler because the generated code is the execution plan. When something went wrong, the issue was visible directly in the code rather than buried in prompt chains. This setup also made execution easier to inspect.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;youtube&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI infrastructure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;titles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snippet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;titles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;titles&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution runs in a Starlark interpreter, a restricted subset of Python. A few constraints to keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No import statements, file I/O, or network access&lt;/li&gt;
&lt;li&gt;Classes are not supported; use dictionaries instead&lt;/li&gt;
&lt;li&gt;Tool calls run synchronously; async handling is not required&lt;/li&gt;
&lt;li&gt;Each tool call has a default timeout of 30 seconds&lt;/li&gt;
&lt;/ul&gt;
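&lt;p&gt;A plan that respects these constraints looks like plain Python with dictionaries standing in for objects. The &lt;code&gt;search&lt;/code&gt; stub below stands in for a real tool binding such as the &lt;code&gt;youtube&lt;/code&gt; handle shown earlier; it is only there so the shape of the code is visible:&lt;/p&gt;

```python
# Sketch of Starlark-compatible plan code: no imports, no classes,
# dictionaries instead of objects. `search` is a stub standing in for
# a real tool binding, returning the same shape a tool response might.

def search(query, maxResults):
    return {"items": [{"snippet": {"title": "Video %d" % i}}
                      for i in range(maxResults)]}

def run_plan():
    results = search(query="AI infrastructure", maxResults=3)
    # A dictionary plays the role a class instance would in normal Python.
    summary = {"titles": [item["snippet"]["title"] for item in results["items"]]}
    summary["count"] = len(summary["titles"])
    return summary

print(run_plan()["count"])
```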

&lt;p&gt;Code Mode also works with &lt;a href="https://docs.getbifrost.ai/mcp/agent-mode" rel="noopener noreferrer"&gt;Agent Mode&lt;/a&gt; for automated workflows. The &lt;code&gt;listToolFiles&lt;/code&gt; and &lt;code&gt;readToolFile&lt;/code&gt; tools are always auto-executable since they are read-only. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;executeToolCode&lt;/code&gt; tool only auto-executes if every tool call within the generated code is on the approved list. If any call falls outside that list, Bifrost returns it to the user for approval before running.&lt;/p&gt;
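&lt;p&gt;The approval rule itself is a simple subset check. The sketch below mirrors the behavior described above; the tool names and function are illustrative, not Bifrost's actual API:&lt;/p&gt;

```python
# Hedged sketch of the auto-execution rule: executeToolCode runs
# automatically only if every tool the generated plan calls is on the
# approved list. Tool names here are hypothetical examples.

APPROVED = {"youtube.search", "youtube.get_video"}

def can_auto_execute(called_tools):
    """True when every tool the plan calls is pre-approved."""
    return set(called_tools) <= APPROVED

print(can_auto_execute(["youtube.search"]))                   # all approved: auto-runs
print(can_auto_execute(["youtube.search", "db.delete_all"]))  # returned for user approval
```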

&lt;h2&gt;
  
  
  Impact on Token Usage and System Efficiency
&lt;/h2&gt;

&lt;p&gt;The reduction in token usage came from four specific changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool schemas were sent only when required&lt;/li&gt;
&lt;li&gt;Intermediate outputs stayed within the execution layer&lt;/li&gt;
&lt;li&gt;Repeated context across steps was removed&lt;/li&gt;
&lt;li&gt;Fewer LLM calls were needed, since execution moved to a sandbox and ran in a single flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These changes had a clear effect. Token usage dropped by nearly half, and latency dropped along with it. Execution became more predictable, since each request followed a defined path with fewer moving parts.&lt;/p&gt;

&lt;p&gt;The broader takeaway is clear. Token cost comes from system design. Small changes in prompts or outputs help at the edges. The main overhead comes from the system's structure.&lt;/p&gt;

&lt;p&gt;LLMs work best when they focus on planning. Managing execution through repeated loops adds cost and introduces variability. A separate execution layer keeps the flow stable and easier to understand. Context also needs careful control. It should be built for each request with only the required information. Letting it grow across steps results in unnecessary overhead and increased token usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Token inefficiency in MCP workflows comes from system design. Bifrost and Code Mode introduced a clear separation between planning and execution. The LLM handles planning, and the runtime handles execution. This brought immediate and measurable improvements in both cost and system behavior.&lt;/p&gt;

&lt;p&gt;If you are working with MCP workflows at scale, &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt; is worth exploring. The &lt;a href="https://docs.getbifrost.ai/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; provides a good starting point to set up the gateway, connect servers, and run workflows using Code Mode.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Composer 2 is controversial, but my actual experience was solid</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Sat, 21 Mar 2026 06:55:09 +0000</pubDate>
      <link>https://forem.com/arindam_1729/composer-2-is-controversial-but-my-actual-experience-was-solid-5a7h</link>
      <guid>https://forem.com/arindam_1729/composer-2-is-controversial-but-my-actual-experience-was-solid-5a7h</guid>
      <description>&lt;p&gt;I tried Composer 2 properly today, and honestly, if you put all the controversy aside for a second, the model itself is not bad at all.&lt;/p&gt;

&lt;p&gt;In fact, my first impression is that it’s a real upgrade over Composer 1 and 1.5. I gave it a pretty solid test. I asked it to build a full-stack Reddit clone and deploy it too.&lt;/p&gt;

&lt;p&gt;On the first go, it handled most of the work surprisingly well. The deployment also worked, which was a good sign. The main thing that broke was authentication.&lt;/p&gt;

&lt;p&gt;Then on the second prompt, I asked it to fix that, and it actually fixed the auth issue and redeployed the app.&lt;/p&gt;

&lt;p&gt;That said, it was not perfect. There were still some backend issues left that it could not fully solve. So I would not say it is at the level of Claude Opus 4.6 or GPT-5.4 for coding quality.&lt;/p&gt;

&lt;p&gt;But speed-wise, it felt much faster. For me, it was around 5 to 7x faster than Opus 4.6 / GPT-5.4 in actual workflow, and it also feels much more cost-effective.&lt;/p&gt;

&lt;p&gt;That combination matters a lot.&lt;/p&gt;

&lt;p&gt;Because even if the raw coding quality is still below Opus 4.6 / GPT-5.4, the overall experience was smoother than I expected. It gets you from idea to working product much faster, and for a lot of people that tradeoff will be worth it.&lt;/p&gt;

&lt;p&gt;My current take is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better than Composer 1 / 1.5 by a clear margin&lt;/li&gt;
&lt;li&gt;Fast enough to change how often I’d use it&lt;/li&gt;
&lt;li&gt;Good at getting most of the app done quickly&lt;/li&gt;
&lt;li&gt;Still weak enough in backend reliability that I would not fully trust it yet for complex production work&lt;/li&gt;
&lt;li&gt;Not as strong as Opus 4.6 / GPT-5.4 in coding depth, but still very usable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yeah, I agree with the criticism that it is not on the same level as Opus 4.6 / GPT-5.4 for hard coding tasks (maybe because the base model is Kimi K2.5).&lt;/p&gt;

&lt;p&gt;But I also think some people are dismissing it too quickly. If you judge it as a fast, cheaper, improved Composer, it is genuinely solid. &lt;/p&gt;

&lt;p&gt;I shared a longer breakdown &lt;a href="https://www.youtube.com/watch?v=nv1fcjfC5wg" rel="noopener noreferrer"&gt;here&lt;/a&gt; with the exact build flow, where it got things right, and where it still fell short, in case anyone wants more context.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>kimi</category>
      <category>ai</category>
      <category>composer</category>
    </item>
    <item>
      <title>Building an AI-Powered Content Moderation API with InsForge Edge Functions</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:55:13 +0000</pubDate>
      <link>https://forem.com/arindam_1729/building-an-ai-powered-content-moderation-api-with-insforge-edge-functions-j0k</link>
      <guid>https://forem.com/arindam_1729/building-an-ai-powered-content-moderation-api-with-insforge-edge-functions-j0k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern applications rely on user-generated content such as comments, reviews, and messages. Platforms must moderate this content to enforce safety policies and maintain compliance. Manual moderation does not scale, so production systems typically rely on automated moderation pipelines powered by AI.&lt;/p&gt;

&lt;p&gt;Traditional implementations require multiple backend services. Developers often provision servers, integrate AI APIs, manage databases, and configure storage separately. This fragmented setup increases operational overhead and slows development. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/InsForge/InsForge" rel="noopener noreferrer"&gt;InsForge&lt;/a&gt; simplifies this architecture by combining Edge Functions, PostgreSQL Database, Storage, and Model Gateway in a single platform. Benchmarks also show that it can deliver &lt;a href="https://insforge.dev/blog/mcpmark-benchmark-results-v2" rel="noopener noreferrer"&gt;~1.6× faster responses and 2.4x lower token usage&lt;/a&gt; compared to fragmented integrations.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will build a production-ready AI moderation API that runs entirely within InsForge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqdx9ozf5ku2uwyr2ypm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqdx9ozf5ku2uwyr2ypm.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Are Building
&lt;/h2&gt;

&lt;p&gt;We will build a simple backend moderation workflow from the following pieces, all running on InsForge core services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Moderation API Endpoint:&lt;/strong&gt; We will create an API endpoint using &lt;a href="https://docs.insforge.dev/core-concepts/functions/architecture" rel="noopener noreferrer"&gt;Edge Functions&lt;/a&gt; that accepts user-submitted text content and processes moderation requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Content Evaluation:&lt;/strong&gt; The API will use Model Gateway to access an AI model that classifies submitted content as SAFE or UNSAFE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Storage for Approved Content:&lt;/strong&gt; Approved comments will be stored in a PostgreSQL &lt;a href="https://docs.insforge.dev/core-concepts/database/architecture" rel="noopener noreferrer"&gt;Database managed by InsForge&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attachment Handling with Storage:&lt;/strong&gt; Optional user attachments will be uploaded and stored using &lt;a href="https://docs.insforge.dev/core-concepts/storage/architecture" rel="noopener noreferrer"&gt;Storage Buckets&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Moderation Response:&lt;/strong&gt; Unsafe content will be rejected immediately, and the API will return a structured moderation response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-Ready Backend Workflow:&lt;/strong&gt; The moderation pipeline will run entirely within InsForge using Database, Edge Functions, &lt;a href="https://docs.insforge.dev/core-concepts/ai/architecture" rel="noopener noreferrer"&gt;Model Gateway&lt;/a&gt;, and Storage, without external servers or additional infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project Setup and Repository Structure
&lt;/h2&gt;

&lt;p&gt;Before configuring the backend resources, clone the project repository and review the project structure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Studio1HQ/Content-moderation-Insforge" rel="noopener noreferrer"&gt;Clone the repository&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Studio1HQ/Content-moderation-Insforge
&lt;span class="nb"&gt;cd &lt;/span&gt;content-moderation-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repository contains both the Next.js frontend and the InsForge Edge Function used for moderation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Repository Structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Folder&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/app&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Next.js application pages and layouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/components&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UI components such as the moderation form&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/lib&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Client utilities for connecting to InsForge APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;insforge-functions/moderate-comment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Edge Function implementation for moderation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;handler.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Serverless function that processes moderation requests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This structure keeps the frontend and backend logic organized within the same project while allowing the Edge Function to be deployed independently.&lt;/p&gt;

&lt;p&gt;After cloning the repository, proceed with configuring the backend resources in InsForge.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can set up this backend in two ways. Follow the manual steps in this tutorial to create the database, storage bucket, and Edge Function using the dashboard and CLI. Alternatively, you can use InsForge MCP with your AI coding agent to provision the same resources using a single prompt. See the MCP section at the end of the article for the prompt template and instructions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 1: Setting Up the Database
&lt;/h2&gt;

&lt;p&gt;InsForge provides a managed PostgreSQL Database that you can configure directly from the dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open the Tables Section&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your project in the InsForge Dashboard.&lt;/li&gt;
&lt;li&gt;In the left sidebar, select Tables.&lt;/li&gt;
&lt;li&gt;Click the + icon next to Tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w6xkfmm1hgtpswgto3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w6xkfmm1hgtpswgto3r.png" alt="Image1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Name the table &lt;code&gt;comments&lt;/code&gt; and create the following columns.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;uuid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primary key for each comment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;content&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;User submitted comment text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;attachment_url&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;URL for uploaded file (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Moderation result (&lt;code&gt;approved&lt;/code&gt; or &lt;code&gt;rejected&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;created_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;timestamp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time when the comment was created&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Save the Table&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click Create Table to apply the schema.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;comments&lt;/code&gt; table will appear in the Tables panel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53ywzcg9oeat3v2g9ugd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53ywzcg9oeat3v2g9ugd.png" alt="Image3" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Creating the Edge Function
&lt;/h2&gt;

&lt;p&gt;Next, create the serverless API that will process moderation requests.&lt;/p&gt;

&lt;p&gt;InsForge Edge Functions allow you to run backend logic without managing servers. In this tutorial, the function receives user content, evaluates it using AI, and stores approved results in the database.&lt;/p&gt;

&lt;p&gt;Navigate to the Edge Function directory in the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge-functions/moderate-comment/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside this folder, there will be a file named:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;handler.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file will contain the moderation logic executed by the Edge Function.&lt;/p&gt;

&lt;p&gt;The Edge Function performs the following tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accept a POST request containing user content.&lt;/li&gt;
&lt;li&gt;Send the content to the AI model through Model Gateway.&lt;/li&gt;
&lt;li&gt;Classify the content as SAFE or UNSAFE.&lt;/li&gt;
&lt;li&gt;Upload attachments to Storage if present.&lt;/li&gt;
&lt;li&gt;Insert approved content into the comments table.&lt;/li&gt;
&lt;li&gt;Return a structured moderation response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All moderation logic runs inside the Edge Function, keeping the backend workflow centralized within InsForge.&lt;/p&gt;
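&lt;p&gt;The actual handler lives in &lt;code&gt;handler.ts&lt;/code&gt; and is written in TypeScript; the Python sketch below only mirrors the decision flow, with &lt;code&gt;classify&lt;/code&gt; as a stand-in for the Model Gateway call rather than a real InsForge API:&lt;/p&gt;

```python
# Sketch of the moderation flow: classify the content, then branch into
# an approved or rejected structured response. classify() is a toy
# placeholder for the AI call, not real moderation logic.

def classify(content):
    # A real handler would send `content` to the model through Model
    # Gateway and parse SAFE/UNSAFE from its reply.
    banned = ("spam", "abuse")
    return "UNSAFE" if any(word in content.lower() for word in banned) else "SAFE"

def moderate(content, attachment_url=None):
    verdict = classify(content)
    if verdict == "UNSAFE":
        return {"status": "rejected", "reason": "content flagged as unsafe"}
    # Approved content would be inserted into the comments table here.
    record = {"content": content, "status": "approved",
              "attachment_url": attachment_url}
    return {"status": "approved", "comment": record}

print(moderate("Great article!")["status"])
```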

&lt;p&gt;Deploy the function using the InsForge CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge functions deploy moderate-comment--file ./insforge-functions/moderate-comment/handler.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faslzh2e3doyhv84lfjc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faslzh2e3doyhv84lfjc7.png" alt="Image5" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once deployed, the function becomes available as a backend API endpoint that the frontend application can call.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljbdqd5046ag3genf6us.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljbdqd5046ag3genf6us.png" alt="Image6" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: AI Integration Inside the Function
&lt;/h2&gt;

&lt;p&gt;The moderation logic inside the Edge Function uses Model Gateway, which provides unified access to multiple AI models directly within InsForge.&lt;/p&gt;

&lt;p&gt;Model Gateway allows Edge Functions to call AI models without configuring external API clients or managing provider-specific integrations.&lt;/p&gt;

&lt;p&gt;Open the Model Gateway section in the InsForge dashboard and enable a model for the project.&lt;/p&gt;

&lt;p&gt;For this tutorial, enable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openai/gpt-4o-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model will be used to classify incoming content during moderation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbysoea6ct9gsuzn5ye2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbysoea6ct9gsuzn5ye2v.png" alt="Image9" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the CLI to send a test request to the moderation API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;insforge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;functions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;invoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;moderate-comment--data&lt;/span&gt;&lt;span class="s2"&gt;"{\"&lt;/span&gt;&lt;span class="nx"&gt;content\&lt;/span&gt;&lt;span class="s2"&gt;":\"&lt;/span&gt;&lt;span class="nx"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;community&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;very&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;helpful.\&lt;/span&gt;&lt;span class="s2"&gt;"}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command sends a JSON payload containing the &lt;code&gt;content&lt;/code&gt; field to the Edge Function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngzbpe1ujssmhstwvlyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngzbpe1ujssmhstwvlyg.png" alt="Image10" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Edge Function also inserts the approved comment into the comments table in the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Configuring InsForge Storage
&lt;/h2&gt;

&lt;p&gt;The moderation workflow also supports optional file uploads using InsForge Storage. Storage provides an S3-compatible object storage system that integrates directly with Edge Functions and the database.&lt;/p&gt;

&lt;p&gt;When a user submits a comment with an attachment, the Edge Function uploads the file to a storage bucket before inserting the comment into PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a Storage Bucket&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the Storage section in the InsForge dashboard.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to Storage in the sidebar.&lt;/li&gt;
&lt;li&gt;Click Create Bucket.&lt;/li&gt;
&lt;li&gt;Name the bucket: attachments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This bucket will store files uploaded with moderated comments. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7no9lh3kevy5mv8t7n65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7no9lh3kevy5mv8t7n65.png" alt="Image 10" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The upload operation returns a &lt;strong&gt;public file URL&lt;/strong&gt;, which is stored in the &lt;code&gt;attachment_url&lt;/code&gt; column of the &lt;code&gt;comments&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;The moderation function processes attachments as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user submits content with an optional file.&lt;/li&gt;
&lt;li&gt;The Edge Function evaluates the text using AI moderation.&lt;/li&gt;
&lt;li&gt;If the content is classified as SAFE, the file is uploaded to the attachments bucket.&lt;/li&gt;
&lt;li&gt;The returned file URL is stored in the comments table.&lt;/li&gt;
&lt;li&gt;If the content is UNSAFE, the function rejects the request and no file is uploaded.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensures that only approved content and attachments are stored, keeping the storage system aligned with the moderation rules.&lt;/p&gt;
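&lt;p&gt;The ordering above can be sketched with the moderation, storage, and database calls injected as plain functions. This is a sketch of the control flow only; the injected names stand in for the real Model Gateway, Storage, and database calls, which are not shown here.&lt;/p&gt;

```typescript
// Sketch of the attachment flow: moderation runs first, and the file is
// only uploaded when the text is approved. The injected functions
// (classify, upload, insert) are stand-ins for the real InsForge calls.

interface Deps {
  classify: (text: string) => Promise<"SAFE" | "UNSAFE">;
  upload: (file: Blob) => Promise<string>; // returns the public file URL
  insert: (row: { content: string; attachment_url: string | null }) => Promise<void>;
}

export async function moderateComment(
  content: string,
  file: Blob | null,
  deps: Deps,
): Promise<{ status: "approved" | "rejected"; attachmentUrl: string | null }> {
  // Steps 1-2: evaluate the text first.
  if ((await deps.classify(content)) === "UNSAFE") {
    // Step 5: rejected content means nothing is uploaded or stored.
    return { status: "rejected", attachmentUrl: null };
  }
  // Step 3: SAFE content, so upload the optional attachment.
  const attachmentUrl = file ? await deps.upload(file) : null;
  // Step 4: store the comment together with the returned file URL.
  await deps.insert({ content, attachment_url: attachmentUrl });
  return { status: "approved", attachmentUrl };
}
```

&lt;p&gt;Because the upload happens after classification, a rejected comment can never leave an orphaned file in the bucket.&lt;/p&gt;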

&lt;h2&gt;
  
  
  Step 5: Building the Next.js UI
&lt;/h2&gt;

&lt;p&gt;The repository already includes a &lt;strong&gt;Next.js application&lt;/strong&gt; that provides a simple interface for interacting with the moderation API.&lt;/p&gt;

&lt;p&gt;Navigate to the frontend code inside the &lt;code&gt;src&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key UI Files&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File / Folder&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/app/page.tsx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Main page that renders the moderation interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/components&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reusable UI components for the moderation workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;src/lib/insforge.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Utility for connecting the frontend to the InsForge backend&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The UI includes a form where users submit content for moderation.&lt;/p&gt;

&lt;p&gt;The form collects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text content entered by the user&lt;/li&gt;
&lt;li&gt;Optional file attachment&lt;/li&gt;
&lt;li&gt;A submit action that triggers the moderation request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the user submits the form, the application sends a POST request to the Edge Function endpoint.&lt;/p&gt;
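&lt;p&gt;That request can be a plain fetch call. The &lt;code&gt;/functions/moderate-comment&lt;/code&gt; path below is an assumption based on the function name; use the endpoint shown in your InsForge dashboard. The attachment upload is omitted for brevity.&lt;/p&gt;

```typescript
// Hypothetical client-side helper for submitting a comment to the
// moderation Edge Function. The /functions/moderate-comment path is
// an assumption; substitute the endpoint from your InsForge dashboard.

export function buildModerationRequest(baseUrl: string, content: string): Request {
  return new Request(`${baseUrl}/functions/moderate-comment`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
}

// In the page component, submission is a fetch followed by a state update.
export async function submitComment(baseUrl: string, content: string) {
  const res = await fetch(buildModerationRequest(baseUrl, content));
  return res.json(); // e.g. { status: "approved" } or { status: "rejected" }
}
```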

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zaapex5dbdokcbra9ay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zaapex5dbdokcbra9ay.png" alt="Image11" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The UI handles the API response and updates the interface accordingly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Approved comments appear in the moderation results section.&lt;/li&gt;
&lt;li&gt;Rejected content displays an error message.&lt;/li&gt;
&lt;li&gt;Approved entries are also visible in the comments database table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup creates a complete workflow where the Next.js UI communicates with the InsForge Edge Function to perform moderation in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using an AI Agent to Build the UI
&lt;/h3&gt;

&lt;p&gt;You can also accelerate this step using an AI coding agent (such as Cursor, Claude Code, or other agent-based tools). Instead of manually writing the UI components, the agent can generate the form, API calls, and component structure based on a prompt.&lt;/p&gt;

&lt;p&gt;Example prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;Next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;js&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="nx"&gt;moderation&lt;/span&gt; &lt;span class="nx"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nx"&gt;Requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;textarea&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="nx"&gt;comments&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;An&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="nx"&gt;upload&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt; &lt;span class="nx"&gt;submit&lt;/span&gt; &lt;span class="nx"&gt;button&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Send&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;POST&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;InsForge&lt;/span&gt; &lt;span class="nx"&gt;Edge&lt;/span&gt; &lt;span class="nb"&gt;Function&lt;/span&gt; &lt;span class="nx"&gt;endpoint&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;moderation&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Display&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;moderation&lt;/span&gt; &lt;span class="nf"&gt;result &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;approved&lt;/span&gt; &lt;span class="nx"&gt;or&lt;/span&gt; &lt;span class="nx"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;UI&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;Use&lt;/span&gt; &lt;span class="nx"&gt;React&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="nx"&gt;handle&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="nx"&gt;submission&lt;/span&gt; &lt;span class="nx"&gt;and&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Testing the API Endpoint
&lt;/h2&gt;

&lt;p&gt;After deploying the Edge Function and setting up the UI, test the moderation workflow to verify that the API behaves correctly. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Submit Safe Content&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enter a comment through the UI and submit the form.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sz5omlqs9mvejn1zo2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sz5omlqs9mvejn1zo2m.png" alt="Image 12" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Edge Function sends the content to the AI moderation model.&lt;/li&gt;
&lt;li&gt;The model classifies the text as SAFE.&lt;/li&gt;
&lt;li&gt;The function inserts the comment into the comments table in PostgreSQL.&lt;/li&gt;
&lt;li&gt;If an attachment is included, the file is uploaded to the attachments storage bucket.&lt;/li&gt;
&lt;li&gt;The API returns an approved response to the frontend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32e8hp25nekpkmnmwd4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32e8hp25nekpkmnmwd4u.png" alt="Image 13" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, test a rejection case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3putsxojxrqvnfim1ft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3putsxojxrqvnfim1ft.png" alt="Image 14" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Edge Function sends the text to the AI moderation model.&lt;/li&gt;
&lt;li&gt;The model classifies the content as UNSAFE.&lt;/li&gt;
&lt;li&gt;The function immediately returns a rejection response.&lt;/li&gt;
&lt;li&gt;No entry is inserted into the comments table.&lt;/li&gt;
&lt;li&gt;No file is uploaded to Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu71rplg51e61daejmmuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu71rplg51e61daejmmuh.png" alt="Image 16" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The table in your InsForge dashboard also reflects the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrufr30e913dc6304kbk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrufr30e913dc6304kbk.png" alt="Image 17" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Deployment Using InsForge
&lt;/h2&gt;

&lt;p&gt;Once the function and UI are ready, deploy the backend using the InsForge CLI. This publishes the Edge Function and connects it to the project environment.&lt;/p&gt;

&lt;p&gt;Refer to the &lt;a href="https://insforge.dev/blog/insforge-deployment" rel="noopener noreferrer"&gt;deployment guide here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authenticate the CLI with your InsForge account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge auth login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Complete the authentication process in the browser. Link the local project directory to your InsForge backend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;insforge &lt;span class="nb"&gt;link&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Select the project created earlier in the InsForge dashboard. This connects the CLI to the correct backend workspace.&lt;/p&gt;

&lt;p&gt;Deploy the Next.js application while passing the required environment variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;insforge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;deployments&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;deploy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;--env&lt;/span&gt;&lt;span class="s2"&gt;"{\"&lt;/span&gt;&lt;span class="nx"&gt;NEXT_PUBLIC_INSFORGE_BASE_URL\&lt;/span&gt;&lt;span class="s2"&gt;":\"&lt;/span&gt;&lt;span class="nx"&gt;https://your-project.insforge.app\&lt;/span&gt;&lt;span class="s2"&gt;"}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This environment variable allows the frontend to communicate with the deployed Edge Function.&lt;/p&gt;
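&lt;p&gt;A minimal utility like &lt;code&gt;src/lib/insforge.ts&lt;/code&gt; can derive endpoints from that variable. The &lt;code&gt;/functions/&lt;/code&gt; path segment below is illustrative, not confirmed by the source.&lt;/p&gt;

```typescript
// Sketch of how src/lib/insforge.ts might turn the deploy-time base URL
// into a callable endpoint. The /functions/ path segment is an assumption.

export function functionUrl(baseUrl: string, name: string): string {
  // Trim trailing slashes so the joined URL has exactly one separator.
  return `${baseUrl.replace(/\/+$/, "")}/functions/${name}`;
}

// Typical call site, using the variable passed at deployment:
// functionUrl(process.env.NEXT_PUBLIC_INSFORGE_BASE_URL ?? "", "moderate-comment")
```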

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosegkycwonbfpt46vvw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosegkycwonbfpt46vvw9.png" alt="Image 14" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify the Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After deployment, the application becomes accessible via the InsForge-hosted domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwej10rcemepkz59bxtit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwej10rcemepkz59bxtit.png" alt="Image 16" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Access the &lt;a href="https://sec3hf94.insforge.site/" rel="noopener noreferrer"&gt;live demo here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using MCP to Accelerate Development
&lt;/h2&gt;

&lt;p&gt;Instead of manually creating tables, storage buckets, and Edge Functions, you can also configure the backend using &lt;a href="https://docs.insforge.dev/mcp-setup" rel="noopener noreferrer"&gt;Remote MCP (Model Context Protocol)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;MCP exposes InsForge backend capabilities as tools that an AI coding agent can call to provision resources automatically. With a single prompt, the agent can generate the database schema, configure storage, and deploy the moderation function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt7ojupntpxibvg3snb4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt7ojupntpxibvg3snb4.png" alt="Image 17" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example prompt used to create this backend workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="n"&gt;moderation&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;InsForge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;PostgreSQL&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;named&lt;/span&gt; &lt;span class="nv"&gt;"comments"&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;attachment_url&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;storage&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="n"&gt;named&lt;/span&gt; &lt;span class="nv"&gt;"attachments"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;storing&lt;/span&gt; &lt;span class="n"&gt;uploaded&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="n"&gt;Edge&lt;/span&gt; &lt;span class="k"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;named&lt;/span&gt; &lt;span class="nv"&gt;"moderate-comment"&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;accepts&lt;/span&gt; &lt;span class="n"&gt;POST&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;comment&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sends&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="n"&gt;AI&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;classifies&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;SAFE&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="n"&gt;UNSAFE&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;uploads&lt;/span&gt; &lt;span class="n"&gt;attachments&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="k"&gt;storage&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;present&lt;/span&gt;
   &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;inserts&lt;/span&gt; &lt;span class="n"&gt;approved&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using MCP, developers can provision backend resources and deploy functions directly from prompts, significantly accelerating backend setup while keeping the same architecture described in this tutorial.&lt;/p&gt;

&lt;p&gt;Refer to the &lt;a href="https://docs.insforge.dev/mcp-setup" rel="noopener noreferrer"&gt;quick demo here&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, we built a content moderation API using InsForge Edge Functions, integrated AI-powered classification through Model Gateway, stored approved results in PostgreSQL, and handled optional file uploads with Storage. The entire workflow runs inside InsForge, without external servers or fragmented infrastructure.&lt;/p&gt;

&lt;p&gt;This approach demonstrates how developers can combine Edge Functions, AI integration, database services, and storage to implement production-ready backend APIs with minimal operational overhead.&lt;/p&gt;

&lt;p&gt;If your application relies on user-generated content, moderation pipelines, or AI-assisted workflows, this architecture provides a straightforward and scalable foundation.&lt;/p&gt;

&lt;p&gt;Ready to simplify your backend stack? Explore InsForge’s Edge Functions, Model Gateway, PostgreSQL database, and Storage services to build intelligent APIs without managing infrastructure.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Try &lt;a href="https://github.com/InsForge/InsForge" rel="noopener noreferrer"&gt;InsForge&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quickstart guide &lt;a href="https://github.com/InsForge/InsForge?tab=readme-ov-file#quickstart" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>fullstack</category>
      <category>insforge</category>
      <category>edgefunctions</category>
    </item>
    <item>
      <title>Cursor Composer 2: Features, Pricing, Benchmarks, and Initial Impressions</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Thu, 19 Mar 2026 20:25:28 +0000</pubDate>
      <link>https://forem.com/arindam_1729/cursor-composer-20-features-pricing-benchmarks-and-initial-impressions-19jd</link>
      <guid>https://forem.com/arindam_1729/cursor-composer-20-features-pricing-benchmarks-and-initial-impressions-19jd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Cursor has released Composer 2, the latest version of its in-house coding model.&lt;/p&gt;

&lt;p&gt;The announcement is focused and fairly easy to summarize. Cursor is making three main claims:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Composer 2 is frontier-level at coding&lt;/li&gt;
&lt;li&gt;it is materially better than previous Composer versions on Cursor’s published benchmarks&lt;/li&gt;
&lt;li&gt;it is priced aggressively enough to be practical for everyday use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination makes the release worth paying attention to. In this post, I’ll walk through what Composer 2 is, what Cursor says improved, how the benchmark results look, what the pricing means, and my initial take on the release.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Composer 2?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tq4zz7n7m00yc7hk1gy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tq4zz7n7m00yc7hk1gy.png" alt="Image1" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Composer 2 is Cursor’s latest in-house coding model.&lt;/p&gt;

&lt;p&gt;Cursor describes it as frontier-level at coding and positions it as a better cost-performance option for agentic software work. The model is now available in Cursor, and the announcement puts most of the emphasis on three areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stronger coding performance&lt;/li&gt;
&lt;li&gt;improved long-horizon task handling&lt;/li&gt;
&lt;li&gt;lower cost than many competing fast models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike some model launches that bundle a large number of product features together, this one is mostly about the model itself. Cursor is not presenting Composer 2 as a general platform shift. It is presenting it as a more capable and more economical coding model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composer 2 Key Features
&lt;/h2&gt;

&lt;p&gt;The Composer 2 announcement is short, but there are still a few important takeaways.&lt;/p&gt;

&lt;h3&gt;
  
  
  Better coding performance
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F839qounser481rjf4b1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F839qounser481rjf4b1r.png" alt="Image4" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cursor says Composer 2 delivers large improvements on all of the benchmarks it tracks, including Terminal-Bench 2.0 and SWE-bench Multilingual.&lt;/p&gt;

&lt;p&gt;That matters because it suggests the gains are not limited to one internal evaluation. Cursor is showing improvement across several coding-oriented benchmarks rather than relying on a single headline number.&lt;/p&gt;

&lt;h3&gt;
  
  
  Continued pretraining
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5tq4atu9e7ii4hrx0do.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5tq4atu9e7ii4hrx0do.png" alt="Image3" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the most notable details in the post is that these improvements come from Cursor’s first continued pretraining run.&lt;/p&gt;

&lt;p&gt;This is important because continued pretraining is often what gives a model a stronger base before more specialized post-training methods are applied. Cursor is explicitly saying that Composer 2 starts from a better foundation than earlier Composer versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinforcement learning for long-horizon tasks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frphs7vriemh5upww7cyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frphs7vriemh5upww7cyi.png" alt="Image2" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cursor also says it trains Composer 2 on long-horizon coding tasks using reinforcement learning.&lt;/p&gt;

&lt;p&gt;This is probably the most interesting technical claim in the announcement. Cursor says Composer 2 can solve challenging tasks requiring hundreds of actions. That implies the model is being optimized for sustained multi-step software tasks, not just short code completions or simple edits.&lt;/p&gt;

&lt;h3&gt;
  
  
  A fast variant with the same intelligence
&lt;/h3&gt;

&lt;p&gt;Cursor also introduces a faster Composer 2 variant and says it matches the standard model&amp;#39;s intelligence.&lt;/p&gt;

&lt;p&gt;That is a useful product choice. Instead of forcing users to pick between a “smart” model and a “fast” model family, Cursor is presenting speed as a deployment option on top of the same underlying capability level.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composer 2 Benchmarks
&lt;/h2&gt;

&lt;p&gt;Cursor publishes three benchmark comparisons in the announcement:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;CursorBench&lt;/th&gt;
&lt;th&gt;Terminal-Bench 2.0&lt;/th&gt;
&lt;th&gt;SWE-bench Multilingual&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Composer 2&lt;/td&gt;
&lt;td&gt;61.3&lt;/td&gt;
&lt;td&gt;61.7&lt;/td&gt;
&lt;td&gt;73.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer 1.5&lt;/td&gt;
&lt;td&gt;44.2&lt;/td&gt;
&lt;td&gt;47.9&lt;/td&gt;
&lt;td&gt;65.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer 1&lt;/td&gt;
&lt;td&gt;38.0&lt;/td&gt;
&lt;td&gt;40.0&lt;/td&gt;
&lt;td&gt;56.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These gains are large enough to be meaningful.&lt;/p&gt;

&lt;p&gt;The biggest point here is not just that Composer 2 is ahead of Composer 1 and 1.5, but that the improvements show up consistently across all three benchmarks. That gives the release more credibility than a single isolated result would.&lt;/p&gt;

&lt;p&gt;Terminal-Bench 2.0 is especially relevant because Cursor frames it as an evaluation for agentic terminal use. If Composer 2 is genuinely stronger there, that supports Cursor’s claim that the model is getting better at longer, more interactive coding tasks.&lt;/p&gt;

&lt;p&gt;SWE-bench Multilingual is also worth noting because it suggests broader coding competence beyond narrow English-only setups.&lt;/p&gt;

&lt;p&gt;Still, these are vendor-published numbers, so the right takeaway is measured optimism rather than certainty.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Composer 2 Is Priced
&lt;/h2&gt;

&lt;p&gt;Cursor says Composer 2 is priced at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.50 per million input tokens&lt;/li&gt;
&lt;li&gt;$2.50 per million output tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The faster variant is priced at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$1.50 per million input tokens&lt;/li&gt;
&lt;li&gt;$7.50 per million output tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cursor also says the fast variant costs less than comparable fast models and that it will be the default option.&lt;/p&gt;

&lt;p&gt;This part of the announcement is more important than it looks. Model releases are usually judged on benchmark quality first, but pricing determines whether a model becomes part of normal daily use or gets reserved for occasional high-value tasks. Cursor is clearly trying to push Composer 2 into the first category.&lt;/p&gt;
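&lt;p&gt;To make the pricing concrete, here is a quick back-of-the-envelope calculation using the published rates. The session token counts are made up purely for illustration:&lt;/p&gt;

```python
# Cost comparison using Cursor's published Composer 2 prices.
# The token counts below are made-up illustrative numbers, not measurements.

PRICES = {
    "composer-2":      {"input": 0.50, "output": 2.50},   # $ per million tokens
    "composer-2-fast": {"input": 1.50, "output": 7.50},
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one session for the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical long agentic session: 2M input tokens, 200k output tokens.
standard = session_cost("composer-2", 2_000_000, 200_000)
fast = session_cost("composer-2-fast", 2_000_000, 200_000)

print(f"standard: ${standard:.2f}")  # standard: $1.50
print(f"fast:     ${fast:.2f}")      # fast:     $4.50
```

Even the fast variant stays in single-digit dollars for a heavy session, which is the point Cursor is making about everyday usability.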

&lt;p&gt;On individual plans, Composer usage draws from a standalone usage pool with a generous included allowance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Composer 2 vs Earlier Composer Versions
&lt;/h2&gt;

&lt;p&gt;Based on Cursor’s published table, Composer 2 is a clear step up from Composer 1.5 and Composer 1.&lt;/p&gt;

&lt;p&gt;The improvement is visible across all the benchmarks included in the post, and Cursor attributes that jump to a combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a stronger base model from continued pretraining&lt;/li&gt;
&lt;li&gt;reinforcement learning on long-horizon coding tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a sensible recipe for a coding model. Better base training improves general capability, while long-horizon RL helps the model stay coherent over extended multi-step tasks.&lt;/p&gt;

&lt;p&gt;From the announcement alone, Composer 2 looks like a real model upgrade rather than a minor iteration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Initial Impressions
&lt;/h2&gt;

&lt;p&gt;My first impression is that this is a disciplined release.&lt;/p&gt;

&lt;p&gt;Cursor is not trying to claim that Composer 2 changes everything. The message is narrower and more believable: the model is better, it handles long-horizon coding tasks more effectively, and it is priced aggressively enough to be useful in regular workflows.&lt;/p&gt;

&lt;p&gt;The long-horizon point is the one I would pay most attention to. A lot of coding models can produce a good patch in one pass. Fewer models stay reliable across a task that unfolds over many actions. If Composer 2 is genuinely stronger there, that is a meaningful improvement.&lt;/p&gt;

&lt;p&gt;The pricing is the other major strength. A coding model can be strong on benchmarks and still be awkward in practice if the economics are wrong. Cursor seems to understand that and is making cost a central part of the launch rather than an afterthought.&lt;/p&gt;

&lt;p&gt;At the same time, this is still an announcement built around Cursor’s own evaluation framing. The benchmark gains look strong, but the real test will be whether Composer 2 feels materially better in day-to-day software work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1simb06d45iu9zxnhgo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1simb06d45iu9zxnhgo.png" alt="Image2" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Composer 2 looks like a meaningful upgrade to Cursor’s coding model stack.&lt;/p&gt;

&lt;p&gt;The release is compelling for three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the benchmark gains are substantial&lt;/li&gt;
&lt;li&gt;the training story is technically coherent&lt;/li&gt;
&lt;li&gt;the pricing is practical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already use Cursor, Composer 2 is worth trying.&lt;/p&gt;

&lt;p&gt;If you evaluate coding models more broadly, this release is notable because it tries to improve both capability and economics at the same time. That is the right combination to optimize for.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Running LLM Applications Across Providers with Bifrost</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Tue, 17 Mar 2026 16:15:23 +0000</pubDate>
      <link>https://forem.com/studio1hq/running-llm-applications-across-providers-with-bifrost-313h</link>
      <guid>https://forem.com/studio1hq/running-llm-applications-across-providers-with-bifrost-313h</guid>
      <description>&lt;p&gt;Many modern applications include AI features that rely on large language models accessed through APIs. When an application sends a prompt to a model and receives a response, that request usually goes through an external service.&lt;/p&gt;

&lt;p&gt;Getting access to different LLMs is easier today. Providers such as &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; and &lt;a href="https://platform.claude.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt; provide model APIs, and platforms like &lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; and &lt;a href="https://cloud.google.com/vertex-ai" rel="noopener noreferrer"&gt;Google Vertex AI&lt;/a&gt; give access to several models from one place. Because of this, many applications connect to more than one provider to compare models, manage cost, or keep a backup option if one service fails.&lt;/p&gt;

&lt;p&gt;But each provider works a little differently. Authentication methods, rate limits, and request formats are not the same. Managing these differences inside an application can slowly add complexity to the system. In this article, let us explore Bifrost, an open-source LLM gateway that provides a single layer to route requests and manage interactions with multiple model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost of Provider Integrations
&lt;/h2&gt;

&lt;p&gt;Connecting to several LLM providers may look simple at the start. Adding another provider can feel like just integrating one more API.&lt;/p&gt;

&lt;p&gt;That situation changes once the application runs in production. Requests may need to go to different models based on cost, response quality, or latency. If a provider slows down or becomes unavailable, the system must redirect requests to another provider and keep the service running.&lt;/p&gt;

&lt;p&gt;Handling these situations introduces additional logic into the codebase. The application needs to manage how requests are routed between models. It must also include retry logic for failed calls, fallback providers during outages, and tracking for how requests are distributed across models.&lt;/p&gt;

&lt;p&gt;Each of these responsibilities adds extra work to the system. Over time, operational logic becomes part of the application and increases maintenance effort. This overhead becomes the hidden cost of working directly with multiple model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Bifrost: A Gateway for LLM Infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.getbifrost.ai/overview" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt; is an &lt;a href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;open-source&lt;/a&gt; LLM and MCP gateway designed to manage interactions between applications and model providers. It sits between the application and the LLM services and acts as a central layer that controls how requests move between systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsyseg3iy2fg1v6h6yhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsyseg3iy2fg1v6h6yhe.png" alt="Image1" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Applications often connect directly to each provider they use. Bifrost adds a gateway layer between the application and the providers, so requests pass through a single entry point before reaching the model services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffygdaoyre598cw4i7cdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffygdaoyre598cw4i7cdw.png" alt="Image2" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This structure separates provider management from the application. The application sends requests to one endpoint, and the gateway manages communication with different model providers. Provider configuration and request handling stay inside the gateway layer, reducing provider-specific logic in the application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Infrastructure Capabilities
&lt;/h2&gt;

&lt;p&gt;Bifrost provides several infrastructure capabilities for managing LLM interactions across providers. These capabilities move provider-specific handling out of the application and into the gateway layer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-provider routing:&lt;/strong&gt; Bifrost supports multiple AI providers through a single API interface. Applications send requests to one endpoint, and the gateway routes each request to the configured provider or model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing:&lt;/strong&gt; When multiple providers or API keys are configured, Bifrost distributes requests across them based on defined rules. Traffic spreads across providers and reduces the chance of hitting rate limits on a single service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic fallback:&lt;/strong&gt; When a provider returns an error or becomes unavailable, Bifrost sends the request to another configured provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic caching:&lt;/strong&gt; Bifrost stores responses and returns them for similar prompts. Prompt comparison uses semantic similarity. This reduces repeated API calls and improves response time.&lt;/li&gt;
&lt;/ul&gt;
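&lt;p&gt;To illustrate the semantic caching idea, here is a minimal sketch of a similarity-based cache. The class, threshold, and embedding function are hypothetical and purely illustrative; Bifrost&amp;#39;s actual implementation is internal to the gateway and uses real embedding models:&lt;/p&gt;

```python
# Conceptual sketch of semantic caching: reuse a stored response when a new
# prompt's embedding is close enough to a cached prompt's embedding.
# All names and the threshold here are illustrative, not Bifrost's API.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # function: prompt -> vector
        self.threshold = threshold  # minimum similarity for a cache hit
        self.entries = []           # list of (vector, response) pairs

    def get(self, prompt):
        v = self.embed(prompt)
        for vec, response in self.entries:
            if cosine(v, vec) >= self.threshold:
                return response     # cache hit: skip the provider call
        return None                 # cache miss: caller goes to the provider

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))
```

The design choice worth noting is that lookups compare meaning rather than exact strings, so two differently worded prompts about the same thing can share one provider response.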

&lt;h2&gt;
  
  
  Platform Support and Integrations
&lt;/h2&gt;

&lt;p&gt;Bifrost fits environments where applications use multiple models and providers. The gateway exposes an OpenAI-compatible API, so applications that already use OpenAI SDKs can connect with minimal changes and send requests through a single endpoint.&lt;/p&gt;
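&lt;p&gt;Because the gateway speaks the OpenAI wire format, pointing an application at Bifrost is mostly a matter of changing the base URL. The endpoint path, port, and model name below are assumptions based on the Docker example later in this article; check the Bifrost docs for the exact values in your setup:&lt;/p&gt;

```python
# Sending a chat request through the gateway instead of directly to a
# provider. URL, path, and model name are illustrative assumptions.
import json
import urllib.request

BIFROST_URL = "http://localhost:8080/v1/chat/completions"  # port from the Docker example

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the gateway."""
    payload = {
        "model": model,  # the gateway maps this to a configured provider
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BIFROST_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("gpt-4o", "Summarize this changelog in two sentences.")
# urllib.request.urlopen(req)  # uncomment with a running gateway
```

Nothing in the request body is gateway-specific, which is what lets existing OpenAI SDK code switch over by only changing its base URL.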

&lt;p&gt;Bifrost works with several &lt;a href="https://docs.getbifrost.ai/providers/supported-providers/overview" rel="noopener noreferrer"&gt;LLM providers&lt;/a&gt;, such as OpenAI, Anthropic, Amazon Bedrock, Google Vertex AI, Cohere, and Mistral. Applications can reach these providers through the same gateway interface.&lt;/p&gt;

&lt;p&gt;The gateway also supports the &lt;a href="https://docs.getbifrost.ai/mcp/overview" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt;. Systems that use MCP can connect tools and external services through the same layer used for model requests. Bifrost also includes a &lt;a href="https://docs.getbifrost.ai/plugins/getting-started" rel="noopener noreferrer"&gt;plugin system&lt;/a&gt; for adding custom behavior such as request validation, logging, or request transformation.&lt;/p&gt;

&lt;p&gt;Bifrost can run using tools such as NPX or Docker and can operate in local setups or production environments. The project is open source under the MIT license and can run across different infrastructure environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Gateway Performance and Benchmark&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A gateway processes every request sent to a model provider. The performance of this layer becomes important in systems that handle a large number of AI requests.&lt;/p&gt;

&lt;p&gt;Bifrost is written in Go, a language often used for backend services that process many requests simultaneously. The system focuses on keeping the extra processing time very small.&lt;/p&gt;

&lt;p&gt;Benchmark tests show that Bifrost adds about 11 microseconds of latency at 5,000 requests per second. Eleven microseconds is 0.011 milliseconds, so the delay the gateway introduces is negligible compared with typical model response times.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.getbifrost.ai/benchmarking/getting-started" rel="noopener noreferrer"&gt;published benchmarks&lt;/a&gt; were executed on AWS EC2 t3.medium and t3.large instances. These are cloud virtual machines with moderate CPU and memory resources that are commonly used to run backend services and APIs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqud1pe1ewno7lns871w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqud1pe1ewno7lns871w.png" alt="Image3" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bifrost also provides a &lt;a href="https://github.com/maximhq/bifrost-benchmarking" rel="noopener noreferrer"&gt;public benchmarking repository&lt;/a&gt; with the scripts and setup used in the tests. Anyone can run the same tests or perform custom benchmarking based on their own infrastructure, traffic patterns, or model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Bifrost
&lt;/h2&gt;

&lt;p&gt;Bifrost is designed for quick setup and can run locally or in a server environment. The gateway can start in a few steps and begin routing LLM requests through a single endpoint.&lt;/p&gt;

&lt;p&gt;One way to start Bifrost is by using &lt;strong&gt;NPX&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bifrost can also run using &lt;strong&gt;Docker&lt;/strong&gt;, which allows the gateway to start inside a container environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the gateway starts, applications can send LLM requests to the Bifrost endpoint. The gateway then routes the requests to the configured model providers.&lt;/p&gt;

&lt;p&gt;Configuration options allow the gateway to define providers, API keys, routing rules, caching behavior, and fallback settings. These configurations control how requests move between different LLM providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Managing several LLM providers inside an application can introduce extra operational logic and maintenance effort. A gateway layer offers a cleaner structure for handling these interactions.&lt;/p&gt;

&lt;p&gt;Bifrost provides this layer by placing a gateway between applications and model providers. Requests go through one endpoint, and the gateway manages routing and provider communication.&lt;/p&gt;

&lt;p&gt;This approach keeps provider integrations outside the core application code and places request management in a separate infrastructure layer.&lt;/p&gt;

&lt;p&gt;To explore configuration options, deployment steps, and additional features, &lt;a href="https://docs.getbifrost.ai/overview" rel="noopener noreferrer"&gt;refer to the official Bifrost documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>proxy</category>
      <category>litellm</category>
    </item>
    <item>
      <title>5 OpenClaw Plugins That Actually Make It Production-Ready</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Fri, 13 Mar 2026 15:19:20 +0000</pubDate>
      <link>https://forem.com/arindam_1729/5-openclaw-plugins-that-actually-make-it-production-ready-14kn</link>
      <guid>https://forem.com/arindam_1729/5-openclaw-plugins-that-actually-make-it-production-ready-14kn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;There is a certain point every serious &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; user reaches. The agent is running, the setup works, and then slowly, almost without noticing, the cracks show up. A workflow that should take seconds starts requiring three follow-up prompts. The context window fills up with things the agent should already know. The API bill at the end of the month is higher than expected, and there is no clear answer for why.&lt;/p&gt;

&lt;p&gt;Most people at this point start tweaking their skills, adjusting prompts, or switching models, but the problem is usually none of those things.&lt;/p&gt;

&lt;p&gt;OpenClaw's default configuration is designed to get you started, not to match how you actually use it. The real power that makes it suitable for daily professional use lies in the plugin layer, yet most OpenClaw users have never explored it.&lt;/p&gt;

&lt;p&gt;In this post, we are covering five &lt;a href="https://docs.openclaw.ai/tools/plugin" rel="noopener noreferrer"&gt;OpenClaw Plugins&lt;/a&gt;, and each solves a different problem, each adding a layer that the default setup simply does not have. But before getting into the plugins themselves, it is worth understanding what separates a plugin from a skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are OpenClaw Plugins (And Why They're Different from Skills)
&lt;/h2&gt;

&lt;p&gt;If you have spent any time in the OpenClaw community, you have probably seen both terms used interchangeably. They are not the same thing, and the distinction matters more than it seems.&lt;/p&gt;

&lt;p&gt;A skill is a markdown file, specifically a &lt;code&gt;SKILL.md&lt;/code&gt;, that gets injected into the agent's context at inference time. It shapes how the agent thinks, what tone it uses, and what steps it follows. Every time the agent runs, that file loads into the prompt. Skills are useful for behavior, but they come at a cost: they consume tokens on every single request, whether or not they are relevant to what you asked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7n9a9e4277gcv1qhoc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7n9a9e4277gcv1qhoc0.png" alt="OpenClaw skill vs plugin." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A plugin is fundamentally different. It is a standalone executable that runs as a separate process alongside OpenClaw. Instead of loading into context, it exposes a set of tools through a defined interface that the agent can call when it actually needs them.  OpenClaw loads plugins once at startup and calls into them only when a task requires it. No tokens consumed just by existing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install a Plugin
&lt;/h2&gt;

&lt;p&gt;Installing any plugin follows the same pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; &amp;lt;plugin-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command downloads the plugin, registers it in your OpenClaw configuration at &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;, and makes its tools available the next time the agent starts. You can open that file at any time to see which plugins are currently registered and adjust their individual configurations.&lt;/p&gt;

&lt;p&gt;To confirm a plugin is active after installation, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns all registered plugins and their current status. If something is not showing up, a full restart of the OpenClaw daemon is usually all it takes.&lt;/p&gt;

&lt;p&gt;With that covered, here are the five plugins worth adding to your setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. &lt;a href="https://manifest.build/docs/install" rel="noopener noreferrer"&gt;Manifest&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;When you configure OpenClaw, you pick a default model. Claude Opus, GPT-4, whatever you prefer. From that point on, every request, regardless of its type, goes to that model. Asking the agent to list files in a directory costs the same as asking it to debug a race condition across three services. The model does not know the difference, and OpenClaw does not try to make one.&lt;/p&gt;

&lt;p&gt;This is where most API bills quietly spiral. Not from one expensive task, but from hundreds of simple ones hitting a premium model they never needed.&lt;/p&gt;

&lt;p&gt;Manifest sits between OpenClaw and your LLM providers. Every request passes through it before reaching a model. It reads the request, classifies the task complexity, and routes it to the cheapest model capable of handling it. Simple lookups go to lighter models. Reasoning-heavy tasks escalate to whatever model can actually handle them. Routing occurs in milliseconds and is invisible to the agent; it only sees a response.&lt;/p&gt;
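&lt;p&gt;As a rough mental model of what that routing looks like, here is a hypothetical sketch. The heuristics and model names are invented for illustration; the real plugin classifies requests with far more signal than keyword matching:&lt;/p&gt;

```python
# Hypothetical sketch of complexity-based routing, the idea behind Manifest.
# Model names and heuristics are illustrative only, not the plugin's API.

CHEAP_MODEL = "claude-haiku"
PREMIUM_MODEL = "claude-opus"

# Crude stand-in for a real complexity classifier.
REASONING_HINTS = ("debug", "race condition", "refactor", "design", "why")

def route(request_text: str) -> str:
    """Pick the cheapest model that can plausibly handle the request."""
    text = request_text.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return PREMIUM_MODEL   # reasoning-heavy: escalate
    return CHEAP_MODEL         # simple lookup or mechanical task

print(route("list files in src/"))                      # claude-haiku
print(route("debug a race condition across services"))  # claude-opus
```

The point of the sketch is the shape of the decision, not the heuristic: the agent never sees which model answered, only the response.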

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcg691g1sqc289fsfcz0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcg691g1sqc289fsfcz0.png" alt="Manifest plugin routing OpenClaw" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cost difference compounds fast. Users running OpenClaw through Manifest have reported up to 70% reduction in monthly API spend, not by doing less, but by stopping the habit of paying Opus prices for Haiku-level work. The Manifest dashboard makes this visible: you can see cost broken down per session, per tool call, and per model, so you know exactly where your spend is going and whether the routing decisions are working as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6cf2c6c4amj9953a552.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6cf2c6c4amj9953a552.png" alt="Manifest Dashboard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Manifest:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install &lt;/span&gt;manifest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once installed, Manifest registers itself as the default routing layer. You can configure routing thresholds and model preferences in &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; under the &lt;code&gt;manifest&lt;/code&gt; plugin entry.&lt;/p&gt;

&lt;p&gt;Manifest makes the biggest difference in setups where the agent runs long sessions, handles multi-step tasks, or operates overnight without supervision. The more requests flow through OpenClaw, the more the routing logic saves, because the inefficiency it fixes is not a one-time cost; it is per-request.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;a href="https://composio.dev/toolkits/composio/framework/openclaw" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Out of the box, OpenClaw cannot reach your Gmail, Slack, GitHub, or Notion. Not because the agent is incapable, but because every external service requires OAuth authentication, token management, and refresh handling, none of which OpenClaw sets up for you. Most people work around this by manually generating API keys, pasting them into configuration files, and hoping the tokens don't expire mid-session. It works until it does not.&lt;/p&gt;

&lt;p&gt;Composio solves this at the authentication layer. It runs as an MCP server that sits between OpenClaw and every external app you want the agent to reach. You connect your accounts once through the Composio dashboard, and from that point on, OpenClaw talks to Composio, and it handles everything else. Token refresh, OAuth flows, rate limits, API versioning. None of that touches your OpenClaw config directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumz76k9hvc68dhv6ii1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumz76k9hvc68dhv6ii1w.png" alt="Composio MCP Server connecting OpenClaw" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each app connection runs in an isolated MCP session. If one integration fails or a token expires, it does not affect the others. The agent continues operating normally while Composio handles the reconnection in the background.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Composio:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install &lt;/span&gt;composio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, connect your apps through the Composio dashboard and add the plugin entry to &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"composio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-composio-api-key"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, what this unlocks is straightforward. A single prompt like &lt;em&gt;"summarize my unread emails, open a GitHub issue for anything that needs follow-up, and post a summary to the team Slack channel"&lt;/em&gt; now executes end to end, no switching tabs, no copying API keys, no manual auth setup. The agent has the required access, and Composio ensures it remains valid.&lt;/p&gt;

&lt;p&gt;With 850+ supported apps, Composio covers most of what a professional OpenClaw setup would ever need to reach.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. &lt;a href="https://github.com/hyperspell/hyperspell-openclaw" rel="noopener noreferrer"&gt;Hyperspell&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;OpenClaw's default memory is a &lt;code&gt;MEMORY.md&lt;/code&gt; file. It grows with every session, gets compacted when it reaches a limit, loses information in the process, and reloads entirely on every turn, whether the content is relevant or not. For occasional use, this is fine, but for anyone relying on OpenClaw daily, it becomes a real problem fast.&lt;/p&gt;

&lt;p&gt;Hyperspell replaces this layer entirely. It indexes your connected data sources (emails, documents, and past conversations) into a knowledge graph, then injects only the relevant slice of that graph before each agent turn. The agent gets what it needs, not everything it has ever seen.&lt;/p&gt;

&lt;p&gt;Memory also becomes sharper over time. Every query refines how the knowledge graph is indexed, so context recall improves the more you use it. An agent running Hyperspell can reference a decision you made three weeks ago without you having to bring it up.&lt;/p&gt;
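&lt;p&gt;The core idea, injecting only the relevant slice rather than the whole history, can be sketched with naive keyword overlap. Hyperspell's actual knowledge-graph retrieval is far more sophisticated; this only illustrates the contrast with reloading a full &lt;code&gt;MEMORY.md&lt;/code&gt; every turn:&lt;/p&gt;

```python
import re

# Naive "relevant slice" retrieval sketch. Real systems like Hyperspell use a
# knowledge graph; word overlap just makes the selection behavior visible.

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def relevant_slice(memory, query, top_k=2):
    """Return up to top_k memory entries sharing at least one word with the query."""
    q = words(query)
    ranked = sorted(memory, key=lambda entry: len(q.intersection(words(entry))), reverse=True)
    return [entry for entry in ranked[:top_k] if q.intersection(words(entry))]

memory = [
    "decision: migrate the billing service to Postgres",
    "note: team standup moved to 9am",
    "decision: use feature flags for the billing rollout",
]

print(relevant_slice(memory, "what did we decide about billing?"))
```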

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Hyperspell:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;openclaw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;plugins&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;@hyperspell/openclaw-hyperspell&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect your data sources through the Hyperspell dashboard, then add your API key under the &lt;code&gt;hyperspell&lt;/code&gt; entry in &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;. Context injection is automatic from there.&lt;/p&gt;
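&lt;p&gt;The plugin's README is the authority on the exact config shape; assuming it mirrors the Composio entry shown earlier, the addition would look roughly like this:&lt;/p&gt;

```json
{
  "plugins": {
    "hyperspell": {
      "apiKey": "your-hyperspell-api-key"
    }
  }
}
```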

&lt;h2&gt;
  
  
  4. &lt;a href="https://github.com/lekt9/openclaw-foundry" rel="noopener noreferrer"&gt;OpenClaw Foundry&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Most workflows repeat. You run the same sequence of tasks every morning, follow the same steps every time a PR needs review, and ask the agent the same three things before a meeting. OpenClaw handles all of these, but it handles them the same way every time, waiting for you to prompt it from scratch. It does not recognize the pattern. It does not try to make things easier on its own.&lt;/p&gt;

&lt;p&gt;Foundry fixes this. It sits in the background during your sessions, watches what you ask for, and, when it detects a recurring pattern, writes a new tool definition into itself. That tool becomes part of the agent's available toolkit the next time you start a session, no manual configuration required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2tv42jbkdfqbbntrf78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2tv42jbkdfqbbntrf78.png" alt="OpenClaw Foundry plugin" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What makes this different from writing a skill is the output. A skill adds behavioral instructions to the agent's context. Foundry creates an executable tool that the agent can call, with its own inputs and outputs, registered in the tool registry and available on demand.&lt;/p&gt;
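&lt;p&gt;To make the distinction concrete, a generated tool might look something like the definition below. Every field name here is hypothetical (Foundry's real schema may differ); the point is that the output is a callable unit with typed inputs and outputs, not a block of prose instructions:&lt;/p&gt;

```json
{
  "name": "morning_email_triage",
  "description": "Summarize unread email and file follow-up issues",
  "inputs": {
    "max_emails": { "type": "number", "default": 20 }
  },
  "outputs": {
    "summary": { "type": "string" },
    "issues_created": { "type": "number" }
  }
}
```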

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Foundry:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @getfoundry/foundry-openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This downloads the plugin from npm, extracts it to &lt;code&gt;~/.openclaw/extensions/foundry/&lt;/code&gt;, enables it automatically, and restarts the gateway. After that, add the following to &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"foundry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"autoLearn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"sources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"docs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"experience"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"arxiv"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"marketplace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"autoPublish"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;autoLearn: true&lt;/code&gt; is the key setting: it tells Foundry to continuously learn from your sessions without requiring you to trigger it manually. The &lt;code&gt;sources&lt;/code&gt; block controls where Foundry pulls additional context when writing new tools: OpenClaw's own documentation, your past session experience, arXiv papers, and public GitHub repos. For most setups, keeping &lt;code&gt;docs&lt;/code&gt; and &lt;code&gt;experience&lt;/code&gt; enabled is enough to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. &lt;a href="https://github.com/comet-ml/opik-openclaw" rel="noopener noreferrer"&gt;Opik&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-step agent runs fail in non-obvious ways. A tool call returns incorrect output, a sub-agent silently errors out, or a model call takes 12 seconds on a task that should take two. Without structured tracing, you are left reading raw logs and guessing. That gets old fast.&lt;/p&gt;

&lt;p&gt;Opik is an open-source LLM and agent observability platform built by Comet ML. The OpenClaw plugin hooks into the gateway process and exports a structured trace for every run: LLM request and response spans, tool call inputs and outputs, sub-agent lifecycle events, latency at each step, and token usage with cost. Every event that matters has a corresponding span in the Opik dashboard.&lt;/p&gt;
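&lt;p&gt;Structurally, a trace like that is a tree of timed spans, which is what turns "where did the run slow down" into a simple query. The field names below are assumptions for illustration, not Opik's actual schema:&lt;/p&gt;

```python
# Illustrative trace: a flat list of spans with parent links, timings, and
# token counts (field names are assumptions, not Opik's actual schema).

spans = [
    {"name": "llm:plan",      "parent": None,       "ms": 1800,  "tokens": 950},
    {"name": "tool:search",   "parent": "llm:plan", "ms": 400,   "tokens": 0},
    {"name": "llm:summarize", "parent": None,       "ms": 12000, "tokens": 2100},
]

def slowest(spans):
    """Name of the span with the highest latency."""
    return max(spans, key=lambda span: span["ms"])["name"]

def total_tokens(spans):
    return sum(span["tokens"] for span in spans)

print(slowest(spans), total_tokens(spans))
```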

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tih007yxok3phfi4enr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tih007yxok3phfi4enr.png" alt="Opik Dashboard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Image reference: &lt;a href="https://github.com/comet-ml/opik-openclaw" rel="noopener noreferrer"&gt;https://github.com/comet-ml/opik-openclaw&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a different layer from what Manifest covers. Manifest tells you how much a request costs and which model handled it. Opik tells you what the agent actually did inside that request, which tools it called, in what order, what each one returned, and where the run slowed down or failed. Both answer different questions and neither replaces the other.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installing Opik:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Requirements: OpenClaw &lt;code&gt;&amp;gt;=2026.3.2&lt;/code&gt;, Node.js &lt;code&gt;&amp;gt;=22.12.0&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @opik/opik-openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, restart the gateway, then run the setup wizard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw opik configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This validates your endpoint and API key and automatically writes the config. To verify everything is connected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw opik status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recommended config in &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"plugins"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"entries"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"opik-openclaw"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;"enabled"&lt;/span&gt;: &lt;span class="nb"&gt;true&lt;/span&gt;,
        &lt;span class="s2"&gt;"config"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"apiKey"&lt;/span&gt;: &lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;,
          &lt;span class="s2"&gt;"apiUrl"&lt;/span&gt;: &lt;span class="s2"&gt;"https://www.comet.com/opik/api"&lt;/span&gt;,
          &lt;span class="s2"&gt;"projectName"&lt;/span&gt;: &lt;span class="s2"&gt;"openclaw"&lt;/span&gt;,
          &lt;span class="s2"&gt;"workspaceName"&lt;/span&gt;: &lt;span class="s2"&gt;"default"&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For teams that cannot send trace data to a third party, Opik is fully self-hostable. Replace &lt;code&gt;apiUrl&lt;/code&gt; with your own instance endpoint, and nothing else changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Where to Go From Here&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Each plugin owns a distinct layer. Hyperspell handles context before the request starts. Manifest handles model routing during it. Composio handles external reach when the agent needs to act. Foundry watches for patterns across sessions and builds tools from them. Opik traces everything after the fact, so you know exactly what happened and why.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F164mtxkcmhpwjyfb6hjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F164mtxkcmhpwjyfb6hjr.png" alt="openclaw plugins" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;None of them overlap, and none duplicates another's work. You can start with just one, whichever layer is causing the most friction in your current setup, and layer in the rest as your workflow grows.&lt;/p&gt;

&lt;p&gt;Each plugin has its own documentation to read before configuring anything: &lt;a href="https://manifest.build/docs" rel="noopener noreferrer"&gt;&lt;strong&gt;Manifest&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://composio.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Composio&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://github.com/hyperspell/hyperspell-openclaw" rel="noopener noreferrer"&gt;&lt;strong&gt;Hyperspell&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://github.com/lekt9/openclaw-foundry" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenClaw Foundry&lt;/strong&gt;&lt;/a&gt;, and &lt;a href="https://www.comet.com/docs/opik" rel="noopener noreferrer"&gt;&lt;strong&gt;Opik&lt;/strong&gt;&lt;/a&gt;. The &lt;a href="https://docs.openclaw.ai/tools/plugin" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenClaw plugin docs&lt;/strong&gt;&lt;/a&gt; cover the installation system in full if you want to go deeper on how plugins interact with the gateway.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>programming</category>
      <category>skills</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Arindam Majumder </dc:creator>
      <pubDate>Sat, 07 Mar 2026 08:35:19 +0000</pubDate>
      <link>https://forem.com/arindam_1729/-554c</link>
      <guid>https://forem.com/arindam_1729/-554c</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6" class="crayons-story__hidden-navigation-link"&gt;What is LLM Observability? The Complete Guide (2026)&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/arindam_1729" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2F8c3a1bb4-eb47-4302-a280-09eedb8bc785.png" alt="arindam_1729 profile" class="crayons-avatar__image" width="800" height="678"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/arindam_1729" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Arindam Majumder 
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Arindam Majumder 
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png" width="166" height="102"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-3296623" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/arindam_1729" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F965723%2F8c3a1bb4-eb47-4302-a280-09eedb8bc785.png" class="crayons-avatar__image" alt="" width="800" height="678"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Arindam Majumder &lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 6&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6" id="article-link-3296623"&gt;
          What is LLM Observability? The Complete Guide (2026)
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/llm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;llm&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/observability"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;observability&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;12&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/arindam_1729/what-is-llm-observability-the-complete-guide-2026-26e6#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              2&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            15 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>llm</category>
      <category>observability</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
