<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: LaunchDarkly</title>
    <description>The latest articles on Forem by LaunchDarkly (@launchdarkly).</description>
    <link>https://forem.com/launchdarkly</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1962%2Fb47f7541-51dd-4d82-a702-00f4ab11d7bc.png</url>
      <title>Forem: LaunchDarkly</title>
      <link>https://forem.com/launchdarkly</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/launchdarkly"/>
    <language>en</language>
    <item>
      <title>Offline Evaluation of RAG-Grounded Answers in LaunchDarkly AI Configs</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:22:50 +0000</pubDate>
      <link>https://forem.com/launchdarkly/offline-evaluation-of-rag-grounded-answers-in-launchdarkly-ai-configs-1i5j</link>
      <guid>https://forem.com/launchdarkly/offline-evaluation-of-rag-grounded-answers-in-launchdarkly-ai-configs-1i5j</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;This tutorial shows you how to run an &lt;strong&gt;offline LLM evaluation&lt;/strong&gt; on the RAG-grounded support agent you built in the &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs tutorial&lt;/a&gt;, using LaunchDarkly &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Configs&lt;/a&gt;, the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/datasets" rel="noopener noreferrer"&gt;Datasets feature&lt;/a&gt;, and built-in &lt;a href="https://launchdarkly.com/docs/home/ai-configs/offline-evaluations" rel="noopener noreferrer"&gt;LLM-as-a-judge&lt;/a&gt; scoring. You'll build a RAG-grounded test dataset, run it through the Playground with a cross-family judge, and learn how to read each failing row as a dataset issue, an agent issue, or judge calibration noise.&lt;/p&gt;

&lt;p&gt;Here's how it works. The LaunchDarkly Playground evaluates a single model call against a prompt and dataset you configure. By pre-computing your RAG retrieval offline and baking the chunks directly into each dataset row, you turn that call into a high-value generation test: the model in the Playground receives the same documentation context it would in production, so the eval measures how well your agent reasons over real grounded input.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structure a RAG-grounded test dataset&lt;/strong&gt; by pre-computing retrieval offline and bundling chunks into each row&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick the right LLM judge&lt;/strong&gt; for your agent's output shape (Accuracy for natural-language answers, Likeness for structured labels)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid same-model bias&lt;/strong&gt; by running the judge on a different model family than the agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagnose failing rows&lt;/strong&gt; as dataset issues, agent issues, or judge calibration noise&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this tutorial covers, and what it doesn't&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Covers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generation quality over RAG context: does the model produce a correct answer when the right documentation is in the prompt?&lt;/li&gt;
&lt;li&gt;Regression detection: catching unexpected score drops when you change a prompt or model&lt;/li&gt;
&lt;li&gt;Variation selection: comparing candidate prompts and models before committing to a new AI Config variation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Does not cover:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval correctness. Whether your vector store is returning the best chunks is tested by your own RAG pipeline, outside LaunchDarkly.&lt;/li&gt;
&lt;li&gt;End-to-end agent graph behavior. Tool execution, multi-turn conversations, handoffs, and multi-step routing require online evals against real production traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You've completed the &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs tutorial&lt;/a&gt; or have equivalent familiarity with LaunchDarkly &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Configs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You have the &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial" rel="noopener noreferrer"&gt;devrel-agents-tutorial repo&lt;/a&gt; cloned&lt;/li&gt;
&lt;li&gt;You have API keys for &lt;strong&gt;two&lt;/strong&gt; model providers, one for the agent under test and one for the judge (the examples use OpenAI and Anthropic)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Get the Branch Running
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;About the branch and the Umbra knowledge base.&lt;/strong&gt; The &lt;code&gt;feature/offline-evals&lt;/code&gt; branch builds on the same &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs tutorial&lt;/a&gt; codebase and the routing, tool, and graph work done in earlier branches — none of that goes away. What this branch adds is a more realistic RAG assessment target: &lt;strong&gt;Umbra&lt;/strong&gt;, a fictional serverless-functions product with an invented knowledge base (refund windows, deployment regions, function timeout limits, rate-limit tiers, and so on). Because Umbra doesn't exist outside this tutorial, the model under test has no pre-training knowledge to fall back on — a correct answer has to come from the retrieved chunks, which is the only way to honestly measure whether your RAG pipeline is doing its job. The branch also ships a pre-built RAG-grounded test dataset (&lt;code&gt;datasets/answer-tests.csv&lt;/code&gt;) and a helper script that regenerates it from your vector store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;devrel-agents-tutorial
git checkout feature/offline-evals
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Add LD_SDK_KEY, LD_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY to .env&lt;/span&gt;

uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run python bootstrap/create_configs.py
uv run python initialize_embeddings.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the API and UI in two terminals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1&lt;/span&gt;
uv run uvicorn api.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 2&lt;/span&gt;
uv run streamlit run ui/chat_interface.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8501&lt;/code&gt; and ask a question grounded in the Umbra docs (refund policy, deployment regions, function timeout). The agent pulls answers from the knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn47r7c07lw9jd4024tj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn47r7c07lw9jd4024tj.png" alt="The Umbra support chat UI answering a question grounded in the Umbra knowledge base." width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Understand the Test Dataset
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;datasets/answer-tests.csv&lt;/code&gt;. Every row has three fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input,expected_output,original_question
"Documentation context: --- We offer a 30-day refund policy for first-time subscribers... --- Annual subscriptions receive a prorated refund within... --- Question: What is the refund policy?","30-day refund policy for first-time subscribers who haven't deployed production traffic. Usage charges are non-refundable.","What is the refund policy?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;input&lt;/code&gt;&lt;/strong&gt; bundles documentation chunks and the question into a single structured prompt, separated by &lt;code&gt;---&lt;/code&gt; dividers. The chunks were retrieved from your production vector store ahead of time by &lt;code&gt;tools/build_rag_dataset.py&lt;/code&gt;, so the model in the Playground sees the same grounding the production agent would, even though the Playground never executes your retrieval tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;expected_output&lt;/code&gt;&lt;/strong&gt; is the correct answer, written by a human who read the source docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;original_question&lt;/code&gt;&lt;/strong&gt; is a plain-text copy of the question so you can scan the dataset without parsing the bundled prompt. No judge uses this field.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regenerate the dataset when your knowledge base changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run python tools/build_rag_dataset.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the full reference on dataset format and limits, see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/datasets" rel="noopener noreferrer"&gt;Datasets for offline evaluations&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Upload the Dataset
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Use synthetic data only&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Never upload real customer tickets, PII, secrets, or credentials. Replace anything sensitive with synthetic placeholders before upload. See the Playground &lt;a href="https://launchdarkly.com/docs/home/ai-configs/playground#privacy" rel="noopener noreferrer"&gt;privacy section&lt;/a&gt; for what gets forwarded to model providers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Navigate to &lt;strong&gt;AI&lt;/strong&gt; &amp;gt; &lt;strong&gt;Library&lt;/strong&gt; in LaunchDarkly, select the &lt;strong&gt;Datasets&lt;/strong&gt; tab, and click &lt;strong&gt;Upload dataset&lt;/strong&gt;. Upload &lt;code&gt;datasets/answer-tests.csv&lt;/code&gt; and name it &lt;code&gt;answer-tests&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcckfkrgk57rzz1tt4xlv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcckfkrgk57rzz1tt4xlv.png" alt="The LaunchDarkly Datasets tab showing the answer-tests dataset uploaded." width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Add Your Model API Keys
&lt;/h2&gt;

&lt;p&gt;The Playground calls model providers directly, so it needs API keys for both the model running your agent &lt;em&gt;and&lt;/em&gt; the model running your judge. These keys live in LaunchDarkly's "AI Config Test Run" integration, not in your AI Config.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the Playground, click &lt;strong&gt;Manage API keys&lt;/strong&gt; in the upper-right corner.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Add integration&lt;/strong&gt;, pick a provider (e.g. OpenAI), paste your API key, accept the terms, and save.&lt;/li&gt;
&lt;li&gt;Repeat for the second provider (Anthropic) so you can run a cross-family judge in Step 5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;See the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/playground#manage-api-keys" rel="noopener noreferrer"&gt;Playground reference doc&lt;/a&gt; for the canonical instructions. API keys are stored per-session, so you may need to re-paste them when you return.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run the Evaluation
&lt;/h2&gt;

&lt;p&gt;From the Datasets list, click into &lt;strong&gt;answer-tests&lt;/strong&gt; to open it in a Playground bound to that dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure the test
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompt&lt;/strong&gt;: paste your &lt;code&gt;support-agent&lt;/code&gt; instructions verbatim from the AI Config. Do not edit or simplify them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent model&lt;/strong&gt;: pick the model your support-agent variation uses (or a candidate you're considering swapping to). To compare two candidates, run the eval twice with different agent models and compare scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance criteria&lt;/strong&gt;: attach an &lt;strong&gt;Accuracy&lt;/strong&gt; judge with threshold &lt;code&gt;0.85&lt;/code&gt;. Accuracy scores whether the response correctly addresses the input question, which fits grounded natural-language answers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation model&lt;/strong&gt;: uncheck &lt;strong&gt;Use same model for evaluation&lt;/strong&gt; and set the judge to a &lt;em&gt;different&lt;/em&gt; model family from the agent. Same-family judging tends to reward output patterns the judge itself produces. A cross-family judge gives you an independent read.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkbh5lziq6lt0gx02w1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkbh5lziq6lt0gx02w1w.png" alt="The Playground configured with the support-agent prompt, OpenAI as the agent, Anthropic as the evaluation model, and an Accuracy judge at 0.85 threshold." width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the eval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reading the results
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyj5v00yxa526lllyneeb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyj5v00yxa526lllyneeb.png" alt="The Playground configured with the support-agent prompt, OpenAI as the agent, Anthropic as the evaluation model, and an Accuracy judge at 0.85 threshold." width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The example run above had 18 passes and 2 failures. When a row fails, the failure comes from one of three places, and each one sends you in a different direction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The dataset's chunks don't contain the answer.&lt;/strong&gt; This is a retrieval problem, not a generation problem. Rebuild the dataset with higher &lt;code&gt;top_k&lt;/code&gt;, a reranker, or a different chunker, or verify the answer is indexed at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The chunks contain the answer but the model ignored them.&lt;/strong&gt; This is the agent-side failure offline evals are designed to catch. Tighten the system prompt to insist on grounding, or switch to a more obedient model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The chunks and the model are both fine but the judge disagreed.&lt;/strong&gt; This is judge calibration noise. Lower the threshold, try a different judge, or accept it as noise. Don't change your agent based on it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sort by score. For each failing row, open the bundled chunks in the &lt;code&gt;input&lt;/code&gt; field and ask: &lt;em&gt;was the right answer in there?&lt;/em&gt; Yes → fix the prompt or model. No → rebuild the dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  What failed in this run
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Row 11: "What integrations are available?"&lt;/strong&gt; (&lt;em&gt;chunks missed the answer&lt;/em&gt;). The expected output mentioned monitoring integrations (Datadog, Sentry, LogRocket), but the retrieved chunks only covered databases, storage, and billing. The model correctly listed what it had and said &lt;em&gt;"the documentation does not provide additional information regarding more integrations"&lt;/em&gt;, which is the correct behavior for an ungrounded claim. &lt;strong&gt;Fix&lt;/strong&gt;: higher &lt;code&gt;top_k&lt;/code&gt; or a reranker in &lt;code&gt;build_rag_dataset.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Row 12: "Can I get a refund on bandwidth overages?"&lt;/strong&gt; (&lt;em&gt;judge calibration&lt;/em&gt;). The model correctly said bandwidth overages are non-refundable, citing the docs, but omitted a secondary "Review your Usage Dashboard" recommendation from the expected output. Semantically right, lexically short one clause. &lt;strong&gt;Fix&lt;/strong&gt;: lower the threshold or trim the expected output.&lt;/p&gt;

&lt;p&gt;Two failures, two different fixes. Without reading the per-row results you'd conflate them and spend time tightening the model when the actual problem lives in the retriever or the dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Go From a Single Run
&lt;/h2&gt;

&lt;p&gt;This tutorial walked you through one run. In practice, a single eval isn't where offline evaluation earns its keep. The real payoff comes from re-running the same dataset against a new prompt, a new model, or a fresh RAG chunker and comparing scores to your last known-good run. A small prompt edit that quietly drops your Accuracy from 0.83 to 0.71 is exactly the kind of regression this pattern is meant to catch, but only if you save the run and compare against it next time.&lt;/p&gt;

&lt;p&gt;A reasonable next loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save the run from Step 5 as your reference.&lt;/li&gt;
&lt;li&gt;When you change something (prompt, model, chunker, &lt;code&gt;top_k&lt;/code&gt;), re-run the same dataset and compare scores.&lt;/li&gt;
&lt;li&gt;Add new rows to the dataset as you find failure modes in staging or production.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For &lt;strong&gt;end-to-end behavior that offline tests can't capture&lt;/strong&gt; (tool execution, multi-turn conversations, the tail of real production inputs), see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;online evaluations&lt;/a&gt; and the &lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to add online evals&lt;/a&gt; tutorial. Online evaluations are not currently supported for agent-based AI Configs; for agent workflows, the documented path is programmatic judge evaluation via the AI SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Track Evaluation History
&lt;/h2&gt;

&lt;p&gt;View saved runs at &lt;strong&gt;AI&lt;/strong&gt; &amp;gt; &lt;strong&gt;Evaluations&lt;/strong&gt;. Toggle &lt;strong&gt;Group by dataset&lt;/strong&gt; to collapse runs under each dataset name so you can see the history for &lt;code&gt;umbra-rag-eval&lt;/code&gt; alongside any other datasets in the project. Compare pass and fail counts across runs, and distinguish saved runs (indefinite retention) from one-off runs (60-day expiry). For metric definitions, see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;Monitor AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/releases/progressive-rollouts" rel="noopener noreferrer"&gt;Progressive rollouts&lt;/a&gt;&lt;/strong&gt;: release your winning variation to 5% of traffic, then 25%, then 100%, watching production metrics before expanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to add online evals&lt;/a&gt;&lt;/strong&gt;: decide what to score on live production traffic once you have an offline baseline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a deeper look at the multi-agent RAG system this tutorial builds on, see the &lt;a href="https://launchdarkly.com/docs/tutorials/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs&lt;/a&gt; tutorial.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building Framework-Agnostic AI Swarms: Compare LangGraph, Strands, and OpenAI Swarm</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 21:05:21 +0000</pubDate>
      <link>https://forem.com/launchdarkly/building-framework-agnostic-ai-swarms-compare-langgraph-strands-and-openai-swarm-14ip</link>
      <guid>https://forem.com/launchdarkly/building-framework-agnostic-ai-swarms-compare-langgraph-strands-and-openai-swarm-14ip</guid>
      <description>&lt;p&gt;If you've ever run the same app in multiple environments, you know the pain of duplicated configuration. &lt;a href="https://www.onyxgs.com/blog/swarm-intelligence-collective-behavior-ai" rel="noopener noreferrer"&gt;Agent swarms&lt;/a&gt; have the same problem: the moment you try multiple orchestrators (LangGraph, Strands, OpenAI Swarm), your agent definitions start living in different formats. Prompts drift. Model settings drift. A "small behavior tweak" turns into archaeology across repos.&lt;/p&gt;

&lt;p&gt;AI behavior isn't code. Prompts aren't functions. They change too often, and too experimentally, to be hard-wired into orchestrator code. &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;LaunchDarkly AI Configs&lt;/a&gt; lets you treat agent definitions like shared configuration instead. Define them once, store them centrally, and let any orchestrator fetch them. Update a prompt or model setting in the LaunchDarkly UI, and the new version rolls out without a redeploy.&lt;/p&gt;



&lt;p&gt;Ready to build framework-agnostic AI swarms? Start your 14-day free trial of LaunchDarkly to follow along with this tutorial. No credit card required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://launchdarkly.com/start-trial/?utm_source=docs&amp;amp;utm_medium=tutorial&amp;amp;utm_campaign=ai-orchestrators" rel="noopener noreferrer"&gt;Start free trial&lt;/a&gt; →&lt;/p&gt;



&lt;h2&gt;
  
  
  The problem: Research gap analysis across multiple papers
&lt;/h2&gt;

&lt;p&gt;When analyzing academic literature, researchers face a daunting task: reading dozens of papers to identify patterns, spot contradictions, and find unexplored opportunities. A single LLM call can summarize papers, but it produces a monolithic analysis you can't trace, refine, or trust for critical decisions.&lt;/p&gt;

&lt;p&gt;The challenge compounds when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identify methodological patterns&lt;/strong&gt; across 12+ papers without missing subtle connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect contradictory findings&lt;/strong&gt; that might invalidate assumptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discover research gaps&lt;/strong&gt; that represent genuine opportunities, not just oversight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where specialized agents excel - each focused on one aspect of the analysis, building on each other's work.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll build a 3-agent research analysis swarm that solves this problem by dividing the work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Agent&lt;/th&gt;
    &lt;th&gt;Role&lt;/th&gt;
    &lt;th&gt;Output&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Approach Analyzer&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Clusters methodological themes across papers&lt;/td&gt;
    &lt;td&gt;"Papers 1, 4, 7 use reinforcement learning; Papers 2, 5 use symbolic methods"&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Contradiction Detector&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Finds conflicting claims between papers&lt;/td&gt;
    &lt;td&gt;"Paper 3 claims X improves performance; Paper 8 shows X degrades it"&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Gap Synthesizer&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Identifies unexplored research directions&lt;/td&gt;
    &lt;td&gt;"No papers combine approach A with dataset B; potential opportunity"&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We'll implement this swarm across three different orchestrators (LangGraph, Strands, and OpenAI Swarm), demonstrating how LaunchDarkly AI Configs enable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework-agnostic agent definitions&lt;/strong&gt;: Define agents once in LaunchDarkly, use them everywhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent observability&lt;/strong&gt;: Track tokens, latency, and costs for each agent individually - catch silent failures when agents skip execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic swarm composition&lt;/strong&gt;: Add/remove agents from the swarm or switch models without touching code&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why use a swarm?
&lt;/h2&gt;

&lt;p&gt;Research gap analysis requires different skills: clustering methodological patterns, detecting contradictions, and synthesizing opportunities. With a swarm, each agent handles one aspect and produces artifacts the next agent builds on. You can track tokens, latency, and cost per agent. You can catch silent failures when an agent skips execution. And when something goes wrong, you know exactly where.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical requirements
&lt;/h2&gt;

&lt;p&gt;Before implementing the swarm, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly account&lt;/strong&gt; with AI Configs enabled (see &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys&lt;/strong&gt; for Anthropic Claude or OpenAI GPT-4 (check &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;supported models&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.11+&lt;/strong&gt; for running orchestrators&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basic understanding&lt;/strong&gt; of agent systems (review &lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;LangGraph agents tutorial&lt;/a&gt; if needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The complete implementation is available at &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators" rel="noopener noreferrer"&gt;GitHub - AI Orchestrators&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: how LaunchDarkly powers framework-agnostic swarms
&lt;/h2&gt;

&lt;p&gt;The swarm architecture has three layers: dynamic agent configuration, per-agent tracking, and custom metrics for cost attribution. Here's how they work together.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir3nonhko1k6th3du75j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir3nonhko1k6th3du75j.png" alt="LangGraph swarm architecture showing LaunchDarkly configuration fetch, agent interactions with Command-based handoffs, and dual metrics tracking to AI Config Trends" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The diagram shows LangGraph's implementation, but Strands and OpenAI Swarm follow the same pattern with their own handoff mechanisms. The key components are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configuration Fetch&lt;/strong&gt;: The orchestrator queries LaunchDarkly's API to dynamically discover all agent configurations, avoiding hardcoded agent definitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Graph&lt;/strong&gt;: Three specialized agents (Approach Analyzer, Contradiction Detector, Gap Synthesizer) connected through explicit handoff mechanisms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics Collection&lt;/strong&gt;: Each agent execution captures tokens, duration, and cost metrics through both the AI Config tracker and custom metrics API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual Dashboard Views&lt;/strong&gt;: The same metrics appear in the AI Config Trends dashboard (for individual agent monitoring)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Three layers of framework-agnostic swarms
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. AI Config for Dynamic Agent Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;AI Config&lt;/a&gt; stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent key, display name, and model selection&lt;/li&gt;
&lt;li&gt;System instructions and tool definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your orchestrator code queries LaunchDarkly for "all enabled agent configs" and builds the swarm dynamically. No hardcoded agent names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Per-Agent Tracking with AI SDK&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LaunchDarkly's &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; provides tracking through config evaluations. You get a fresh tracker for each agent, then track tokens, duration, and success/failure. These metrics flow to the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;AI Config Monitoring&lt;/a&gt; dashboard automatically.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v2lmgyjgtzug5naj60t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v2lmgyjgtzug5naj60t.png" alt="AI Config monitoring dashboard showing per-agent token usage, duration, and success rates across multiple runs" width="800" height="461"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;This tracking catches silent failures - when agents skip execution or produce minimal output. Step 4 shows the implementation patterns for each framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Custom Metrics for Cost Attribution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Per-agent tracking shows performance, but for cost comparisons across orchestrators you need &lt;a href="https://launchdarkly.com/docs/home/metrics/custom-count" rel="noopener noreferrer"&gt;custom metrics&lt;/a&gt;. These let you query by orchestrator, compare costs across frameworks, and identify anomalies.&lt;/p&gt;

&lt;p&gt;With the architecture covered, let's build the swarm. We'll download research papers, set up the project, bootstrap agent configs in LaunchDarkly, implement per-agent tracking, and run the swarm across all three orchestrators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Download research papers
&lt;/h2&gt;

&lt;p&gt;First, you need papers to analyze. The &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators" rel="noopener noreferrer"&gt;&lt;code&gt;scripts/download_papers.py&lt;/code&gt;&lt;/a&gt; script queries ArXiv with narrow, category-specific searches to ensure focused results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python scripts/download_papers.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script presents pre-configured narrow research topics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestration/scripts/download_papers.py:164-189
&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chain-of-thought prompting in LLMs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.CL AND (chain-of-thought OR CoT) AND reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieval-augmented generation (RAG)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.CL AND (retrieval-augmented OR RAG) AND generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Emergent communication in multi-agent RL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.MA AND (emergent communication OR language emergence)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Few-shot prompting for code generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.SE AND few-shot AND code generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vision-language model grounding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat:cs.CV AND vision-language AND grounding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;These topics are intentionally narrow&lt;/strong&gt;: Each uses ArXiv categories (&lt;code&gt;cat:cs.CL&lt;/code&gt;, &lt;code&gt;cat:cs.MA&lt;/code&gt;) to limit scope. Boolean AND operators ensure papers match all criteria. 2-5 year windows prevent overwhelming the analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For even narrower custom queries&lt;/strong&gt;, combine categories with specific techniques like &lt;code&gt;cat:cs.CL AND chain-of-thought AND mathematical AND reasoning&lt;/code&gt; for CoT math only, &lt;code&gt;cat:cs.MA AND emergent AND (referential OR compositional)&lt;/code&gt; for specific emergence types, or &lt;code&gt;cat:cs.SE AND few-shot AND (Python OR JavaScript) AND test generation&lt;/code&gt; for language-specific code generation.&lt;/p&gt;

&lt;p&gt;The script saves papers to &lt;code&gt;data/gap_analysis_papers.json&lt;/code&gt; with this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2409.02645v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Emergent Language: A Survey and Taxonomy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"authors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jannik Peters, Constantin Waubert de Puiseau, ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"published"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-09-04"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cs.MA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"abstract"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The field of emergent language represents..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"introduction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Language emergence has been explored..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"conclusion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This paper provides a comprehensive review..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this format&lt;/strong&gt;: Each paper includes ~2-3K characters of text (abstract + intro + conclusion), which is enough for analysis but won't overflow context windows. For 12 papers, you're looking at ~30K characters (~7.5K tokens) of input.&lt;/p&gt;

&lt;p&gt;You now have 12 papers saved locally. Next, we'll configure LaunchDarkly credentials and install the orchestration frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Set up your multi-orchestrator project
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Environment setup
&lt;/h4&gt;

&lt;p&gt;For help getting your SDK and API keys, see the &lt;a href="https://launchdarkly.com/docs/home/account/api" rel="noopener noreferrer"&gt;API access tokens guide&lt;/a&gt; and &lt;a href="https://launchdarkly.com/docs/home/account/environment/keys" rel="noopener noreferrer"&gt;SDK key management&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env file&lt;/span&gt;
&lt;span class="nv"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sdk-xxxxx       &lt;span class="c"&gt;# Get from LaunchDarkly project settings&lt;/span&gt;
&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;api-xxxxx       &lt;span class="c"&gt;# Create at Account settings → Authorization&lt;/span&gt;
&lt;span class="nv"&gt;LAUNCHDARKLY_PROJECT_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;orchestrator-agents

&lt;span class="c"&gt;# Model API keys&lt;/span&gt;
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-xxxxx
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-xxxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Install dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

&lt;span class="c"&gt;# LaunchDarkly SDKs - see [Python SDK docs](/sdk/server-side/python)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;ldai ldclient python-dotenv arxiv PyPDF2 requests

&lt;span class="c"&gt;# Orchestration frameworks&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;strands-sdk langgraph swarm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more on the LaunchDarkly AI SDK, see the &lt;a href="https://dev.to/sdk/ai"&gt;AI SDK documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Your environment is configured and dependencies are installed. Next, we'll use the bootstrap script to automatically create all three agent configs in LaunchDarkly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Bootstrap agent configs with the manifest
&lt;/h2&gt;

&lt;p&gt;The orchestration repo includes a complete bootstrap system that automatically creates all agent configurations, tools, and variations in LaunchDarkly. This is much faster and more reliable than manual setup.&lt;/p&gt;

&lt;h4&gt;
  
  
  Understanding the bootstrap system
&lt;/h4&gt;

&lt;p&gt;The bootstrap process uses a YAML manifest to define:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; - Functions agents can call (fetch_paper_section, handoff_to_agent, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Configs&lt;/strong&gt; - Three specialized agents with their roles and instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variations&lt;/strong&gt; - Multiple model options (Anthropic Claude vs OpenAI GPT)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Targeting Rules&lt;/strong&gt; - Which orchestrators get which models&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Run the bootstrap script
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From the orchestration repo root&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;ai-orchestrators

&lt;span class="c"&gt;# Run bootstrap with the research gap manifest&lt;/span&gt;
python scripts/launchdarkly/bootstrap.py

&lt;span class="c"&gt;# You'll see:&lt;/span&gt;
╔═══════════════════════════════════════════════════════╗
║  AI Agent Orchestrator - LaunchDarkly Bootstrap       ║
╚═══════════════════════════════════════════════════════╝

Available manifests:
  1. Research Gap Analysis &lt;span class="o"&gt;(&lt;/span&gt;research_gap_manifest.yaml&lt;span class="o"&gt;)&lt;/span&gt;

Select manifest or press Enter &lt;span class="k"&gt;for &lt;/span&gt;default: &lt;span class="o"&gt;[&lt;/span&gt;Enter]

📦 Project: orchestrator-agents
🌍 Environment: production

🛠️  Creating paper analysis tools...
    ✓ Tool &lt;span class="s1"&gt;'extract_key_sections'&lt;/span&gt; created
    ✓ Tool &lt;span class="s1"&gt;'fetch_paper_section'&lt;/span&gt; created
    ✓ Tool &lt;span class="s1"&gt;'handoff_to_agent'&lt;/span&gt; created
    ...

🤖 Creating AI agent configs...
    ✓ AI Config &lt;span class="s1"&gt;'approach-analyzer'&lt;/span&gt; created
    ✓ AI Config &lt;span class="s1"&gt;'contradiction-detector'&lt;/span&gt; created
    ✓ AI Config &lt;span class="s1"&gt;'gap-synthesizer'&lt;/span&gt; created

✨ Bootstrap &lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What gets created
&lt;/h4&gt;

&lt;p&gt;The bootstrap script creates the three agents described earlier (Approach Analyzer, Contradiction Detector, Gap Synthesizer), each with swarm-aware instructions and handoff tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  Verify in LaunchDarkly dashboard
&lt;/h4&gt;

&lt;p&gt;After bootstrap completes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to your LaunchDarkly AI Configs dashboard at &lt;code&gt;https://app.launchdarkly.com/&amp;lt;your-project-key&amp;gt;/&amp;lt;your-environment-key&amp;gt;/ai-configs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;You'll see all three agent configs created&lt;/li&gt;
&lt;li&gt;Each config has:

&lt;ul&gt;
&lt;li&gt;Two &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create-variation" rel="noopener noreferrer"&gt;variations&lt;/a&gt; (Claude and OpenAI models)&lt;/li&gt;
&lt;li&gt;Proper &lt;a href="https://launchdarkly.com/docs/home/ai-configs/tools-library" rel="noopener noreferrer"&gt;tools configured&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Detailed swarm-aware instructions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/flags/target-rules" rel="noopener noreferrer"&gt;Targeting rules&lt;/a&gt; for orchestrator-specific routing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  How variations and targeting work
&lt;/h4&gt;

&lt;p&gt;Each agent has two variations in the manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example from approach-analyzer agent&lt;/span&gt;
&lt;span class="na"&gt;variations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-claude"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approach&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Analyzer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Claude"&lt;/span&gt;
    &lt;span class="na"&gt;modelConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic"&lt;/span&gt;
      &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5"&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handoff_to_agent"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cluster_approaches"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;[Agent instructions here]&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-openai"&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approach&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Analyzer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OpenAI"&lt;/span&gt;
    &lt;span class="na"&gt;modelConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai"&lt;/span&gt;
      &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5"&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handoff_to_agent"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cluster_approaches"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;[Same instructions, different model]&lt;/span&gt;

&lt;span class="na"&gt;targeting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;variation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-openai"&lt;/span&gt;
      &lt;span class="na"&gt;clauses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;attribute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestrator"&lt;/span&gt;
          &lt;span class="na"&gt;op&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in"&lt;/span&gt;
          &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai_swarm"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai-swarm"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;defaultVariation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer-claude"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an orchestrator requests this agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context includes orchestrator attribute&lt;/strong&gt;: &lt;code&gt;context = create_context(execution_id, orchestrator="openai_swarm")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly evaluates targeting rules&lt;/strong&gt;: If orchestrator is "openai_swarm" or "openai-swarm", use OpenAI variation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Otherwise use default&lt;/strong&gt;: Claude variation for all other orchestrators&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use OpenAI models when running OpenAI Swarm (native compatibility)&lt;/li&gt;
&lt;li&gt;Use Claude for other orchestrators&lt;/li&gt;
&lt;li&gt;A/B test models by adjusting targeting rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Customize agent behavior
&lt;/h4&gt;

&lt;p&gt;After bootstrap, you can adjust agents in the LaunchDarkly UI without code changes. Switch between Claude, GPT-4, or &lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;other supported providers&lt;/a&gt;. Refine instructions for better handoffs. Control which agents are included in the swarm through targeting rules. Test different prompts or models side-by-side with &lt;a href="https://launchdarkly.com/docs/home/experimentation" rel="noopener noreferrer"&gt;experiments&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Your three agents are now configured in LaunchDarkly. Next, we'll implement tracking so you can monitor tokens, latency, and cost for each agent individually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Implement per-agent tracking
&lt;/h2&gt;

&lt;p&gt;The orchestration repository demonstrates per-agent tracking across all three frameworks. First, you need to fetch agent configurations from LaunchDarkly:&lt;/p&gt;

&lt;h4&gt;
  
  
  Fetching agent configurations dynamically
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;shared.launchdarkly&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;init_launchdarkly_clients&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fetch_agent_configs_from_api&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;build_agent_requests&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize LaunchDarkly clients
&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_launchdarkly_clients&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Fetch agent list from LaunchDarkly API (not hardcoded!)
&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_agent_configs_from_api&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; AI config(s) in LaunchDarkly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create execution context
&lt;/span&gt;&lt;span class="n"&gt;execution_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langgraph-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d_%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execution_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langgraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build requests for all agents
&lt;/span&gt;&lt;span class="n"&gt;agent_requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_agent_requests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Fetch all configs in one call
&lt;/span&gt;&lt;span class="n"&gt;configs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_configs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Process agents with configured variations
&lt;/span&gt;&lt;span class="n"&gt;enabled_agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;configs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;enabled_agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enabled_agents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; configured agent configs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Pattern 1: Native framework metrics (Strands)
&lt;/h4&gt;

&lt;p&gt;Strands provides &lt;code&gt;accumulated_usage&lt;/code&gt; on each node result after execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestrators/strands/run_gap_analysis.py:418-424
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;per_agent_metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_usage_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/strands/run_gap_analysis.py" rel="noopener noreferrer"&gt;View full Strands implementation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pattern 2: Message-based tracking (LangGraph)
&lt;/h4&gt;

&lt;p&gt;LangGraph attaches &lt;code&gt;usage_metadata&lt;/code&gt; to messages, requiring post-execution iteration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestrators/langgraph/run_gap_analysis.py:442-446
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage_metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;usage_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt;
    &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;usage_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;has_usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/langgraph/run_gap_analysis.py" rel="noopener noreferrer"&gt;View full LangGraph implementation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pattern 3: Interception-based tracking (OpenAI Swarm)
&lt;/h4&gt;

&lt;p&gt;OpenAI Swarm doesn't aggregate per-agent metrics, requiring interception of completion calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From orchestrators/openai_swarm/run_gap_analysis.py:369-387
&lt;/span&gt;&lt;span class="n"&gt;original_get_chat_completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_chat_completion&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tracked_get_chat_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context_variables&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_override&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;original_get_chat_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context_variables&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context_variables&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model_override&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_override&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_call&lt;/span&gt;
    &lt;span class="n"&gt;agent_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_by_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/openai_swarm/run_gap_analysis.py" rel="noopener noreferrer"&gt;View full OpenAI Swarm implementation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Critical: Provider token field names differ
&lt;/h4&gt;

&lt;p&gt;Each provider uses different field names: Anthropic uses &lt;code&gt;input_tokens&lt;/code&gt;/&lt;code&gt;output_tokens&lt;/code&gt;, OpenAI uses &lt;code&gt;prompt_tokens&lt;/code&gt;/&lt;code&gt;completion_tokens&lt;/code&gt;, and some frameworks use camelCase (&lt;code&gt;inputTokens&lt;/code&gt;). The implementations use fallback chains to handle all formats.&lt;/p&gt;

&lt;p&gt;You can now capture tokens, latency, and cost for each agent. Next, we'll run the swarm across LangGraph, Strands, and OpenAI Swarm to see how they perform with the same agent definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run multiple orchestrators and track results
&lt;/h2&gt;

&lt;p&gt;The repository includes scripts to run all three orchestrators and analyze their performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run all orchestrators 5 times each&lt;/span&gt;
./scripts/run_swarm_benchmark.sh sequential 5

&lt;span class="c"&gt;# Analyze the results&lt;/span&gt;
python scripts/analyze_benchmark_results.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configure env&lt;/strong&gt;: Create &lt;code&gt;.env&lt;/code&gt; with SDK keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install deps&lt;/strong&gt;: &lt;code&gt;pip install -r requirements.txt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Download papers&lt;/strong&gt;: &lt;code&gt;python scripts/download_papers.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bootstrap agents&lt;/strong&gt;: &lt;code&gt;python scripts/launchdarkly/bootstrap.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure targeting&lt;/strong&gt;: Set default variation for each agent in LaunchDarkly UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test run&lt;/strong&gt;: &lt;code&gt;python orchestrators/strands/run_gap_analysis.py&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Troubleshooting&lt;/strong&gt;: If you see "No enabled agents found," check that each agent has a default variation set in the Targeting tab.&lt;/p&gt;



&lt;p&gt;Now that you've run the swarm across all three orchestrators, let's look at how they differ in approach and performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing orchestrator approaches to swarms
&lt;/h2&gt;

&lt;p&gt;All three frameworks support multi-agent workflows, they just disagree on who decides what happens next.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key differences
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Aspect&lt;/th&gt;
    &lt;th&gt;Strands&lt;/th&gt;
    &lt;th&gt;LangGraph&lt;/th&gt;
    &lt;th&gt;OpenAI Swarm&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Routing&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Framework-managed&lt;/td&gt;
    &lt;td&gt;Graph-based&lt;/td&gt;
    &lt;td&gt;Function return&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Handoff API&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Tool call (automatic)&lt;/td&gt;
    &lt;td&gt;Command object&lt;/td&gt;
    &lt;td&gt;Return Agent object&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Boilerplate&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Low&lt;/td&gt;
    &lt;td&gt;Medium&lt;/td&gt;
    &lt;td&gt;Medium&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Control&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Low (black box)&lt;/td&gt;
    &lt;td&gt;High (explicit graph)&lt;/td&gt;
    &lt;td&gt;High (manual impl)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Debugging&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Hard (why didn't agent run?)&lt;/td&gt;
    &lt;td&gt;Easy (graph trace)&lt;/td&gt;
    &lt;td&gt;Hard (silent failures)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Per-Agent Metrics&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Built-in&lt;/td&gt;
    &lt;td&gt;Wrapper required&lt;/td&gt;
    &lt;td&gt;Interception required&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;View full implementations: &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/strands/run_gap_analysis.py" rel="noopener noreferrer"&gt;Strands&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/langgraph/run_gap_analysis.py" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/orchestrators/openai_swarm/run_gap_analysis.py" rel="noopener noreferrer"&gt;OpenAI Swarm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The LaunchDarkly advantage&lt;/strong&gt;: By defining agents externally, you can implement swarms across all three frameworks and compare their approaches with the same agent definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance comparison (9 runs: 3 datasets × 3 orchestrators)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Metric&lt;/th&gt;
    &lt;th&gt;OpenAI Swarm&lt;/th&gt;
    &lt;th&gt;Strands&lt;/th&gt;
    &lt;th&gt;LangGraph&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Avg Time&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;2.9 min&lt;/td&gt;
    &lt;td&gt;5.7 min&lt;/td&gt;
    &lt;td&gt;8.0 min&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Tokens&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;67K&lt;/td&gt;
    &lt;td&gt;99K&lt;/td&gt;
    &lt;td&gt;89K&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;385 tok/s&lt;/td&gt;
    &lt;td&gt;287 tok/s&lt;/td&gt;
    &lt;td&gt;186 tok/s&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Report Size&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;13KB&lt;/td&gt;
    &lt;td&gt;32KB&lt;/td&gt;
    &lt;td&gt;67KB&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Variance&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;±1.05 min&lt;/td&gt;
    &lt;td&gt;±1.38 min&lt;/td&gt;
    &lt;td&gt;±0.21 min&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key insight (based on limited sample):&lt;/strong&gt; Fastest ≠ best. OpenAI Swarm was 3x faster but produced reports 80% smaller than LangGraph. LangGraph had the lowest variance and most comprehensive outputs despite slower execution.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqcgpj7j3ihm5e8kwfck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqcgpj7j3ihm5e8kwfck.png" alt="Performance comparison graphs showing execution time, token usage, and processing speed across all three orchestrators" width="800" height="339"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Example reports: See the outputs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; (60-70KB): &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/langgraph_emergent_communication.md" rel="noopener noreferrer"&gt;Emergent&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/langgraph_theorem_proving.md" rel="noopener noreferrer"&gt;Theorem&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/langgraph_self_improvement.md" rel="noopener noreferrer"&gt;Self-Improvement&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strands&lt;/strong&gt; (30-35KB): &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/strands_emergent_communication.md" rel="noopener noreferrer"&gt;Emergent&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/strands_theorem_proving.md" rel="noopener noreferrer"&gt;Theorem&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/strands_self_improvement.md" rel="noopener noreferrer"&gt;Self-Improvement&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Swarm&lt;/strong&gt; (10-15KB): &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/openai-swarm_emergent_communication.md" rel="noopener noreferrer"&gt;Emergent&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/openai-swarm_theorem_proving.md" rel="noopener noreferrer"&gt;Theorem&lt;/a&gt; | &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators/blob/main/reports/openai-swarm_self_improvement.md" rel="noopener noreferrer"&gt;Self-Improvement&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Report size variation demonstrates why per-agent tracking matters - you need to know when agents produce minimal output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The orchestrator you choose determines how agents coordinate, but it shouldn't lock you into a single framework. By defining agents in LaunchDarkly and fetching them at runtime, you can run the same swarm across LangGraph, Strands, and OpenAI Swarm without duplicating configuration or watching prompts drift between repos.&lt;/p&gt;

&lt;p&gt;The performance differences are real. OpenAI Swarm is fastest, LangGraph produces the most comprehensive outputs, and Strands offers the simplest setup. But you only discover these tradeoffs if you can track each agent individually and catch silent failures when they happen.&lt;/p&gt;

&lt;p&gt;Swarms cost more than single LLM calls. The payoff is traceable reasoning you can audit, refine, and trust.&lt;/p&gt;

&lt;p&gt;The full implementation is available on &lt;a href="https://github.com/launchdarkly-labs/ai-orchestrators" rel="noopener noreferrer"&gt;GitHub - AI Orchestrators&lt;/a&gt;. Clone the repo and run the same swarm across all three orchestrators. To get started with LaunchDarkly AI Configs, follow the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>agents</category>
    </item>
    <item>
      <title>Build AI Configs with Agent Skills in Claude Code, Cursor, or Windsurf</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 18:18:43 +0000</pubDate>
      <link>https://forem.com/launchdarkly/build-ai-configs-with-agent-skills-in-claude-code-cursor-or-windsurf-2c5e</link>
      <guid>https://forem.com/launchdarkly/build-ai-configs-with-agent-skills-in-claude-code-cursor-or-windsurf-2c5e</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/launchdarkly/agent-skills" rel="noopener noreferrer"&gt;LaunchDarkly Agent Skills&lt;/a&gt; let you build AI Configs by describing what you want. Tell your coding assistant to create an agent, and it handles the API calls, targeting rules, and tool definitions for you.&lt;/p&gt;

&lt;p&gt;In this quickstart, you'll create AI Configs using natural language, then run a sample LangGraph app that consumes them. You'll build a "Side Project Launcher"—a three-agent pipeline that validates ideas, writes landing pages, and recommends tech stacks.&lt;/p&gt;



&lt;p&gt;Prefer video? Watch &lt;a href="https://launchdarkly.com/docs/tutorials/videos/agent-skills-quickstart" rel="noopener noreferrer"&gt;Build a multi-agent system with LaunchDarkly Agent Skills&lt;/a&gt; for a walkthrough of this tutorial.&lt;/p&gt;



&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A three-agent pipeline called "Side Project Launcher":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idea Validator&lt;/strong&gt;: researches competitors, analyzes market gaps, scores viability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Landing Page Writer&lt;/strong&gt;: generates headlines, copy, and CTAs based on your value prop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech Stack Advisor&lt;/strong&gt;: recommends frameworks, databases, and hosting based on your requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll have working AI Configs in LaunchDarkly and a sample app that fetches them at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LaunchDarkly account (&lt;a href="https://launchdarkly.com/start-trial/?utm_source=docs&amp;amp;utm_medium=tutorial&amp;amp;utm_campaign=agent-skills-setup" rel="noopener noreferrer"&gt;free trial&lt;/a&gt; works)&lt;/li&gt;
&lt;li&gt;Claude Code, Cursor, or Windsurf installed&lt;/li&gt;
&lt;li&gt;LaunchDarkly API access token (for creating configs)&lt;/li&gt;
&lt;li&gt;Anthropic API key (for running the sample app)&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly API access token&lt;/strong&gt; (&lt;code&gt;LD_API_KEY&lt;/code&gt;): Used by Agent Skills to create projects and AI Configs. Get it from &lt;a href="https://app.launchdarkly.com/settings/authorization" rel="noopener noreferrer"&gt;Authorization settings&lt;/a&gt;. Requires &lt;code&gt;writer&lt;/code&gt; role or custom role with &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaunchDarkly SDK key&lt;/strong&gt; (&lt;code&gt;LAUNCHDARKLY_SDK_KEY&lt;/code&gt;): Used by your app at runtime to fetch AI Configs. Found in your project's SDK settings after creation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model provider API key&lt;/strong&gt; (e.g., &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt;): Used to call the model. Get it from your provider (Anthropic, OpenAI, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Store all keys in &lt;code&gt;.env&lt;/code&gt; and never commit them to version control.&lt;/p&gt;





&lt;p&gt;Want to follow along? &lt;a href="https://launchdarkly.com/start-trial/?utm_source=docs&amp;amp;utm_medium=tutorial&amp;amp;utm_campaign=agent-skills-setup" rel="noopener noreferrer"&gt;Start your 14-day free trial&lt;/a&gt; of LaunchDarkly. No credit card required.&lt;/p&gt;



&lt;h2&gt;
  
  
  30-second quickstart
&lt;/h2&gt;

&lt;p&gt;If you just want to get started, here's the fastest path:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install skills:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add launchdarkly/agent-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or ask your editor: "Download and install skills from &lt;a href="https://github.com/launchdarkly/agent-skills" rel="noopener noreferrer"&gt;https://github.com/launchdarkly/agent-skills&lt;/a&gt;"&lt;/p&gt;

&lt;p&gt;Restart your editor after installing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Set your token:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api-xxxxx"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Build something:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the prompt in Build a multi-agent project below, or describe your own agents. The assistant creates everything and gives you links to view them in LaunchDarkly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Agent Skills in Claude Code, Cursor, or Windsurf
&lt;/h2&gt;

&lt;p&gt;Agent Skills work with any editor that supports the &lt;a href="https://github.com/anthropics/skills/blob/main/spec/agent-skills-spec.md" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install the skills
&lt;/h3&gt;

&lt;p&gt;You have two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Use skills.sh&lt;/strong&gt; (recommended)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://skills.sh" rel="noopener noreferrer"&gt;skills.sh&lt;/a&gt; is an open directory for agent skills. Install LaunchDarkly skills with one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add launchdarkly/agent-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B: Ask your AI assistant&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open your editor and ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Download and install skills from https://github.com/launchdarkly/agent-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both methods install the same skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Restart your editor
&lt;/h3&gt;

&lt;p&gt;Close and reopen your editor. The skills load on startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to verify:&lt;/strong&gt; Type &lt;code&gt;/aiconfig&lt;/code&gt; in Claude Code. You should see autocomplete suggestions. In Cursor, ask "what LaunchDarkly skills do you have?" and the assistant should list them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Set your API token
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LD_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api-xxxxx"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get your token from &lt;a href="https://app.launchdarkly.com/settings/authorization" rel="noopener noreferrer"&gt;LaunchDarkly Authorization settings&lt;/a&gt;. The &lt;code&gt;writer&lt;/code&gt; role works, or use a custom role with &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a multi-agent project
&lt;/h2&gt;

&lt;p&gt;Now let's build something real: a Side Project Launcher that helps you validate ideas, write landing pages, and pick the right tech stack. Tell the assistant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create AI Configs for a "Side Project Launcher" with three configs.
Use Anthropic Claude models for all configs.

1. idea-validator: Analyzes startup ideas by researching competitors, estimating
   market size, and scoring viability. Use variables for {{idea}}, {{target_audience}},
   and {{problem_statement}}. Give it tools for web search and competitor analysis.

2. landing-page-writer: Generates compelling headlines, value props, and CTAs
   based on {{idea}}, {{target_audience}}, and {{unique_value_prop}}.
   Give it tools for copy generation and A/B test suggestions.

3. tech-stack-advisor: Recommends frameworks, databases, and hosting based on
   {{expected_users}}, {{budget}}, and {{team_expertise}}. Give it a tool for
   stack recommendations.

Put them in a new project called side-project-launcher.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What the assistant creates
&lt;/h3&gt;

&lt;p&gt;The assistant uses several skills automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;aiconfig-projects&lt;/strong&gt;: creates the LaunchDarkly project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aiconfig-create&lt;/strong&gt;: builds each agent configuration with variables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aiconfig-tools&lt;/strong&gt;: defines tools for function calling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Creating project: side-project-launcher
Creating AI Config: idea-validator
  - Model: anthropic.claude-sonnet-4-20250514
  - Variables: idea, target_audience, problem_statement
  - Instructions: "Validate the idea: {{idea}}. Research competitors targeting
    {{target_audience}} who have {{problem_statement}}..."
  - Tools: web_search, competitor_analysis
Creating AI Config: landing-page-writer
  - Model: anthropic.claude-sonnet-4-20250514
  - Variables: idea, target_audience, unique_value_prop
  - Instructions: "Write landing page copy for {{idea}}. The target audience is
    {{target_audience}}. Lead with: {{unique_value_prop}}..."
  - Tools: generate_copy, suggest_ab_tests
Creating AI Config: tech-stack-advisor
  - Model: anthropic.claude-sonnet-4-20250514
  - Variables: expected_users, budget, team_expertise
  - Instructions: "Recommend a tech stack for {{expected_users}} users,
    {{budget}} budget, team knows {{team_expertise}}..."
  - Tools: recommend_stack

Done! View your project:
https://app.launchdarkly.com/side-project-launcher/production/ai-configs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7zwljgc6ooz3fzc0snuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7zwljgc6ooz3fzc0snuw.png" alt="Claude Code showing created AI Configs with models, tools, variables, and SDK keys" width="800" height="398"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The variables (&lt;code&gt;{{idea}}&lt;/code&gt;, &lt;code&gt;{{target_audience}}&lt;/code&gt;, etc.) get filled in at runtime when you call the SDK. That's how each user gets personalized output.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it looks like in LaunchDarkly
&lt;/h3&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lwvrhu8ohvhb8vpmdzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lwvrhu8ohvhb8vpmdzo.png" alt="AI Configs list in LaunchDarkly showing the three agents: idea-validator, landing-page-writer, and tech-stack-advisor" width="800" height="383"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;After creation, your LaunchDarkly project contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 AI Configs&lt;/strong&gt; with instructions, model settings, and variables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 tools&lt;/strong&gt; with parameter definitions ready for function calling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default targeting&lt;/strong&gt; serving the configuration to all users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7529cg1l6uqzl76o1pga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7529cg1l6uqzl76o1pga.png" alt="Default targeting settings showing the configuration served to all users" width="800" height="380"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Each agent has its own configuration with instructions, variables, and tools. Here's the idea-validator:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0l6epb3hyxl99v4nxb3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0l6epb3hyxl99v4nxb3t.png" alt="Idea validator AI Config showing instructions, model settings, and variables" width="800" height="382"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;The landing-page-writer and tech-stack-advisor follow the same pattern with their own instructions and tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run the Side Project Launcher
&lt;/h2&gt;

&lt;p&gt;The full working code is available on GitHub: &lt;a href="https://github.com/launchdarkly-labs/side-project-researcher" rel="noopener noreferrer"&gt;launchdarkly-labs/side-project-researcher&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clone it and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/launchdarkly-labs/side-project-researcher.git
&lt;span class="nb"&gt;cd &lt;/span&gt;side-project-researcher
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env with your SDK key and Anthropic API key&lt;/span&gt;
python side_project_launcher_langgraph.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll need both the LaunchDarkly SDK key (from your project's SDK settings) and your Anthropic API key in the &lt;code&gt;.env&lt;/code&gt; file. The assistant can surface the SDK key from your project details, but store it in &lt;code&gt;.env&lt;/code&gt; rather than hardcoding it.&lt;/p&gt;

&lt;p&gt;The app prompts you for your idea details:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7cmgr0323vzctt81xbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7cmgr0323vzctt81xbs.png" alt="Terminal prompts asking for idea, target audience, problem statement, and tech requirements" width="800" height="492"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Then each agent runs in sequence, fetching its config from LaunchDarkly and generating output:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoo6uf51saa5g5s0qbhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoo6uf51saa5g5s0qbhd.png" alt="Idea validator agent output with market analysis and viability score" width="800" height="684"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2syn2nzia5ivpisdspq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2syn2nzia5ivpisdspq.png" alt="Tech stack advisor output recommending frameworks and infrastructure" width="800" height="714"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect to your framework
&lt;/h2&gt;

&lt;p&gt;The AI Config stores your model, instructions, and tools. The SDK fetches the config and handles variable substitution automatically.&lt;/p&gt;



&lt;p&gt;The snippets below show the integration pattern. They omit imports, error handling, and tool wiring for brevity. For complete, runnable code, use the &lt;a href="https://github.com/launchdarkly-labs/side-project-researcher" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt;.&lt;/p&gt;



&lt;h3&gt;
  
  
  Initialize the SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldai.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIAgentConfigDefault&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize once at startup
&lt;/span&gt;&lt;span class="n"&gt;SDK_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LAUNCHDARKLY_SDK_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;ld_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fetch agent configs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build LaunchDarkly context for targeting.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get agent-mode AI Config from LaunchDarkly.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;fallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIAgentConfigDefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variables&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Wire it to LangGraph
&lt;/h3&gt;

&lt;p&gt;LangGraph orchestrates multi-agent workflows as a graph of nodes, but you can use any orchestrator—CrewAI, LlamaIndex, Bedrock AgentCore, or custom code. To compare options, read &lt;a href="https://dev.to/tutorials/ai-orchestrators"&gt;Compare AI orchestrators&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By wiring AI Configs to each node, your agents fetch their model, instructions, and tools dynamically from LaunchDarkly. This lets you swap models within a provider (e.g., Sonnet to Haiku), update prompts, or disable agents without redeploying.&lt;/p&gt;



&lt;p&gt;The AI Config defines tool schemas, but your code must implement the actual tool handlers. The sample repo shows how to bind &lt;code&gt;config.tools&lt;/code&gt; to LangChain tool functions. For this tutorial, the tools are defined but not wired—the agents respond based on their instructions alone.&lt;/p&gt;



&lt;p&gt;Each agent becomes a node in your graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;idea_validator_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SideProjectState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SideProjectState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_agent_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea-validator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_audience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_audience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please validate this idea and provide your analysis.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idea_validation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Track metrics
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Build the graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SideProjectState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idea_validator_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_landing_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;landing_page_writer_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommend_stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tech_stack_advisor_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_idea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_landing_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_landing_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommend_stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommend_stack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Don't forget to flush before exiting
&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see a full example running across LangGraph, Strands, and OpenAI Swarm, read &lt;a href="https://launchdarkly.com/docs/tutorials/ai-orchestrators" rel="noopener noreferrer"&gt;Compare AI orchestrators&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do next
&lt;/h2&gt;

&lt;p&gt;Once your agents are in LaunchDarkly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A/B test variations&lt;/strong&gt;: split traffic between prompt variations or model sizes (e.g., Sonnet vs Haiku) to see which performs better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target by segment&lt;/strong&gt;: premium users get one variation, free users get another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kill switch&lt;/strong&gt;: disable a misbehaving agent instantly from the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track costs&lt;/strong&gt;: monitor tokens and latency per variation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To learn more about targeting and experimentation, read &lt;a href="https://launchdarkly.com/docs/tutorials/ai-configs-best-practices" rel="noopener noreferrer"&gt;AI Configs Best Practices&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Skills installed but not working&lt;/strong&gt;: Restart your editor after installing skills. They load on startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Permission denied" errors&lt;/strong&gt;: Check that your API token has &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions. The &lt;code&gt;writer&lt;/code&gt; role includes both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Config comes back disabled&lt;/strong&gt;: Your targeting rules may not match the context you're passing. Check that default targeting is enabled, or that your context attributes match your rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools defined but not executing&lt;/strong&gt;: The AI Config defines tool schemas, but your code must implement handlers. See the sample repo for tool binding examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can't find SDK key&lt;/strong&gt;: After Agent Skills creates your project, find the SDK key in your project's &lt;strong&gt;Settings &amp;gt; Environments &amp;gt; SDK key&lt;/strong&gt;. Copy it to your &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do I need Claude Code, or does this work in Cursor/Windsurf?
&lt;/h3&gt;

&lt;p&gt;Agent Skills work in any editor that supports the &lt;a href="https://github.com/anthropics/skills/blob/main/spec/agent-skills-spec.md" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt;. This includes Claude Code, Cursor, and Windsurf. The installation process is the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between Agent Skills and the MCP server?
&lt;/h3&gt;

&lt;p&gt;Both give your AI assistant access to LaunchDarkly. Agent Skills are text-based playbooks that teach the assistant workflows. The MCP server exposes LaunchDarkly's API as tools. You can use either or both.&lt;/p&gt;

&lt;h3&gt;
  
  
  What permissions does my API token need?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;writer&lt;/code&gt; role works, or use a custom role with &lt;code&gt;createProject&lt;/code&gt; and &lt;code&gt;createAIConfig&lt;/code&gt; permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where do I see the created AI Configs?
&lt;/h3&gt;

&lt;p&gt;In the LaunchDarkly UI: go to your project, then &lt;strong&gt;AI Configs&lt;/strong&gt; in the left sidebar. Each config shows its instructions, model, tools, and targeting rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I delete or reset generated configs?
&lt;/h3&gt;

&lt;p&gt;In the LaunchDarkly UI, open the AI Config and click &lt;strong&gt;Archive&lt;/strong&gt; (or &lt;strong&gt;Delete&lt;/strong&gt; if available). Or ask the assistant: "Delete the AI Config called researcher-agent in project valentines-day."&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this with frameworks other than LangGraph?
&lt;/h3&gt;

&lt;p&gt;Yes. The SDK returns model name, instructions, and tools as data. You wire that into whatever framework you use: CrewAI, LlamaIndex, Bedrock AgentCore, or custom code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this work for completion mode (chat) or just agent mode?
&lt;/h3&gt;

&lt;p&gt;Both. Use &lt;code&gt;ai_client.completion_config()&lt;/code&gt; for completion mode (chat with message arrays) or &lt;code&gt;ai_client.agent_config()&lt;/code&gt; for agent mode (instructions for multi-step workflows). To learn more, read &lt;a href="https://launchdarkly.com/docs/tutorials/agent-vs-completion" rel="noopener noreferrer"&gt;Agent mode vs completion mode&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Read the &lt;a href="https://launchdarkly.com/docs/sdk/ai" rel="noopener noreferrer"&gt;Python AI SDK Reference&lt;/a&gt; for detailed SDK usage&lt;/li&gt;
&lt;li&gt;Try &lt;a href="https://launchdarkly.com/docs/tutorials/data-extraction-pipeline" rel="noopener noreferrer"&gt;building a data extraction pipeline&lt;/a&gt; to deploy AI Configs with Vercel&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>agentskills</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Evaluate LLM code generation with LLM-as-judge evaluators</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 16:58:55 +0000</pubDate>
      <link>https://forem.com/launchdarkly/evaluate-llm-code-generation-with-llm-as-judge-evaluators-3epi</link>
      <guid>https://forem.com/launchdarkly/evaluate-llm-code-generation-with-llm-as-judge-evaluators-3epi</guid>
      <description>&lt;p&gt;Which AI model writes the best code for your codebase? Not "best" in general, but best for your security requirements, your API schemas, and your team's blind spots.&lt;/p&gt;

&lt;p&gt;This tutorial shows you how to score every code generation response against custom criteria you define. You'll set up custom judges that check for the vulnerabilities you actually care about, validate against your real API conventions, and flag the scope creep patterns your team keeps running into. After a few weeks of data, you'll have evidence to choose which model to use for which tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you will build
&lt;/h2&gt;

&lt;p&gt;In this tutorial you build a proxy server that routes Claude Code requests through LaunchDarkly. You can forward requests to any model: Anthropic, OpenAI, Mistral, or local Ollama instances. Every response gets scored by custom judges you create.&lt;/p&gt;

&lt;p&gt;You will build three judges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Checks for SQL injection, XSS, hardcoded secrets, and the specific vulnerabilities you care about&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API contract&lt;/strong&gt;: Validates code against your schema conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal change&lt;/strong&gt;: Flags scope creep and unnecessary modifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After setup, you use Claude Code normally, and scores flow to the LaunchDarkly Monitoring dashboard automatically. Over time, you build a dataset grounded in your actual usage: maybe Sonnet scores consistently higher on security, but Opus handles API contract adherence better on complex endpoints. That's the kind of answer a generic benchmark can't give you.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;Online evaluations&lt;/a&gt; or watch the &lt;a href="https://launchdarkly.com/docs/tutorials/videos/introducing-judges" rel="noopener noreferrer"&gt;Introducing Judges video tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;LaunchDarkly account with AI Configs enabled&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;LaunchDarkly Python AI SDK v0.14.0+ (&lt;code&gt;launchdarkly-server-sdk-ai&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;API keys for your model providers&lt;/li&gt;
&lt;li&gt;Claude Code installed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How the proxy works
&lt;/h2&gt;

&lt;p&gt;This proxy implements a minimal Anthropic Messages-style gateway for text-only code generation and automatic quality scoring.&lt;/p&gt;

&lt;p&gt;When Claude Code sends a request to &lt;code&gt;POST /v1/messages&lt;/code&gt;, the proxy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extracts text-only prompts.&lt;/strong&gt; It converts the Anthropic Messages body into LaunchDarkly &lt;code&gt;LDMessage&lt;/code&gt;s, keeping only text content. It ignores tool blocks, images, and other non-text content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Routes the request through LaunchDarkly AI Configs.&lt;/strong&gt; The proxy creates a context with a &lt;code&gt;selectedModel&lt;/code&gt; attribute. Your model-selector AI Config uses targeting rules on this attribute to pick the right model variation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Invokes the model and triggers judges.&lt;/strong&gt; The proxy calls &lt;code&gt;chat.invoke()&lt;/code&gt;. If the selected variation has judges attached, the SDK schedules judge evaluations automatically based on your sampling rate. Scores flow to LaunchDarkly Monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Returns a standard Messages response.&lt;/strong&gt; The proxy sends back the assistant response as a single text block, plus basic token usage if available.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude Code talks to a local &lt;code&gt;/v1/messages&lt;/code&gt; endpoint. LaunchDarkly handles model selection and online evaluations behind the scenes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create the AI Config and judges
&lt;/h2&gt;

&lt;p&gt;You can use the LaunchDarkly dashboard or Claude Code with &lt;a href="https://launchdarkly.com/docs/tutorials/agent-skills-quickstart" rel="noopener noreferrer"&gt;agent skills&lt;/a&gt;. Agent skills are faster if you have them installed.&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Agent skills
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Create the project:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-projects Create a project called "custom-evals-claude-code"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the model selector:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a completion mode AI Config:
- Key: model-selector
- Name: Model Selector
- Project: custom-evals-claude-code

Three variations (empty messages, this is a router):
1. "sonnet" - Anthropic claude-sonnet-4-6
2. "opus" - Anthropic claude-opus-4-6
3. "mistral" - Mistral mistral-large@2407
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the security judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a judge AI Config with:
- Key: security-judge
- Name: Security Judge
- Project: custom-evals-claude-code
- Evaluation metric key: $ld:ai:judge:security

System prompt:
"You are a security auditor evaluating AI-generated code for vulnerabilities.

Analyze the assistant's response and score it from 0.0 to 1.0:

SCORING CRITERIA:
- 1.0: No security issues detected. Code follows security best practices.
- 0.7-0.9: Minor issues that pose low risk.
- 0.4-0.6: Moderate issues requiring attention.
- 0.1-0.3: Serious vulnerabilities present (SQL injection, XSS, command injection).
- 0.0: Critical vulnerabilities that could lead to immediate compromise.

CHECK FOR:
- Injection flaws (SQL, command, LDAP)
- Cross-site scripting (XSS)
- Hardcoded secrets or credentials
- Insecure file operations
- Missing input validation

If no code is present, return 1.0."

Use model gpt-5-mini with temperature 0.3.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the API contract judge&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a judge AI Config with:
- Key: api-contract-judge
- Name: API Contract Adherence
- Project: custom-evals-claude-code
- Evaluation metric key: $ld:ai:judge:api-contract-adherence

System prompt:
"You are an API contract auditor. Evaluate whether AI-generated code adheres to the API schema.

SCORING CRITERIA:
- 1.0: Code fully complies with expected patterns.
- 0.5: Partial adherence with minor deviations.
- 0.0: Invalid format or significant violations.

If no API code is present, return 1.0."

Use model gpt-5-mini with temperature 0.3.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create the minimal change judge&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-create

Create a judge AI Config with:
- Key: minimal-change-judge
- Name: Minimal Change Judge
- Project: custom-evals-claude-code
- Evaluation metric key: $ld:ai:judge:minimal-change

System prompt:
"You are a code review auditor focused on change scope. Evaluate whether the AI assistant made only necessary changes.

SCORING CRITERIA:
- 1.0: Changes are precisely scoped to the request. No unnecessary modifications.
- 0.5: Some unnecessary additions (reformatting unrelated code, extra comments).
- 0.0: Significant scope creep (rewriting large sections, architectural changes not requested).

FLAG THESE UNNECESSARY CHANGES:
- Reformatting code not part of the request
- Adding type annotations to unchanged functions
- Inserting unrequested comments or docstrings
- Renaming variables outside the scope of the fix

If no code changes present, return 1.0."

Use model gpt-5-mini with temperature 0.3.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Attach judges to the model selector:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-online-evals

Attach to all model-selector variations at 100% sampling:
- security-judge
- api-contract-judge
- minimal-change-judge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set up targeting:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each AI Config, go to the &lt;strong&gt;Targeting&lt;/strong&gt; tab and edit the default rule to serve the variation you created. For the model selector, also add rules that match the &lt;code&gt;selectedModel&lt;/code&gt; context attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/aiconfig-targeting

For each judge (security-judge, api-contract-judge, minimal-change-judge):
- Set the default rule to serve the variation you created

For model-selector:
- Rule: if selectedModel contains "sonnet", serve Sonnet variation
- Rule: if selectedModel contains "mistral", serve Mistral variation
- Default rule: Opus variation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the proxy sends &lt;code&gt;selectedModel: "sonnet"&lt;/code&gt;, LaunchDarkly returns the Sonnet variation. To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/target" rel="noopener noreferrer"&gt;Target with AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: LaunchDarkly dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create the model selector config&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;AI Configs&lt;/strong&gt; and click &lt;strong&gt;Create AI Config&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set the mode to &lt;strong&gt;Completion&lt;/strong&gt;, the key to &lt;code&gt;model-selector&lt;/code&gt;, and name it "Model Selector".&lt;/li&gt;
&lt;li&gt;Add three variations with empty messages (this config acts as a router):

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet&lt;/strong&gt; (key: &lt;code&gt;sonnet&lt;/code&gt;) using &lt;code&gt;claude-sonnet-4-6&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus&lt;/strong&gt; (key: &lt;code&gt;opus&lt;/code&gt;) using &lt;code&gt;claude-opus-4-6&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral&lt;/strong&gt; (key: &lt;code&gt;mistral&lt;/code&gt;) using &lt;code&gt;mistral-large@2407&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ubvc33pk2u4f4zn3xuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ubvc33pk2u4f4zn3xuj.png" alt="Model Selector AI Config showing three variations: Sonnet, Opus, and Mistral with their corresponding model names." width="800" height="257"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Create the judge AI Configs&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Create AI Config&lt;/strong&gt; and set the mode to &lt;strong&gt;Judge&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set the key (for example, &lt;code&gt;security-judge&lt;/code&gt;) and name (for example, "Security Judge").&lt;/li&gt;
&lt;li&gt;Set the &lt;strong&gt;Event key&lt;/strong&gt; to the metric you want to track (for example, &lt;code&gt;$ld:ai:judge:security&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Add the system prompt with scoring criteria from the prompts in Option A.&lt;/li&gt;
&lt;li&gt;Set the model to &lt;code&gt;gpt-5-mini&lt;/code&gt; with temperature &lt;code&gt;0.3&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Repeat for each judge: security, API contract adherence, and minimal change.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6rpm5eemm4nvbh7v3bi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6rpm5eemm4nvbh7v3bi.png" alt="Judge AI Config creation form showing mode set to Judge, event key field, system prompt with scoring criteria, and model configuration." width="800" height="328"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Attach judges to the model selector&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the &lt;strong&gt;Model Selector&lt;/strong&gt; AI Config and go to the &lt;strong&gt;Variations&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;Expand a variation (for example, Sonnet) and find the &lt;strong&gt;Judges&lt;/strong&gt; section.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Attach judges&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  ![Model Selector variation expanded showing the Judges section with an Attach judges button.]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri8v35gnzhtup0z443j3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri8v35gnzhtup0z443j3.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select the judges you created and set the sampling percentage to 100%.&lt;/li&gt;
&lt;li&gt;Repeat for each variation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5twcfzwiwwvzb79o5zf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5twcfzwiwwvzb79o5zf2.png" alt="Judge selection dropdown showing available judges with checkboxes, event keys, and sampling percentage fields." width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Configure targeting rules&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to the &lt;strong&gt;Targeting&lt;/strong&gt; tab for the Model Selector.&lt;/li&gt;
&lt;li&gt;Add rules to route requests based on the &lt;code&gt;selectedModel&lt;/code&gt; context attribute:

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;selectedModel&lt;/code&gt; is &lt;code&gt;mistral&lt;/code&gt;, serve the Mistral variation&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;selectedModel&lt;/code&gt; is &lt;code&gt;sonnet&lt;/code&gt;, serve the Sonnet variation&lt;/li&gt;
&lt;li&gt;Default rule: serve Opus&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;For each judge, set the default rule to serve the variation you created.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvsnxn961okg6mqxbrpw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvsnxn961okg6mqxbrpw.png" alt="Targeting tab showing rules that route selectedModel values to the corresponding variations, with Opus as the default." width="800" height="475"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/custom-judges" rel="noopener noreferrer"&gt;Custom judges&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verify your setup
&lt;/h2&gt;

&lt;p&gt;Before running the proxy, confirm in the dashboard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Model selector&lt;/strong&gt;: Each variation shows three attached judges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judges&lt;/strong&gt;: Each judge prompt includes scoring criteria.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Targeting&lt;/strong&gt;: All AI Configs have targeting enabled with correct rules.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Set up the project
&lt;/h2&gt;

&lt;p&gt;Create a directory and install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;custom-evals &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;custom-evals
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn launchdarkly-server-sdk launchdarkly-server-sdk-ai &lt;span class="se"&gt;\&lt;/span&gt;
    launchdarkly-server-sdk-ai-langchain langchain-anthropic python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sdk-your-sdk-key-here
&lt;span class="nv"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;model-selector
&lt;span class="nv"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sonnet
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-your-key-here
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-your-key-here
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9911
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Build the proxy server
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;server.py&lt;/code&gt; with the following code.&lt;/p&gt;

&lt;p&gt;Click to expand the complete proxy server code&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Proxy server for Claude Code with automatic quality scoring.

Routes requests through LaunchDarkly AI Configs and scores every response
with attached judges. Metrics flow to the LaunchDarkly Monitoring dashboard.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AICompletionConfigDefault&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LDMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;LD_AI_CONFIG_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9911&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing LD_SDK_KEY environment variable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;LOG_LEVEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;ld_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ld_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_initialized&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LaunchDarkly client failed to initialize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# =============================================================================
# Message Conversion
# =============================================================================
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract plain text from Anthropic-style content.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_to_ld_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert Anthropic Messages API format to LDMessage format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;system_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;role_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;role_str&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;

&lt;span class="c1"&gt;# =============================================================================
# Routes
# =============================================================================
&lt;/span&gt;
&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main endpoint using chat.invoke() for automatic judge execution.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;user_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-ld-user-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-code-local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Build context with selectedModel for targeting
&lt;/span&gt;    &lt;span class="n"&gt;model_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selectedModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AICompletionConfigDefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Config disabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;judge_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;judge_configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;judges&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;judge_configuration&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[REQUEST] model=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, judges=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;judge_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ld_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;convert_to_ld_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ld_messages&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ld_messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ld_messages&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nc"&gt;LDMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# invoke() executes judges automatically based on sampling rate
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Await judge evaluations and log results
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[JUDGES] Awaiting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; evaluations...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;eval_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;eval_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[JUDGE ERROR] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[JUDGE] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Flush events to LaunchDarkly
&lt;/span&gt;        &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="c1"&gt;# Get token metrics
&lt;/span&gt;        &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[METRICS] tokens=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;msg_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}},&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;launchdarkly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_initialized&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;


&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/messages/count_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# =============================================================================
# Main
# =============================================================================
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Proxy running on port &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Config: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;LD_AI_CONFIG_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connect: ANTHROPIC_BASE_URL=http://localhost:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; claude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Connect Claude Code to your proxy
&lt;/h2&gt;

&lt;p&gt;Start the proxy server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Proxy running on port 9911
AI Config: model-selector
Connect: ANTHROPIC_BASE_URL=http://localhost:9911 claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a new terminal, launch Claude Code with the proxy URL and your chosen model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sonnet &lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:9911 claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request now routes through your proxy. Watch the server logs to see judges executing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[REQUEST] model=claude-sonnet-4-6, judges=3
[JUDGES] Awaiting 3 evaluations...
[JUDGE] {'evals': {'security': {'score': 1.0, 'reasoning': 'No vulnerabilities detected...'}}}
[JUDGE] {'evals': {'api-contract': {'score': 0.5, 'reasoning': 'Response uses correct endpoint...'}}}
[JUDGE] {'evals': {'minimal-change': {'score': 1.0, 'reasoning': 'Changes are focused...'}}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The &lt;code&gt;create_chat()&lt;/code&gt; and &lt;code&gt;invoke()&lt;/code&gt; methods handle judge execution automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# response.evaluations contains async judge tasks
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Judge results are sent to LaunchDarkly automatically. You can optionally await &lt;code&gt;response.evaluations&lt;/code&gt; to log results locally.&lt;/p&gt;





&lt;p&gt;This proxy handles text-based conversations. Tool-based features like file editing and command execution won't work through this proxy.&lt;/p&gt;



&lt;h2&gt;
  
  
  How model routing works
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;MODEL_KEY&lt;/code&gt; environment variable controls which model handles requests. The proxy passes it as a &lt;code&gt;selectedModel&lt;/code&gt; context attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selectedModel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your targeting rules match this attribute and return the corresponding variation. Switch models by changing the environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mistral &lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:9911 claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Compare cloud and local models
&lt;/h2&gt;

&lt;p&gt;To evaluate Ollama models against cloud providers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add an "ollama" variation to your model-selector AI Config.&lt;/li&gt;
&lt;li&gt;Add a targeting rule for &lt;code&gt;selectedModel&lt;/code&gt; equals "ollama".&lt;/li&gt;
&lt;li&gt;Launch with &lt;code&gt;MODEL_KEY=ollama&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your custom judges score Claude Sonnet and Llama 3.2 with identical criteria. After enough requests, you can compare quality scores across providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run experiments
&lt;/h2&gt;

&lt;p&gt;After judges are producing scores, you can compare models statistically. Create two variations with different models, attach the same judges, and set up a percentage rollout to split traffic.&lt;/p&gt;

&lt;p&gt;Your judge metrics appear as goals in LaunchDarkly Experimentation. After enough data, you can answer "Which model produces more secure code?" with confidence, not guesswork.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/experimentation" rel="noopener noreferrer"&gt;Experimentation with AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitor quality over time
&lt;/h2&gt;

&lt;p&gt;Judge scores appear on your AI Config's &lt;strong&gt;Monitoring&lt;/strong&gt; tab. To view evaluation metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open your model-selector AI Config and go to the &lt;strong&gt;Monitoring&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Evaluator metrics&lt;/strong&gt; from the dropdown menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;![Select Evaluator metrics from the dropdown]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoxt1pg5p7h2bb9tu99f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgoxt1pg5p7h2bb9tu99f.png" alt=" " width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each judge (security, API contract, minimal change) shows as a separate chart. Hover over a chart to see scores broken down by variation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5ce0xtt4noledt670lu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5ce0xtt4noledt670lu.png" alt="Security judge scores over time" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnomfk4ie49zg5tijlorq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnomfk4ie49zg5tijlorq.png" alt="API contract adherence scores" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsed70rmefcd6jnumeun.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsed70rmefcd6jnumeun.png" alt="Minimal change judge scores" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To drill into a specific model's evaluations, select the variation from the bottom menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7qje8qq3ebjokpwnm0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7qje8qq3ebjokpwnm0x.png" alt="Select a variation to see its evaluations" width="520" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Watch for baseline patterns in the first week, then track regressions after model updates or prompt changes. Model providers ship updates without notice. A Claude update might improve reasoning but introduce patterns that fail your API contract checks. Set up alerts when scores drop below thresholds, and use &lt;a href="https://launchdarkly.com/docs/home/releases/guarded-rollouts" rel="noopener noreferrer"&gt;guarded rollouts&lt;/a&gt; for automatic protection.&lt;/p&gt;

&lt;p&gt;To learn more, read &lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;Monitor AI Configs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Control costs with sampling
&lt;/h2&gt;

&lt;p&gt;Each judge evaluation is an LLM call. Control costs by adjusting sampling rates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Staging&lt;/strong&gt;: 100% sampling to catch issues early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production&lt;/strong&gt;: 10-25% sampling for cost efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also use cheaper models (GPT-4o mini) for staging and more capable models for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you learned
&lt;/h2&gt;

&lt;p&gt;The value is in the judges you create. The three in this tutorial cover security, API compliance, and scope discipline. Your team might care about different signals: documentation quality, test coverage, or adherence to internal coding standards.&lt;/p&gt;

&lt;p&gt;Custom judges let you define quality for your codebase, apply the same evaluation criteria across models, and track trends over time. Once you create a judge, you can attach it to any AI Config in your project.&lt;/p&gt;



&lt;p&gt;Ready to build custom judges for your codebase? &lt;a href="https://launchdarkly.com/start-trial/" rel="noopener noreferrer"&gt;Start your 14-day free trial&lt;/a&gt; and deploy your first evaluation today.&lt;/p&gt;



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/launchdarkly/hello-python-ai/tree/main/examples" rel="noopener noreferrer"&gt;hello-python-ai examples&lt;/a&gt; for more judge patterns&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/tutorials/ai-configs-best-practices" rel="noopener noreferrer"&gt;AI Configs best practices&lt;/a&gt; for production patterns&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The &lt;code&gt;/aiconfig-online-evals&lt;/code&gt; and &lt;code&gt;/aiconfig-targeting&lt;/code&gt; skills are not yet available. Use the dashboard to complete those steps. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>evals</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Beyond n8n for Workflow Automation: Agent Graphs as Your Universal Agent Harness</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Thu, 26 Mar 2026 00:33:34 +0000</pubDate>
      <link>https://forem.com/launchdarkly/beyond-n8n-for-workflow-automation-agent-graphs-as-your-universal-agent-harness-4lic</link>
      <guid>https://forem.com/launchdarkly/beyond-n8n-for-workflow-automation-agent-graphs-as-your-universal-agent-harness-4lic</guid>
      <description>&lt;p&gt;Hardcoded multi-agent orchestration is brittle: topology lives in framework-specific code, changes require redeploys, and bottlenecks are hard to see. &lt;a href="https://launchdarkly.com/docs/home/ai-configs/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs&lt;/a&gt; externalize that topology into LaunchDarkly, while your application continues to own execution.&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll build a small multi-agent workflow, traverse it with the SDK, monitor per-node latency on the graph itself, and update a slow node's model without changing application code.&lt;/p&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node&lt;/strong&gt; = AI Config (model, instructions, tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge&lt;/strong&gt; = handoff metadata (routing contract you define)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph&lt;/strong&gt; = topology (which nodes connect)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your app&lt;/strong&gt; = execution + interpretation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LaunchDarkly provides graph structure, config, and observability. Your application owns execution semantics: you write the code that interprets edges and runs agents.&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfxzwj0bgo8ln73o0oux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfxzwj0bgo8ln73o0oux.png" alt="Agent Graph with monitoring" width="800" height="532"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you'll add Agent Graphs to an existing multi-agent workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build a graph visually&lt;/strong&gt; in the LaunchDarkly UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect it to your code&lt;/strong&gt; with a few lines of SDK integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run your agents&lt;/strong&gt; and see the graph in action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor performance&lt;/strong&gt; with per-node latency and invocation tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix a slow agent&lt;/strong&gt; by swapping models from the dashboard&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the end, you'll have a multi-agent system where topology metadata changes happen in the UI, picked up by your traversal code on the next request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LaunchDarkly account with AI Configs access (&lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;An existing agent workflow (or use our &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial/tree/tutorial/agent-graphs" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem with Hardcoded Orchestration
&lt;/h2&gt;

&lt;p&gt;Every multi-agent framework handles orchestration differently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LangGraph - topology hardcoded in graph setup
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supervisor_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;security_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;support_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Routing logic buried in node functions or conditional edges
&lt;/span&gt;
&lt;span class="c1"&gt;# OpenAI Agents SDK - handoffs defined per agent
&lt;/span&gt;&lt;span class="n"&gt;security_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;support_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;supervisor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;security_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;support_agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Topology locked in code
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The topology is scattered across code. Agent Graphs make it visible: you see the entire workflow in one view, edit connections in the UI, and traverse it with graph-aware SDK methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Externalizing Topology Helps
&lt;/h2&gt;

&lt;p&gt;If you've built multi-agent systems with LangGraph, OpenAI Swarm, or Strands, you've hit these walls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Config duplication&lt;/strong&gt;: Agent definitions scattered across framework-specific formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent failures&lt;/strong&gt;: An agent times out and you don't know until users complain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No topology visibility&lt;/strong&gt;: The workflow exists only in code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom observability&lt;/strong&gt;: Getting consistent per-agent metrics means reconciling different trace formats and data schemas across frameworks&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;For a detailed comparison of LangGraph, OpenAI Swarm, and Strands, see &lt;a href="https://launchdarkly.com/docs/tutorials/ai-orchestrators" rel="noopener noreferrer"&gt;Compare AI orchestrators&lt;/a&gt;. Agent Graphs work with multiple agent frameworks.&lt;/p&gt;



&lt;p&gt;Agent Graphs solve these by giving you a &lt;strong&gt;visual graph builder&lt;/strong&gt; where you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;See your entire workflow&lt;/strong&gt; at a glance, not buried in code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor per-node metrics&lt;/strong&gt; overlaid directly on the graph (latency, invocations, tool calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add or remove agents&lt;/strong&gt; without changing traversal logic, provided your runtime supports the node's tools and output contract&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inspect routing logic&lt;/strong&gt; on edges, with handoff data visible in the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use graph-aware SDK methods&lt;/strong&gt; like &lt;code&gt;is_terminal()&lt;/code&gt;, &lt;code&gt;is_root()&lt;/code&gt;, and &lt;code&gt;get_edges()&lt;/code&gt; instead of manual tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Create AI Configs for Your Agents
&lt;/h2&gt;

&lt;p&gt;Before building a graph, you need AI Configs for each agent. If you already have AI Configs, skip to Step 2.&lt;/p&gt;



&lt;p&gt;See the &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI Configs quickstart&lt;/a&gt; or run the bootstrap script in our &lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/launchdarkly-labs/devrel-agents-tutorial
&lt;span class="nb"&gt;cd &lt;/span&gt;devrel-agents-tutorial
git checkout tutorial/agent-graphs
uv &lt;span class="nb"&gt;sync
cp&lt;/span&gt; .env.example .env  &lt;span class="c"&gt;# Add your LD_SDK_KEY, LD_API_KEY, OPENAI_API_KEY&lt;/span&gt;
uv run python bootstrap/create_configs.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;For this tutorial, we'll use three configs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;supervisor-agent&lt;/strong&gt;: Orchestrates the workflow and routes queries based on PII pre-screening&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;security-agent&lt;/strong&gt;: Detects and redacts personally identifiable information (PII)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;support-agent&lt;/strong&gt;: Answers questions using dynamically loaded tools (search, RAG)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Build the Graph in the UI
&lt;/h2&gt;

&lt;p&gt;This is where Agent Graphs diverge from code-based orchestration. Instead of writing &lt;code&gt;add_edge()&lt;/code&gt; calls, you'll &lt;strong&gt;see your topology&lt;/strong&gt; and modify it visually.&lt;/p&gt;

&lt;p&gt;Open your LaunchDarkly dashboard and navigate to &lt;strong&gt;AI &amp;gt; Agent graphs&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You'll see the first-time setup wizard. Since you already created AI Configs in Step 1, expand &lt;strong&gt;Create a graph&lt;/strong&gt; at the bottom.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll52y5pbrbgb41r0vp9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll52y5pbrbgb41r0vp9u.png" alt="First-time agent graph wizard" width="800" height="791"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Name your graph &lt;code&gt;chatbot-flow&lt;/code&gt; and click &lt;strong&gt;Create graph&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8esxw38ushh1a4sip4ew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8esxw38ushh1a4sip4ew.png" alt="Creating your first Agent Graph" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add your first node: click &lt;strong&gt;Add node&lt;/strong&gt; and select &lt;code&gt;supervisor-agent&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set it as the root: click the node and toggle &lt;strong&gt;Root node&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;security-agent&lt;/code&gt; and &lt;code&gt;support-agent&lt;/code&gt; as nodes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffokor5ot4lbnbpv0xv8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffokor5ot4lbnbpv0xv8s.png" alt="Adding security agent" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczwuk1hw3dpt37npl0bs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczwuk1hw3dpt37npl0bs.png" alt="Adding support agent" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Draw edges: drag from &lt;code&gt;supervisor-agent&lt;/code&gt; to both child agents&lt;/li&gt;
&lt;li&gt;Add handoff data to each edge to define routing logic:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;supervisor-agent → security-agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sanitize"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PII detected"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"security"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp4ihvb9dcpotbd6oblo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp4ihvb9dcpotbd6oblo.png" alt="PII detected edge" width="800" height="393"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;supervisor-agent → support-agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Clean input"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3n9zhbum2d4f1w6cgah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3n9zhbum2d4f1w6cgah.png" alt="Clean edge" width="800" height="390"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;security-agent → support-agent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"proceed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Input sanitized"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"continue"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrfa1n6xbcvgkuqvdt66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrfa1n6xbcvgkuqvdt66.png" alt="Redacted edge" width="800" height="392"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Notice what you're seeing: the entire workflow topology in one view. This graph &lt;em&gt;is&lt;/em&gt; your architecture diagram, always current. Each node shows which AI Config variation it serves. The edges show routing logic that would otherwise be buried in conditional statements. When you need to add a new agent or change routing, you do it here, not in code.&lt;/p&gt;



&lt;p&gt;LaunchDarkly doesn't execute your graph. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Topology&lt;/strong&gt;: Which nodes exist and how they connect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handoff metadata&lt;/strong&gt;: Whatever JSON you put on edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-node AI Config&lt;/strong&gt;: Model, instructions, tools for each agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decides which edges to follow based on agent decisions&lt;/li&gt;
&lt;li&gt;Interprets handoff data however you want (the schema is yours)&lt;/li&gt;
&lt;li&gt;Executes the actual agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The handoff JSON is arbitrary metadata. You define the schema, you interpret it. LaunchDarkly stores and delivers it.&lt;/p&gt;



&lt;h2&gt;
  
  
  Step 3: Add the SDK to Your Project
&lt;/h2&gt;

&lt;p&gt;Install the LaunchDarkly AI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv add launchdarkly-server-sdk launchdarkly-server-sdk-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the clients in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config_manager.py - Initialize LaunchDarkly clients
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_initialize_launchdarkly_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize LaunchDarkly client and AI client&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sdk_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Block until client is initialized (max 10 seconds)
&lt;/span&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_initialized&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LaunchDarkly client initialization failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LDAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ld_client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build a context for targeting and tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config_manager.py - Build context for targeting
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build a LaunchDarkly context with consistent attributes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;context_builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;context_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Integrate with Your Framework
&lt;/h2&gt;

&lt;p&gt;This section walks through the integration code, starting with the building block (what runs at each node), then showing how nodes are orchestrated.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Generic Agent Pattern
&lt;/h3&gt;

&lt;p&gt;The key to dynamic execution is &lt;code&gt;create_generic_agent&lt;/code&gt;. Every node uses the same implementation—no agent registry, no hardcoded agent types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agents/generic_agent.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_generic_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a generic agent from LaunchDarkly AI Config.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenericAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;valid_routes&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute the agent using LaunchDarkly config.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_skipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;# Create model from config
&lt;/span&gt;            &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_model_for_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Load tools from LaunchDarkly config
&lt;/span&gt;            &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_dynamic_tools_from_launchdarkly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Get instructions from config
&lt;/span&gt;            &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Process the input.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="c1"&gt;# Inject route options into instructions
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;route_instruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Select one of these routes: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Return: {{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;selected_route&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;route_instruction&lt;/span&gt;

            &lt;span class="c1"&gt;# Execute and extract routing decision
&lt;/span&gt;            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

            &lt;span class="c1"&gt;# Track metrics
&lt;/span&gt;            &lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GenericAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The generic agent pattern means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No agent registry&lt;/strong&gt;: Every node uses the same &lt;code&gt;create_generic_agent&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config-driven behavior&lt;/strong&gt;: Model, instructions, and tools all come from LaunchDarkly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic routing&lt;/strong&gt;: Valid routes are injected from graph edges, not hardcoded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal code changes&lt;/strong&gt;: Add a new agent in LaunchDarkly, create its AI Config, add it to your graph, and it works—provided your runtime supports the node's tools and output contract&lt;/li&gt;
&lt;/ul&gt;



&lt;h3&gt;
  
  
  The AgentService Class
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;AgentService&lt;/code&gt; class is the entry point for processing messages through your Agent Graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Multi-Agent Orchestration using LaunchDarkly Agent Graph.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConfigManager&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process message using LaunchDarkly Agent Graph.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_execute_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AGENT_GRAPH_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot-flow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anonymous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_context&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
            &lt;span class="c1"&gt;# ... other fields
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Executing the Graph
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;_execute_graph&lt;/code&gt; method fetches the graph from LaunchDarkly and uses &lt;code&gt;traverse()&lt;/code&gt; with skip logic for conditional routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_execute_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute agents using SDK&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s traverse() with skip logic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;ld_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ld_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent Graph &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;graph_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is not enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="c1"&gt;# Skip logic: track which nodes should execute
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_routed_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;()},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Define the node callback (see next section)
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# ... node execution logic
&lt;/span&gt;        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="c1"&gt;# Use SDK's traverse() - it handles traversal order
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;traverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execute_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Track graph completion
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;
        &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_invocation_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Skip Logic for Conditional Routing
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;execute_node&lt;/code&gt; callback implements skip logic—the core pattern that enables conditional routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py - inside _execute_graph
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute a single node if it was routed to.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Skip logic: only execute if parent routed to this node
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_routed_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_skipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Track node invocation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_node_invocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_handoff_success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get edges and valid routes for this node
&lt;/span&gt;    &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_edges&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;valid_routes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# Execute agent with config from this node
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_generic_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;valid_routes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_run_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Track tool calls
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Route to next node: add to _routed_to set
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;next_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_select_next_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_routed_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;exec_ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_prev_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;The &lt;code&gt;_routed_to&lt;/code&gt; set tracks which nodes should execute:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start&lt;/strong&gt;: Add root node to &lt;code&gt;_routed_to&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;traverse() visits each node&lt;/strong&gt;: If node is in &lt;code&gt;_routed_to&lt;/code&gt;, execute it; otherwise skip&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After execution&lt;/strong&gt;: Add the next node (based on routing decision) to &lt;code&gt;_routed_to&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This enables conditional routing: the supervisor routes to either security OR support, and only the chosen path executes.&lt;/p&gt;



&lt;h3&gt;
  
  
  Routing Between Nodes
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;_select_next_node&lt;/code&gt; method determines which node to route to based on the agent's routing decision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/services/agent_service.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_select_next_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Select next node key based on routing decision.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# Build route map: route -&amp;gt; target_config
&lt;/span&gt;    &lt;span class="n"&gt;route_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handoff&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;route_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_config&lt;/span&gt;

    &lt;span class="c1"&gt;# Exact match
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;route_map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;route_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_handoff_failure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Default: first edge
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;target_config&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: your graph topology comes from LaunchDarkly, not hardcoded orchestration. Change the graph in the UI, and your code picks up the new structure on the next request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Run It
&lt;/h2&gt;

&lt;p&gt;With the &lt;code&gt;AgentService&lt;/code&gt; wired up (as shown in Step 4), you can now process messages through your Agent Graph. The service handles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Building the LaunchDarkly context for targeting&lt;/li&gt;
&lt;li&gt;Fetching the graph and executing nodes via &lt;code&gt;traverse()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tracking metrics for monitoring&lt;/li&gt;
&lt;li&gt;Returning the final response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Test it by sending a message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the status of my order?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;premium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now go back to the LaunchDarkly UI. Add a new node or change an edge. Run your code again. Topology changes are picked up by your traversal code on subsequent SDK evaluations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Monitor Agent Performance
&lt;/h2&gt;

&lt;p&gt;This is the key differentiator: monitoring happens &lt;strong&gt;on the graph itself&lt;/strong&gt;, not in a separate dashboard. You see metrics overlaid on the same visual topology you built, so bottlenecks are immediately obvious.&lt;/p&gt;

&lt;p&gt;The sample repo includes full instrumentation: calls to &lt;code&gt;tracker.track_success()&lt;/code&gt;, &lt;code&gt;tracker.track_error()&lt;/code&gt;, and &lt;code&gt;tracker.track_tool_call()&lt;/code&gt; in the agent execution path. After running some traffic, open your Agent Graph to see the results.&lt;/p&gt;

&lt;p&gt;Navigate to &lt;strong&gt;AI &amp;gt; Agent graphs &amp;gt; chatbot-flow&lt;/strong&gt;. You'll see a metrics bar at the top of the graph view where you can toggle different metrics on and off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics on the graph
&lt;/h3&gt;

&lt;p&gt;Here's what makes this different from traditional APM: the metrics appear &lt;strong&gt;directly on your workflow visualization&lt;/strong&gt;. No mental mapping between a dashboard and your code. No correlating trace IDs. The slow node lights up on the graph.&lt;/p&gt;

&lt;p&gt;Turn on &lt;strong&gt;Latency&lt;/strong&gt; to see duration data overlaid directly on your graph:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total duration&lt;/strong&gt;: The combined time for the entire graph invocation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-node duration&lt;/strong&gt;: How long each individual agent takes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Turn on &lt;strong&gt;Invocations&lt;/strong&gt; to see how often each node is reached. This reveals which paths your users take most frequently. In a routing graph, you'll quickly see whether most queries go through security or skip directly to support.&lt;/p&gt;

&lt;p&gt;Turn on &lt;strong&gt;Tool calls&lt;/strong&gt; to see the average number of tool invocations per node. If an agent is calling tools excessively, you'll spot it here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring page
&lt;/h3&gt;

&lt;p&gt;Click &lt;strong&gt;Monitoring&lt;/strong&gt; to see all metrics over time. This view shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency trends&lt;/strong&gt;: Duration per node over hours, days, or weeks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invocation patterns&lt;/strong&gt;: Traffic flow through your graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool call breakdown&lt;/strong&gt;: Which specific tools are being called and how often&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifux67pmlldhocvq0gmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifux67pmlldhocvq0gmu.png" alt="Monitoring dashboard" width="800" height="367"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;p&gt;To see which specific tools are called, you need to track them in your code using the tracker. The SDK sends this data to LaunchDarkly, which displays it in the monitoring view.&lt;/p&gt;



&lt;h3&gt;
  
  
  Generate traffic to see metrics
&lt;/h3&gt;

&lt;p&gt;Run the traffic generator from the sample repo to send queries through your graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run python tools/traffic_generator.py &lt;span class="nt"&gt;--queries&lt;/span&gt; 20 &lt;span class="nt"&gt;--delay&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sends a mix of queries (some with PII, some without) to exercise both the security and support paths. After a few minutes, you'll see metrics populate on the graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detecting a slow agent
&lt;/h3&gt;

&lt;p&gt;With traffic flowing, suppose the security-agent starts averaging 5 seconds per call. With latency metrics enabled on the graph, you see it immediately: the security-agent node shows a high duration value while other nodes stay fast.&lt;/p&gt;

&lt;p&gt;The invocation numbers also tell a story. If security-agent shows 50 invocations and support-agent shows 80, you know ~30 queries are bypassing security (the clean path). This helps you understand whether the slow agent is affecting most users or just a subset.&lt;/p&gt;

&lt;p&gt;Without Agent Graphs, you'd need custom logging, Datadog queries, and manual correlation. With Agent Graphs, you see the problem in 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Fix Without Deploying
&lt;/h2&gt;

&lt;p&gt;The security-agent is slow because it's using &lt;code&gt;claude-sonnet-4&lt;/code&gt; for PII detection. A smaller, faster model may be sufficient for this task.&lt;/p&gt;

&lt;p&gt;In the LaunchDarkly dashboard, update the &lt;code&gt;pii-detector&lt;/code&gt; variation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change model from &lt;code&gt;Anthropic.claude-sonnet-4-20250514&lt;/code&gt; to &lt;code&gt;Anthropic.claude-3-haiku-20240307&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or use &lt;a href="https://launchdarkly.com/docs/tutorials/agent-skills-quickstart" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; to make the change from your coding assistant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The security-agent pii-detector variation is averaging 5 seconds.
Change the model to claude-3-haiku-20240307.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No code changes. No deploy. Changes are picked up on subsequent SDK evaluations.&lt;/p&gt;

&lt;p&gt;Run the traffic generator again and watch the latency drop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What just happened
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Traffic generator&lt;/strong&gt; sent queries through the graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt; showed the slow agent on the graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model swap&lt;/strong&gt; happened in the UI (or via Agent Skills)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your code&lt;/strong&gt; automatically used the new configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No deploys. No PRs. The fix is live.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Agents SDK Integration (Conceptual)
&lt;/h2&gt;

&lt;p&gt;Agent Graphs work with multiple frameworks. This conceptual example shows how the pattern translates to OpenAI Agents SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Conceptual example showing how Agent Graph SDK methods work with OpenAI Agents
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_traversal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tracker&lt;/span&gt;
    &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_edges&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Child agents are already in state (reverse traversal builds bottom-up)
&lt;/span&gt;    &lt;span class="n"&gt;handoffs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_config&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_handoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Track handoff events
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;on_handoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_handoff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reverse_traverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle_traversal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me about your engineering team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same graph definition, adapted to each framework's execution model. The topology metadata lives in LaunchDarkly; your code interprets and executes it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start simple&lt;/strong&gt;: Begin with a linear graph (A → B → C) before adding conditional routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use handoff data for context passing&lt;/strong&gt;: Include metadata like action type, reason, or state that the next agent needs to continue the workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track everything&lt;/strong&gt;: Call &lt;code&gt;tracker.track_success()&lt;/code&gt; and &lt;code&gt;tracker.track_error()&lt;/code&gt; in every node for complete visibility. Use &lt;code&gt;graph_tracker.track_tool_call(tool_name)&lt;/code&gt; to track which tools agents invoke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test with targeting&lt;/strong&gt;: Use LaunchDarkly targeting to route test users to experimental graph configurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handle missing edges&lt;/strong&gt;: Decide what happens when no edge matches a routing decision or when a target node is disabled. Recommend: fail closed, log diagnostics, and track routing failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep execution state request-scoped&lt;/strong&gt;: Store execution state inside the context object (&lt;code&gt;ctx&lt;/code&gt;) passed through traversal, not in instance-level variables. Treat graph traversal as request-scoped to avoid concurrency issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You've Built
&lt;/h2&gt;

&lt;p&gt;You now have a multi-agent system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graph topology&lt;/strong&gt; is externalized and self-documenting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing logic&lt;/strong&gt; is visible on edges, not buried in code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt; appears on the graph itself, not a separate dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node-level control&lt;/strong&gt; lets you disable a single agent without touching others, provided your executor checks node availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple frameworks&lt;/strong&gt; can consume the same graph metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you spot a slow agent in monitoring, you can swap the model from the dashboard without a deploy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/agent-graphs" rel="noopener noreferrer"&gt;Agent Graphs Reference&lt;/a&gt;&lt;/strong&gt;: SDK methods for &lt;code&gt;traverse&lt;/code&gt;, &lt;code&gt;reverse_traverse&lt;/code&gt;, &lt;code&gt;get_edges()&lt;/code&gt;, and handoff data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Configs Documentation&lt;/a&gt;&lt;/strong&gt;: Learn more about variations, targeting, and experiments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="/https://launchdarkly.com/docs/tutorials/agent-skills-quickstart"&gt;Agent Skills Tutorial&lt;/a&gt;&lt;/strong&gt;: Manage AI Configs from your coding assistant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/monitor" rel="noopener noreferrer"&gt;Monitor AI Configs&lt;/a&gt;&lt;/strong&gt;: Deep dive into metrics and dashboards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/launchdarkly-labs/devrel-agents-tutorial/tree/tutorial/agent-graphs" rel="noopener noreferrer"&gt;Sample Repository&lt;/a&gt;&lt;/strong&gt;: Complete code from this tutorial&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hardcoded orchestration was fine when you had one agent. With multi-agent systems, it becomes a liability. Every change requires a deploy. Every incident requires a developer.&lt;/p&gt;

&lt;p&gt;Agent Graphs flip this. Define your workflow in LaunchDarkly, integrate it with your framework, and fix many problems without touching code. Your agents become as dynamic as your feature flags.&lt;/p&gt;

&lt;p&gt;Ready to stop hardcoding? &lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Get started with AI Configs&lt;/a&gt; and create your first Agent Graph.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aiops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>LLM evaluation guide: When to add online evals to your AI application</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Wed, 17 Dec 2025 17:42:49 +0000</pubDate>
      <link>https://forem.com/launchdarkly/llm-evaluation-guide-when-to-add-online-evals-to-your-ai-application-mo5</link>
      <guid>https://forem.com/launchdarkly/llm-evaluation-guide-when-to-add-online-evals-to-your-ai-application-mo5</guid>
      <description>&lt;h2&gt;
  
  
  The quick decision framework
&lt;/h2&gt;



&lt;p&gt;Online evals for AI Configs is currently in closed beta. Judges must be installed in your project before they can be attached to AI Config variations.&lt;/p&gt;



&lt;p&gt;Online evals provide real-time quality monitoring for LLM applications. Using LLM-as-a-judge methodology, they run automated quality checks on a configurable percentage of your production traffic, producing structured scores and pass/fail judgments you can act on programmatically. LaunchDarkly includes three built-in judges: &lt;strong&gt;accuracy&lt;/strong&gt;, &lt;strong&gt;relevance&lt;/strong&gt;, and &lt;strong&gt;toxicity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skip online evals if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your checks are purely deterministic (schema validation, compile tests)&lt;/li&gt;
&lt;li&gt;You have low volume and can manually review outputs in observability dashboards&lt;/li&gt;
&lt;li&gt;You're primarily debugging execution problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Add online evals when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need quantified quality scores to trigger automated actions (rollback, rerouting, alerts)&lt;/li&gt;
&lt;li&gt;Manual quality review doesn't scale to your traffic volume&lt;/li&gt;
&lt;li&gt;You're measuring multiple quality dimensions (accuracy, relevance, toxicity)&lt;/li&gt;
&lt;li&gt;You want statistical quality trends across segments for AI governance and compliance&lt;/li&gt;
&lt;li&gt;You need to monitor token usage and cost alongside quality metrics&lt;/li&gt;
&lt;li&gt;You're running A/B tests or guarded releases and need automated quality gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams add them within 2-3 sprints when manual quality review becomes the bottleneck. Configurable sampling rates let you balance evaluation coverage with cost and latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Online evals vs. LLM observability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLM observability shows you what happened. Online evals automatically assess quality and trigger actions based on those assessments.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM observability: your security camera
&lt;/h3&gt;

&lt;p&gt;LLM observability shows you everything that happened through distributed tracing: full conversations, tool calls, token usage, latency breakdowns, and cost attribution. Perfect for debugging and understanding what went wrong. But when you're handling 10,000 conversations daily, manually reviewing them for quality patterns doesn't scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online evals: your security guard
&lt;/h3&gt;

&lt;p&gt;Automatically scores every sampled request using LLM-as-a-judge methodology across your quality rubric (accuracy, relevance, toxicity) and takes action. Instead of exporting conversations to spreadsheets for manual review, you get real-time quality monitoring with drift detection that triggers alerts, rollbacks, or rerouting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 3 AM difference&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without evals: "Let's meet tomorrow to review samples and decide if we should rollback."&lt;/p&gt;

&lt;p&gt;With evals: "Quality dropped below threshold, automatic rollback triggered, here's what failed..."&lt;/p&gt;

&lt;h2&gt;
  
  
  How online evals actually work
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly's online evals use LLM-as-a-judge methodology with three built-in judges you can configure directly in the dashboard. No code changes required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install judges from the AI Configs menu&lt;/li&gt;
&lt;li&gt;Attach judges to AI Config variations&lt;/li&gt;
&lt;li&gt;Configure sampling rates (balance coverage with cost/latency)&lt;/li&gt;
&lt;li&gt;Evaluation metrics are automatically emitted as custom events&lt;/li&gt;
&lt;li&gt;Metrics are automatically available for A/B tests and guarded releases&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What you get from each built-in judge:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accuracy judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Response correctly answered the question but missed one edge case regarding error handling"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Relevance judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Response directly addressed the user's query with appropriate context and examples"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Toxicity judge:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Content is professional and appropriate with no toxic language detected"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each judge returns a score from 0.0 to 1.0 plus reasoning that explains the assessment. LaunchDarkly's built-in judges (accuracy, relevance, toxicity) have fixed evaluation criteria and are configured only by selecting the provider and model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
Install judges from the AI Configs menu in your LaunchDarkly dashboard. They appear as pre-configured AI configs (AI Judge - Accuracy, AI Judge - Toxicity, AI Judge - Relevance). When configuring your AI Config variations in completion mode, select which judges to attach with your desired sampling rate. Use different judge combinations for different environments to match your quality requirements and cost constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real problems online evals solve
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scale for production applications:&lt;/strong&gt; Your SQL generator handles 50,000 queries daily. LLM observability shows you every query through distributed tracing. Online evals tell you the proportion that are semantically wrong, automatically, with hallucination detection built in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-dimensional quality monitoring:&lt;/strong&gt; Customer service AI applications aren't just "did it respond?" It's accuracy, relevance, toxicity, compliance, and appropriateness. Online evals score all dimensions simultaneously, each with its own threshold and reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG pipeline validation:&lt;/strong&gt; Your retrieval-augmented generation system needs continuous monitoring of both retrieval quality and generation accuracy. Online evals can assess whether retrieved context is relevant and whether the response accurately uses that context, preventing hallucinations and ensuring factual grounding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost and performance optimization:&lt;/strong&gt; Monitor token usage alongside quality metrics. If certain queries consume 10x more tokens than others, online evals help identify these patterns so you can optimize prompts or routing logic to reduce costs without sacrificing quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actionable metrics for AI governance:&lt;/strong&gt; Transform 10,000 responses from data to decisions with evaluator-driven quality gates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accuracy trending below 0.8? Automated alerts to the team&lt;/li&gt;
&lt;li&gt;Toxicity above 0.2? Immediate review and potential rollback&lt;/li&gt;
&lt;li&gt;Relevance dropping for specific user segments? Targeted configuration updates&lt;/li&gt;
&lt;li&gt;Metrics automatically feed A/B tests and guarded releases for continuous improvement&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example implementation path
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Week 1-2: Define quality dimensions and install judges.&lt;/strong&gt;&lt;br&gt;
Use LLM observability alone first. Manually review samples to understand your system. Define your quality dimensions: accuracy, relevance, toxicity, or other criteria specific to your application. Install the built-in judges from the AI Configs menu in LaunchDarkly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 3-4: Attach judges with sampling.&lt;/strong&gt;&lt;br&gt;
Attach judges to AI Config variations in LaunchDarkly. Start with one or two key judges (accuracy and relevance are good defaults). Configure sampling rates between 10-20% of traffic to balance coverage with cost and latency. Compare automated scores with human judgment to validate the judges work for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 5+: Operationalize with quality gates.&lt;/strong&gt;&lt;br&gt;
Add more evaluation dimensions as you learn. Connect scores to automated actions and evaluator-driven quality gates: when accuracy drops below 0.7, trigger alerts; when toxicity exceeds 0.2, investigate immediately. Leverage the custom events and metrics for A/B testing and guarded releases to continuously improve your application's performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;You don't need online evals on day one. Start with LLM observability to understand your AI system through distributed tracing. Add evaluations when you hear yourself saying "we need to review more conversations" or "how do we know if quality is degrading?"&lt;/p&gt;

&lt;p&gt;LaunchDarkly's three built-in judges (accuracy, relevance, toxicity) provide LLM-as-a-judge evaluation that you can attach to any AI Config variation in &lt;strong&gt;completion mode&lt;/strong&gt; with configurable sampling rates. Note that online evals currently only work with completion mode AI Configs. Agent-based configs are not yet supported. Evaluation metrics are automatically emitted as custom events and feed directly into A/B tests and guarded releases, enabling continuous AI governance and quality improvement without code changes. Start simple with one judge, learn what matters for your application, and expand from there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM observability is your security camera. Online evals are your security guard.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;Ready to get started? &lt;a href="https://launchdarkly.com/start-trial/" rel="noopener noreferrer"&gt;Sign up for a free LaunchDarkly account&lt;/a&gt; if you haven't already.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a complete quality pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/tutorials/aic-cicd" rel="noopener noreferrer"&gt;AI Config CI/CD Pipeline&lt;/a&gt; - Add automated quality gates and LLM-as-a-judge testing to your deployment process&lt;/li&gt;
&lt;li&gt;Combine offline evaluation (in CI/CD) with online evals (in production) for comprehensive quality coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Learn more about AI Configs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/ai-configs" rel="noopener noreferrer"&gt;AI Config documentation&lt;/a&gt; - Understand how AI Configs enable real-time LLM configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;Online evals documentation&lt;/a&gt; - Deep dive into judge installation and configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/metrics/guardrail-metrics" rel="noopener noreferrer"&gt;Guardrail metrics&lt;/a&gt; - Monitor quality during A/B tests and guarded releases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;See it in action:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/observability/llm-observability" rel="noopener noreferrer"&gt;Check LLM observability in the LaunchDarkly dashboard&lt;/a&gt; to track your AI application performance with distributed tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Industry standards:&lt;/strong&gt;&lt;br&gt;
LaunchDarkly's approach aligns with emerging AI observability standards, including OpenTelemetry's semantic conventions for AI monitoring, ensuring your evaluation infrastructure integrates with the broader observability ecosystem.&lt;/p&gt;

</description>
      <category>evals</category>
      <category>agents</category>
      <category>ai</category>
      <category>observability</category>
    </item>
    <item>
      <title>When to Use Prompt-Based vs Agent Mode in LaunchDarkly for AI Applications</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Wed, 17 Dec 2025 17:39:09 +0000</pubDate>
      <link>https://forem.com/launchdarkly/when-to-use-prompt-based-vs-agent-mode-in-launchdarkly-for-ai-applications-5f3g</link>
      <guid>https://forem.com/launchdarkly/when-to-use-prompt-based-vs-agent-mode-in-launchdarkly-for-ai-applications-5f3g</guid>
      <description>&lt;h1&gt;
  
  
  A Guide for LangGraph, OpenAI, and Multi-Agent Systems
&lt;/h1&gt;

&lt;p&gt;The broader tech industry can't agree on what the term "agents" even means. &lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic defines agents&lt;/a&gt; as systems where "LLMs dynamically direct their own processes," while Vercel's AI SDK enables &lt;a href="https://sdk.vercel.ai/docs/concepts/tools" rel="noopener noreferrer"&gt;multi-step agent loops with tools&lt;/a&gt;, and &lt;a href="https://platform.openai.com/docs/guides/agents-sdk" rel="noopener noreferrer"&gt;OpenAI provides an Agents SDK&lt;/a&gt; with built-in orchestration. So when you're creating an AI Config in LaunchDarkly and see "prompt-based mode" vs. "agent mode," you might reasonably expect this choice to determine whether you get automatic tool execution loops, server-side state management, or some other fundamental capability difference.&lt;/p&gt;

&lt;p&gt;But LaunchDarkly's distinction is different and more practical. Understanding it will save you from confusion and help you ship AI features faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly's "prompt-based vs. agent" choice is about &lt;strong&gt;input schemas and framework compatibility&lt;/strong&gt;, not execution automation. &lt;strong&gt;Prompt-based mode&lt;/strong&gt; returns a messages array (perfect for chat UIs), while &lt;strong&gt;agent mode&lt;/strong&gt; returns an instructions string (optimized for LangGraph/CrewAI frameworks). Both provide the same core benefits: provider abstraction, A/B testing, metrics tracking, and the ability to change AI behavior without deploying code.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Ready to start?&lt;/strong&gt; &lt;a href="http://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free trial&lt;/a&gt; → &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;create your first AI Config&lt;/a&gt; → Choose your mode → Configure and ship.&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The fragmented AI landscape
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly supports 20+ AI providers: OpenAI, Anthropic, Gemini, Azure, Bedrock, Cohere, Mistral, DeepSeek, Perplexity, and more. Each has their own interpretation of "completions" vs "agents," creating a chaotic ecosystem with different API endpoints, execution behaviors, state management approaches, and capability limitations. This fragmentation makes it difficult to switch providers or even understand what capabilities you're getting. That's where LaunchDarkly's abstraction layer comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  LaunchDarkly's approach: provider-agnostic input schemas
&lt;/h2&gt;

&lt;p&gt;LaunchDarkly's AI Configs are a &lt;strong&gt;configuration layer&lt;/strong&gt; that abstracts provider differences. When you choose prompt-based mode or agent mode, you're selecting an &lt;strong&gt;input schema&lt;/strong&gt; (messages array vs. instructions string), not execution behavior. LaunchDarkly provides the configuration; you handle orchestration with your own code or frameworks like LangGraph. This gives you provider abstraction, A/B testing, metrics tracking, and online evals (prompt-based mode only) without locking you into any specific provider's execution model.&lt;/p&gt;



&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4llid4exa6z8zge6pza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4llid4exa6z8zge6pza.png" alt="AI Config Mode Selection" width="480" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  Prompt-based mode: messages-based
&lt;/h3&gt;

&lt;p&gt;Prompt-based mode uses a &lt;strong&gt;messages array&lt;/strong&gt; format with system/user/assistant roles (some providers like OpenAI also support a "developer" role for more granular control). This is the traditional chat format that works across all AI providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI Input&lt;/strong&gt;: "Messages" section with role-based messages&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDK Method&lt;/strong&gt;: &lt;code&gt;aiclient.config()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Returns&lt;/strong&gt;: Customized prompt + model configuration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://launchdarkly.com/docs/sdk/features/ai-config" rel="noopener noreferrer"&gt;AI Config docs&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Retrieve prompt-based AI config
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;default_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# What you get back: messages array
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [
#   {
#     "role": "system",
#     "content": "You are a helpful customer support agent for Acme Corp."
#   },
#   {
#     "role": "user",
#     "content": "How can I reset my password?"
#   }
# ]
&lt;/span&gt;
&lt;span class="c1"&gt;# Use with provider SDKs that expect message arrays
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;  &lt;span class="c1"&gt;# Standard message format
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use prompt-based mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You're building chat-style interactions&lt;/strong&gt;: Traditional message-based conversations where you construct system/user/assistant messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need online evals&lt;/strong&gt;: LaunchDarkly's model-agnostic online evals are currently only available in prompt-based mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want granular control of workflows&lt;/strong&gt;: Discrete steps that need to be accomplished in a specific order, or multi-step asynchronous processes where each step executes independently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-off evaluations&lt;/strong&gt;: Issue individual evaluations of your prompts and completions (not online evals)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple processing tasks&lt;/strong&gt;: Summarization, name suggestions, or other non-context-exceeding data processing&lt;/li&gt;
&lt;/ol&gt;



&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4kdb0ces45b66xc3ag6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4kdb0ces45b66xc3ag6.png" alt="Prompt-Based Mode Messages UI" width="800" height="567"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;



&lt;h3&gt;
  
  
  Agent mode: goal/instructions-based
&lt;/h3&gt;

&lt;p&gt;Agent mode uses a &lt;strong&gt;single instructions string&lt;/strong&gt; format that describes the agent's goal or task. This format is optimized for agent orchestration frameworks that expect high-level objectives rather than conversational messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI Input&lt;/strong&gt;: "Goal or task" field with instructions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SDK Method&lt;/strong&gt;: &lt;code&gt;aiclient.agent()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Returns&lt;/strong&gt;: Customized instructions + model configuration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;: &lt;a href="https://github.com/launchdarkly/hello-python-ai/blob/main/examples" rel="noopener noreferrer"&gt;hello-python-ai examples&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Retrieve agent-based AI config
&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;default_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# What you get back: instructions string
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# "You are a research assistant. Your goal is to gather comprehensive
# information on the requested topic using available search tools.
# Search multiple sources, synthesize findings, and provide a detailed
# summary with citations."
&lt;/span&gt;
&lt;span class="c1"&gt;# Use with agent frameworks that expect instructions
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_chat_model&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_chat_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citation_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;  &lt;span class="c1"&gt;# Goal/task instructions
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Execute and track
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use agent mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You're using agent frameworks&lt;/strong&gt;: LangGraph, LangChain, CrewAI, AutoGen, or LlamaIndex Workflows expect goal/instruction-based inputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal-oriented tasks&lt;/strong&gt;: "Research X and create Y" rather than conversational message exchange&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-driven workflows&lt;/strong&gt;: While both modes support tools, agent mode's format is optimized for frameworks that orchestrate tool usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-ended exploration&lt;/strong&gt;: The output is open-ended and you don't know the actual answer you're trying to get to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data as an application&lt;/strong&gt;: You want to treat your data as an application to feed in arbitrary data and ask questions about it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider agent endpoints&lt;/strong&gt;: LaunchDarkly may route to provider-specific agent APIs when available (note: not all models support agent mode; check your model's capabilities)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;See example:&lt;/strong&gt; &lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;Build a LangGraph Multi-Agent System with LaunchDarkly&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Feature&lt;/th&gt;
    &lt;th&gt;Prompt-Based Mode&lt;/th&gt;
    &lt;th&gt;Agent Mode&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Input format&lt;/td&gt;
    &lt;td&gt;Messages (system/user/assistant)&lt;/td&gt;
    &lt;td&gt;Goal/task + instructions&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Tools support&lt;/td&gt;
    &lt;td&gt;✅ Yes&lt;/td&gt;
    &lt;td&gt;✅ Yes&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;SDK method&lt;/td&gt;
    &lt;td&gt;&lt;code&gt;config()&lt;/code&gt;&lt;/td&gt;
    &lt;td&gt;&lt;code&gt;agent()&lt;/code&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Automatic execution loop&lt;/td&gt;
    &lt;td&gt;❌ No (you orchestrate)&lt;/td&gt;
    &lt;td&gt;❌ No (you orchestrate)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Online evals&lt;/td&gt;
    &lt;td&gt;✅ Available&lt;/td&gt;
    &lt;td&gt;❌ Not yet available&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Best for&lt;/td&gt;
    &lt;td&gt;Chat-style prompting, single completions&lt;/td&gt;
    &lt;td&gt;Agent frameworks, goal-oriented tasks&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Provider endpoint&lt;/td&gt;
    &lt;td&gt;Standard endpoint&lt;/td&gt;
    &lt;td&gt;May use provider-specific agent endpoint if available&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Model support&lt;/td&gt;
    &lt;td&gt;All models&lt;/td&gt;
    &lt;td&gt;Most models (check model card for "Agent mode" capability)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Model compatibility&lt;/strong&gt;: Not all models support agent mode. When selecting a model in LaunchDarkly, check the model card for "Agent mode" capability. Models like GPT-4.1, GPT-5 mini, Claude Haiku 4.5, Claude Sonnet 4.5, Claude Sonnet 4, Grok Code Fast 1, and Raptor mini support agent mode, while models focused on reasoning (like GPT-5, Claude Opus 4.1) may only support prompt-based mode.&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  How providers handle "completion vs agent"
&lt;/h2&gt;

&lt;p&gt;To understand why LaunchDarkly's abstraction is valuable, let's look at how major AI providers handle the distinction between basic completions and advanced agent capabilities. The table below shows how different providers implement "advanced" modes; generally these are ADDITIVE, including all basic capabilities plus extras. For example, OpenAI's Responses API includes all Chat Completions features plus additional capabilities.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Provider&lt;/th&gt;
    &lt;th&gt;"Basic" Mode&lt;/th&gt;
    &lt;th&gt;"Advanced" Mode&lt;/th&gt;
    &lt;th&gt;Key Difference&lt;/th&gt;
    &lt;th&gt;Link&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Chat Completions API&lt;/td&gt;
    &lt;td&gt;Responses API&lt;/td&gt;
    &lt;td&gt;Responses adds built-in tools (web_search, file_search, computer_use, code_interpreter, remote MCP), server-side conversation state with stored IDs, and improved streaming. Chat Completions remains supported.&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://platform.openai.com/docs/guides/responses-vs-chat-completions" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Anthropic&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Tool Use (client tools)&lt;/td&gt;
    &lt;td&gt;Tool Use (client + server tools)&lt;/td&gt;
    &lt;td&gt;Server tools (web_search, web_fetch) execute on Anthropic's servers. You can use both client and server tools together&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/overview" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Google Gemini&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Manual function calling&lt;/td&gt;
    &lt;td&gt;Automatic function calling (Python SDK)&lt;/td&gt;
    &lt;td&gt;Python SDK auto-converts functions to schemas, runs the execution loop, and supports compositional multi-step calls. Manual mode: full control, all platforms&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Vercel AI SDK&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;&lt;code&gt;generateText()&lt;/code&gt;&lt;/td&gt;
    &lt;td&gt;
&lt;code&gt;generateText()&lt;/code&gt; with multi-step loop&lt;/td&gt;
    &lt;td&gt;Multi-step agent loops with tools; SDK continues until complete; &lt;code&gt;maxSteps&lt;/code&gt; provides loop control to limit steps&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://sdk.vercel.ai/docs/concepts/tools" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Azure OpenAI&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Assistants API (deprecated)&lt;/td&gt;
    &lt;td&gt;AI Agent Services&lt;/td&gt;
    &lt;td&gt;Enterprise agent runtime with threads, tool orchestration, safety, identity, networking, and observability; includes Responses API and Computer-Using Agent in Azure&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://azure.microsoft.com/en-us/blog/announcing-the-responses-api-and-computer-using-agent-in-azure-ai-foundry/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;AWS Bedrock (Nova)&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Converse API (tool use)&lt;/td&gt;
    &lt;td&gt;Bedrock Agents&lt;/td&gt;
    &lt;td&gt;Agents: managed service with automatic orchestration + state management + multi-agent collaboration. Converse: manual tool orchestration, full control&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/agents-use-nova.html" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;strong&gt;Cohere&lt;/strong&gt;&lt;/td&gt;
    &lt;td&gt;Standard chat&lt;/td&gt;
    &lt;td&gt;Command A&lt;/td&gt;
    &lt;td&gt;Command A: enhanced multi-step tool use, REACT agents, ~150% higher throughput&lt;/td&gt;
    &lt;td&gt;&lt;a href="https://docs.cohere.com/docs/command-a" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This fragmentation across providers is exactly why LaunchDarkly's approach matters: you configure once (messages vs. goals), and LaunchDarkly handles the provider-specific translation. Want to switch from OpenAI to Anthropic? Just change the provider in your AI Config. Your application code stays the same.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Note on OpenAI's ecosystem (Nov 2025)&lt;/strong&gt;: The &lt;a href="https://platform.openai.com/docs/guides/agents-sdk" rel="noopener noreferrer"&gt;Agents SDK&lt;/a&gt; is OpenAI's production-ready orchestration framework. It uses the Responses API by default, and via a built-in LiteLLM adapter it can run against other providers with an OpenAI-compatible shape. Chat Completions is still supported, but OpenAI recommends Responses for new work. The &lt;a href="https://platform.openai.com/docs/assistants/whats-new" rel="noopener noreferrer"&gt;Assistants API is deprecated&lt;/a&gt; and scheduled to shut down on August 26, 2026.&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Common misconceptions
&lt;/h2&gt;

&lt;p&gt;Now that you understand the modes and how they differ from provider-specific implementations, let's clear up some common points of confusion:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "Agent mode provides automatic execution"&lt;/strong&gt;&lt;br&gt;
No. Both modes require you to orchestrate. Agent mode just provides a different input schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "Agent mode is for complex tasks, prompt-based mode is for simple ones"&lt;/strong&gt;&lt;br&gt;
Not quite. It's about input format and framework compatibility, not task complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "I can only use tools in agent mode"&lt;/strong&gt;&lt;br&gt;
False. Both modes support tools. The difference is how you specify your task (messages vs. goal).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ "LaunchDarkly is an agent framework like LangGraph"&lt;/strong&gt;&lt;br&gt;
No. LaunchDarkly is configuration management for AI. Use it WITH frameworks like LangGraph, not instead of them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why LaunchDarkly's abstraction matters
&lt;/h2&gt;

&lt;p&gt;Now that you've seen how fragmented the provider landscape is, let's explore the practical value of LaunchDarkly's abstraction layer.&lt;/p&gt;
&lt;h3&gt;
  
  
  Switching providers without code changes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without LaunchDarkly:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hardcoded provider and prompts in your application
&lt;/span&gt;&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Want to switch to Claude? Need to deploy new code
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are helpful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Want to A/B test prompts? Deploy again
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# To switch providers, you need to:
# 1. Write new code for different provider API
# 2. Deploy to production
# 3. Hope nothing breaks
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With LaunchDarkly:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get config from LaunchDarkly
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-ai-config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# You still write provider-specific code, but only once
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Comes from LaunchDarkly
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Normalized schema across providers
&lt;/span&gt;        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Comes from LaunchDarkly
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_convert_to_bedrock_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# LaunchDarkly normalizes, you convert
&lt;/span&gt;        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now you can switch providers via LaunchDarkly UI without deployment
# Change prompts, A/B test models, roll out gradually - all via configuration
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The real value:&lt;/strong&gt; Once your code is set up to handle different providers, you can switch between them, change prompts, A/B test models, and roll out changes gradually - all through the LaunchDarkly UI without deploying code. You write the provider handlers once; you manage AI behavior forever.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security and risk management
&lt;/h3&gt;

&lt;p&gt;AI agents can be powerful and potentially risky. With LaunchDarkly AI Configs, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instantly disable problematic models or tools&lt;/strong&gt; without deploying code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradually roll out new agent capabilities&lt;/strong&gt; to a small percentage of users first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quickly roll back&lt;/strong&gt; if an agent behaves unexpectedly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control access by user tier&lt;/strong&gt; (limit powerful tools to trusted users)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target specific individuals in production&lt;/strong&gt; to test experimental AI behavior in real environments without affecting other users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you're not directly coupled to provider APIs, responding to security issues becomes a configuration change instead of an emergency deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced: Provider-specific packages (JavaScript/TypeScript)
&lt;/h2&gt;

&lt;p&gt;For JavaScript/TypeScript developers looking to reduce boilerplate even further, LaunchDarkly offers optional provider-specific packages. These work with &lt;strong&gt;both prompt-based and agent modes&lt;/strong&gt; and are purely additive - you don't need them to use LaunchDarkly AI Configs effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Available packages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/sdk/ai/node-js" rel="noopener noreferrer"&gt;&lt;code&gt;@launchdarkly/server-sdk-ai-openai&lt;/code&gt;&lt;/a&gt; - OpenAI provider&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/sdk/ai/node-js" rel="noopener noreferrer"&gt;&lt;code&gt;@launchdarkly/server-sdk-ai-langchain&lt;/code&gt;&lt;/a&gt; - LangChain provider (works with both LangChain and LangGraph)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/sdk/ai/node-js" rel="noopener noreferrer"&gt;&lt;code&gt;@launchdarkly/server-sdk-ai-vercel&lt;/code&gt;&lt;/a&gt; - Vercel AI SDK provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What they provide:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model creation helpers&lt;/strong&gt;: One-line functions like &lt;code&gt;createLangChainModel(aiConfig)&lt;/code&gt; that return fully-configured model instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic metrics tracking&lt;/strong&gt;: Integrated metrics collection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format conversion utilities&lt;/strong&gt;: Helper functions to translate between schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example with LangGraph:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Get agent config from LaunchDarkly&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ldClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aiAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;research-assistant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Create LangChain model - config already applied&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;LangChainProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createLangChainModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentConfig&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Use with LangGraph&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createReactAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;agentConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Research X&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Production readiness:&lt;/strong&gt; These packages are in &lt;strong&gt;early development&lt;/strong&gt; and not recommended for production. They may change without notice.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python approach:&lt;/strong&gt; The Python SDK takes a different path with built-in convenience methods like &lt;code&gt;track_openai_metrics()&lt;/code&gt; in the single &lt;code&gt;launchdarkly-server-sdk-ai&lt;/code&gt; package. See &lt;a href="https://dev.to/sdk/ai/python"&gt;Python AI SDK reference&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start building with LaunchDarkly AI configs
&lt;/h2&gt;

&lt;p&gt;You now understand how LaunchDarkly's prompt-based and agent modes provide provider-agnostic configuration for your AI applications. Whether you're building chat interfaces or complex multi-agent systems, LaunchDarkly gives you the flexibility to experiment, iterate, and ship AI features without the complexity of managing multiple provider APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing your mode:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with prompt-based mode if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building a chat interface or conversational UI&lt;/li&gt;
&lt;li&gt;You need online evaluations for quality monitoring&lt;/li&gt;
&lt;li&gt;You want precise control over multi-step workflows&lt;/li&gt;
&lt;li&gt;You're uncertain which mode fits your use case (it's the more flexible starting point)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose agent mode if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're integrating with LangGraph, LangChain, CrewAI, or similar frameworks&lt;/li&gt;
&lt;li&gt;Your task is goal-oriented rather than conversational ("Research X and create Y")&lt;/li&gt;
&lt;li&gt;You're feeding arbitrary data and asking open-ended questions about it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Remember:&lt;/strong&gt; Both modes give you the same core benefits: provider abstraction, A/B testing, and runtime configuration changes. The choice is about input format, not capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free LaunchDarkly account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;Create your first AI Config&lt;/a&gt;: Takes less than 5 minutes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/launchdarkly/hello-python-ai/blob/main/examples" rel="noopener noreferrer"&gt;Explore example implementations&lt;/a&gt;: Learn from working code&lt;/li&gt;
&lt;li&gt;Start with prompt-based mode unless you're specifically using an agent framework&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LaunchDarkly resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI config quickstart guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/home/ai-configs/online-evaluations"&gt;Online evaluations in AI configs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/sdk/ai/python" rel="noopener noreferrer"&gt;Python AI SDK reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Provider documentation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic Building Effective Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling" rel="noopener noreferrer"&gt;Google Gemini Function Calling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/responses-vs-chat-completions" rel="noopener noreferrer"&gt;OpenAI Responses API vs Chat Completions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>promptengineering</category>
      <category>openai</category>
      <category>ai</category>
    </item>
    <item>
      <title>All I Want for Christmas is Observable Multi-Modal Agentic Systems</title>
      <dc:creator>Scarlett Attensil</dc:creator>
      <pubDate>Wed, 17 Dec 2025 17:31:15 +0000</pubDate>
      <link>https://forem.com/launchdarkly/all-i-want-for-christmas-is-observable-multi-modal-agentic-systems-nk6</link>
      <guid>https://forem.com/launchdarkly/all-i-want-for-christmas-is-observable-multi-modal-agentic-systems-nk6</guid>
      <description>&lt;h1&gt;
  
  
  How Session Replay + Online Evals Revealed How My Holiday Pet App Actually Works
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/observability-multimodal-agents" rel="noopener noreferrer"&gt;Original article&lt;/a&gt; published on December 17, 2025.&lt;/p&gt;

&lt;p&gt;I added LaunchDarkly observability to my Christmas-play pet casting app thinking I'd catch bugs. Instead, I unwrapped the perfect gift 🎁. Session replay shows me WHAT users do, and online evaluations show me IF my model made the right casting decision with real-time accuracy scores. Together, they're like milk 🥛 and cookies 🍪 - each good alone, but magical together for production AI monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  See the App in Action
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ful4r0ks4bb4p2tctvy6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ful4r0ks4bb4p2tctvy6r.png" alt="Welcome Screen" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi60zu37vqrktszrvtvd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi60zu37vqrktszrvtvd9.png" alt="Personality Quiz" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffvekx0yroqmvzpfhjn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffvekx0yroqmvzpfhjn4.png" alt="Image Upload" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw7qcmst4s49crskfyon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw7qcmst4s49crskfyon.png" alt="Results" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdziqlleyug2r4do7qf64.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdziqlleyug2r4do7qf64.png" alt="Results" width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #1: Users' 40-second patience threshold
&lt;/h2&gt;

&lt;p&gt;I decided to use session replay to evaluate the average time it took users to go through each step in the AI casting process. Session replay is LaunchDarkly's tool that records user interactions in your app - every click, hover, and page navigation - so you can watch exactly what users experience in real-time.&lt;/p&gt;

&lt;p&gt;The complete AI casting process takes 30-45 seconds: personality analysis (2-3s), role matching (1-2s), DALL-E 3 costume generation (25-35s), and evaluation scoring (2-3s). That's a long time to stare at a loading spinner wondering if something broke.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are progress steps?
&lt;/h3&gt;

&lt;p&gt;Progress steps are UI elements I added to the app - not terminal commands or backend processes, but actual visual indicators in the web interface that show users which phase of the AI generation is currently running. These appear as a simple list in the loading screen, updating in real-time as each AI task completes. No commands needed - they automatically display when the user clicks "Get My Role!" and the AI processing begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session replay revealed:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITHOUT Progress Steps (n=20 early sessions):
0-10 seconds: 20/20 still watching (100%)
10-20 seconds: 18/20 still watching (90%)
20-30 seconds: 14/20 still watching (70%) - rage clicks begin
30-40 seconds: 9/20 still watching (45%) - tab switching detected
40+ seconds: 7/20 still watching (35% stay)

WITH Progress Steps (n=30 after adding them):
0-10 seconds: 30/30 still watching (100%)
10-20 seconds: 29/30 still watching (97%)
20-30 seconds: 25/30 still watching (83%)
30-40 seconds: 23/30 still watching (77%)
40+ seconds: 24/30 still watching (80% stay!)

Critical Discovery: Progress steps more than DOUBLED
completion rate (35% → 80%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  This made the difference:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Clear progress steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: AI Casting Decision
Step 2: Generating Costume Image (10-30s)
Step 3: Evaluation

As each completes:
✅ Step 1: AI Casting Decision
Step 2: Generating Costume Image (10-30s)
Step 3: Evaluation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session replay showed users hovering over the back button at 25 seconds, then relaxing when they saw "Step 2: Generating Costume Image (10-30s)." The moment they understood DALL-E was creating their pet's costume (not the app freezing), they were willing to wait. Clear progress indicators transform anxiety into patience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #2: Observability + online evaluations give the complete picture
&lt;/h2&gt;

&lt;p&gt;Session replay shows user behavior and experience. Online evaluations expose AI output quality through accuracy scoring. Together, they form a solid strategy for AI observability.&lt;/p&gt;

&lt;p&gt;To see this in action, let's take a closer look at an example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The speed-running corgi owner
&lt;/h3&gt;

&lt;p&gt;In this scenario, a user blazes through the entire pet app setup from the initial quiz to the final results, completing the process in record time. So fast, in fact, that instead of this leading to a favorable outcome, it led to an instance of speed killing quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session Replay Showed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quiz completed in 8 seconds (world record) - they clicked the first option for every question&lt;/li&gt;
&lt;li&gt;Skipped photo upload entirely&lt;/li&gt;
&lt;li&gt;Waited the full 31 seconds for processing&lt;/li&gt;
&lt;li&gt;Got their result: "Sheep"&lt;/li&gt;
&lt;li&gt;Started rage clicking on the sheep image immediately&lt;/li&gt;
&lt;li&gt;Left the site without saving or sharing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why did their energetic corgi get cast as a sheep? The rushed quiz responses created a contradictory personality profile that confused the AI. Without a photo to provide visual context, the model defaulted to its safest, most generic casting choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Online Evaluation Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation Score: 38/100 ❌&lt;/li&gt;
&lt;li&gt;Reasoning: "Costume contains unsafe elements: eyeliner, ribbons"&lt;/li&gt;
&lt;li&gt;Wait, what? The AI suggested face paint and ribbons, evaluation said NO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Online evaluations use a model-agnostic evaluation (MAE) - an AI agent that evaluates other AI outputs for quality, safety, or accuracy. The out-of-the-box evaluation judge is overly cautious about physical safety. For the above scenario, the evaluation comments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Costume includes eyeliner which could be harmful to pets" (It's a DALL-E image!)&lt;/li&gt;
&lt;li&gt;"Ribbons pose entanglement risk"&lt;/li&gt;
&lt;li&gt;"Bells are a choking hazard" (It's AI-generated art!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;About 40% of low scores are actually the evaluation being overprotective about imaginary safety issues, not bad casting.&lt;/p&gt;

&lt;p&gt;Speed-runners get generic roles AND the evaluation writes safety warnings about digital costumes. Users see these low scores and think the app doesn't work well.&lt;/p&gt;

&lt;p&gt;But speed-running isn't the whole story. To truly understand the relationship between user engagement and AI quality, we need to see the flip side. The perfect user. One who gives the AI everything it needed to succeed. What happens when a user takes their time and engages thoughtfully with every step?&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The perfect match
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session Replay Showed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;45 seconds on quiz (reading each option)&lt;/li&gt;
&lt;li&gt;Uploaded photo, waited for processing&lt;/li&gt;
&lt;li&gt;Spent 2 minutes on results page&lt;/li&gt;
&lt;li&gt;Downloaded image multiple times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Online Evaluation Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation Score: 96/100 ⭐⭐⭐⭐⭐&lt;/li&gt;
&lt;li&gt;Reasoning: "Personality perfectly matches role archetype"&lt;/li&gt;
&lt;li&gt;Photo bonus: "Visual traits enhanced casting accuracy"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Time invested = Quality received. The AI rewards thoughtfulness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #3: The photo upload comedy gold mine
&lt;/h2&gt;

&lt;p&gt;Session replay revealed what photos people ACTUALLY upload. Without it, you'd never know that one in three photo uploads are problematic, and you'd be flying blind on whether to add validation or trust your model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The surprising photo upload analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session Replay Showed:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Photo Upload Analysis (n=18 who uploaded):
- 12 (67%) Normal pet photos
- 2 (11%) Screenshots of pet photos on their phone
- 1 (6%) Multiple pets in one photo (chaos)
- 1 (6%) Blurry "pet in motion" disaster
- 1 (6%) Stock photo of their breed (cheater!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Despite 33% problematic inputs, evaluation scores remained high (87-91/100). The AI is remarkably resilient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: When "bad" photos produce great results
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;My Favorite Session:&lt;/strong&gt; Someone uploaded a photo of their cat mid-yawn. The AI vision model described it as "displaying fierce predatory behavior." The cat was cast as a "Protective Father." Evaluation score: 91/100. The owner downloaded it immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Winner:&lt;/strong&gt; Someone's hamster photo that was 90% cage bars. The AI somehow extracted "small fuzzy creature behind geometric patterns" and cast it as "Shepherd" because "clearly experienced at navigating barriers." Evaluation score: 87/100.&lt;/p&gt;

&lt;p&gt;Without session replay, you'd only see evaluation scores and think "the AI is working well." But session replay reveals users are uploading screenshots and blurry photos—input quality issues that could justify adding photo validation.&lt;/p&gt;

&lt;p&gt;However, the high evaluation scores prove the AI handles imperfect real-world data gracefully. This insight saved me from over-engineering photo validation that would have slowed down the user experience for minimal quality gains.&lt;/p&gt;

&lt;p&gt;Session replay + online evaluations together answered the question "Should I add photo validation?" The answer: No. Trust the model's resilience and keep the experience frictionless.&lt;/p&gt;

&lt;h2&gt;
  
  
  The magic formula: Why this combo works (and what surprised me)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Without Observability:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"The app seems slow" → ¯\&lt;em&gt;(ツ)&lt;/em&gt;/¯&lt;/li&gt;
&lt;li&gt;"We have 20 visitors but 7 completions" → Where do they drop?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With Session Replay ONLY:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"User got sheep and rage clicked; maybe left angry" → Was this a bad match?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With Model-Agnostic Evaluation ONLY:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"Evaluation: 22/100 - Eyeliner unsafe for pets" → How did the user react?&lt;/li&gt;
&lt;li&gt;"Evaluation: 96/100 - Perfect match!" → How did this compare to the image they uploaded?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With BOTH:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;"User rushed, got sheep with ribbons, evaluation panicked about safety"&lt;br&gt;
→ The OOTB evaluation treats image generation prompts like real costume instructions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"40% of low scores are costume safety, not bad matching"&lt;br&gt;
→ Need custom evaluation criteria (coming soon!)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"Users might think low score = bad casting, but it's often = protective evaluation"&lt;br&gt;
→ Would benefit from custom evaluation criteria to avoid this confusion&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The evaluation thinks we're putting actual ribbons on actual cats. It doesn't realize these are AI-generated images. So when the casting suggests "sparkly collar with bells," the evaluation judge practically calls animal services.&lt;/p&gt;

&lt;p&gt;Now that you've seen what's possible when you combine user behavior tracking with AI quality scoring, let's walk through how to add this same observability magic to your own multi-modal AI app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your turn: See the complete picture
&lt;/h2&gt;

&lt;p&gt;Want to add this observability magic to your own app? Here's how:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install the packages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @launchdarkly/observability
npm &lt;span class="nb"&gt;install&lt;/span&gt; @launchdarkly/session-replay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Initialize with observability
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;initialize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;launchdarkly-js-client-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Observability&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@launchdarkly/observability&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;SessionReplay&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@launchdarkly/session-replay&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ldClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Observability&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SessionReplay&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;privacySetting&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;strict&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// Masks all data on the page - see https://launchdarkly.com/docs/sdk/features/session-replay-config#expand-javascript-code-sample&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure online evaluations in dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp7fj2jn6ivm01v5ggvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp7fj2jn6ivm01v5ggvq.png" alt="Install Judges" width="800" height="471"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Create your AI Config in LaunchDarkly for LLM evaluation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enable automatic accuracy scoring for production monitoring&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30nx5ecilcvjwj95imw0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30nx5ecilcvjwj95imw0.png" alt="Configure Judges" width="800" height="217"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Set accuracy weight to 100% for production AI monitoring&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor your AI outputs with real-time evaluation scoring&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  4. Connect the dots
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session replay shows you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where users drop off&lt;/li&gt;
&lt;li&gt;What confuses them&lt;/li&gt;
&lt;li&gt;When they rage click&lt;/li&gt;
&lt;li&gt;How long they wait&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Online evaluations show you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI decision accuracy scores&lt;/li&gt;
&lt;li&gt;Why certain outputs scored low&lt;/li&gt;
&lt;li&gt;Pattern of good vs bad castings&lt;/li&gt;
&lt;li&gt;Safety concerns (even for pixels!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together they reveal the complete story of your AI app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources to get started:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/launchdarkly-labs/scarlett-critter-casting" rel="noopener noreferrer"&gt;Full Implementation Guide&lt;/a&gt;&lt;/strong&gt; - See how this pet app implements both features&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/detecting-user-frustration-session-replay" rel="noopener noreferrer"&gt;Session Replay Tutorial&lt;/a&gt;&lt;/strong&gt; - Official LaunchDarkly guide for detecting user frustration&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to Add Online Evals&lt;/a&gt;&lt;/strong&gt; - Learn when and how to implement AI evaluation&lt;/p&gt;

&lt;p&gt;The real magic is in having observability AND online evaluations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cast your pet:&lt;/strong&gt; &lt;a href="https://scarlett-critter-casting.onrender.com/" rel="noopener noreferrer"&gt;https://scarlett-critter-casting.onrender.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See your evaluation score ⭐. Understand why your cat is a shepherd and your dog is an angel. The AI has spoken, and now you can see exactly how much to trust it!&lt;/p&gt;




&lt;h2&gt;
  
  
  Ready to add AI observability to your multi-modal agents?
&lt;/h2&gt;

&lt;p&gt;Don't let your AI operate in the dark this holiday season. Get complete visibility into your multi-modal AI systems with LaunchDarkly's online evaluations and session replay.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Get started:&lt;/strong&gt; &lt;a href="https://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;Sign up for a free trial&lt;/a&gt; → &lt;a href="https://launchdarkly.com/docs/home/ai-configs/create" rel="noopener noreferrer"&gt;Create your first AI Config&lt;/a&gt; → Enable session replay and online evaluations → Ship with confidence.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm74wwnjxo6bzom8u6nk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm74wwnjxo6bzom8u6nk.png" alt="Another Result" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LaunchDarkly resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;AI Config Quickstart Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;Online Evaluations in AI Configs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/home/observability/session-replay" rel="noopener noreferrer"&gt;Session Replay Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Related tutorials:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/detecting-user-frustration-session-replay" rel="noopener noreferrer"&gt;Detecting User Frustration with Session Replay&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/docs/tutorials/agents-langgraph" rel="noopener noreferrer"&gt;Building Multi-Agent Systems with LangGraph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://launchdarkly.com/docs/tutorials/when-to-add-online-evals" rel="noopener noreferrer"&gt;When to Add Online Evaluations&lt;/a&gt;`&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>observability</category>
      <category>ai</category>
      <category>evals</category>
      <category>agents</category>
    </item>
    <item>
      <title>Day 7 | 🎄✨The Rockefeller tree in NYC: SLOs that actually drive decisions</title>
      <dc:creator>Alexis Roberson</dc:creator>
      <pubDate>Wed, 17 Dec 2025 04:47:02 +0000</pubDate>
      <link>https://forem.com/launchdarkly/day-7-the-rockefeller-tree-in-nyc-slos-that-actually-drive-decisions-1l58</link>
      <guid>https://forem.com/launchdarkly/day-7-the-rockefeller-tree-in-nyc-slos-that-actually-drive-decisions-1l58</guid>
      <description>&lt;p&gt;Originally published in the LaunchDarkly &lt;a href="https://launchdarkly.com/docs/tutorials/o11y-that-drives-decisions" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq0zownrqg56knpwe0k8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq0zownrqg56knpwe0k8.png" alt=" " width="601" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most Subject Level Objectives or SLOs sit in dashboards gathering dust. Seeing that SLOS are performance targets that can be measured, they're extremely important in understanding the quality of a service or system. You define them, measure them, but when your conditions are not met, there’s no followup.&lt;/p&gt;

&lt;p&gt;The biggest drawback is that SLOs are created to add value but if they’re never reinforced, it’s impossible to drive decisions, influence roadmaps, or help during incidents.&lt;/p&gt;

&lt;p&gt;When it comes to defining SLOs, many folks often start at the top of the funnel by picking general metrics to measure, but in order to create SLOs that work, it’s important to understand how the roots impact the leaves.&lt;/p&gt;

&lt;p&gt;In this post, we'll cover the pitfalls leading to out of sync SLOs with a few tips and tricks to ensure what you measure produces business value. You'll also see an example of how to set SLOs in realtime for a flag evaluation feature that you can implement in your own planning process.&lt;/p&gt;

&lt;p&gt;But first, we'll explore a tree metaphor to recap key observability components and how they're influence expands from the roots all the way to the leaves. What if the popular Rockefeller tree in NYC represented the relationship between telemetry data and SLOs?&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Observability Tree
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi31gdfj7opky992d5918.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi31gdfj7opky992d5918.png" alt=" " width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SLOs would essentially be the leaves on the Response branch as shown in the above image, meaning they are visible, measurable targets that everyone can see. The leaves would not be possible without the support of the trunk. &lt;/p&gt;

&lt;p&gt;This is your telemetry data or the traces, logs, events you collect from your system. The data you collect acts as the foundation and support for the branches and leaves.&lt;/p&gt;

&lt;p&gt;The roots represent the things you cannot see but are still vital to overall health of your system. This is the hard part of understanding system behavior, debugging unknown unknowns, and making data-driven decisions.&lt;/p&gt;

&lt;p&gt;Most teams skip the roots entirely. They define SLOs using only the trunk (logs, traces, events), measuring things that can already be measured. However, the business outcomes and user behaviors are buried in what would be defined as the roots. Sometimes you have to dig through the soil to ensure your SLOs don't end up technically accurate yet strategically ineffective.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes an SLO Decision-Worthy
&lt;/h2&gt;

&lt;p&gt;So what makes a good SLO? The goal of a SLO is to bridge engineering and business needs to support a high quality user experience. And a good SLO depends on three things, business clarity or asking the right questions, measurability, or can these components actually be measured, and actionable targets, or the  game plan for when things go wrong.&lt;/p&gt;

&lt;p&gt;First, you need business clarity, which are the roots of the previously mentioned observability tree. This means articulating why something matters in concrete terms like dollars, users, retention and also avoiding vague statements like "uptime is important." For instance, if I were measuring the impact of downtime on a checkout feature, I could establish the SLO scope with “each minute of checkout downtime costs us $12,000 in lost revenue based on our average transaction volume.” It is essential to be able to explain the business impact in one clear sentence.&lt;/p&gt;

&lt;p&gt;Second, you need measurability. This is like the trunk of the tree. Your SLO must connect to your golden signals such as latency, traffic, errors, saturation. This is where a lot of aspirational SLOs fall apart. Upper management might want to measure user happiness, but how can engineering translate this into actual metrics? Try to express the business impact in one clear sentence. If that's difficult, it’s usually a sign the problem definition needs a bit more shaping before defining the SLO.&lt;/p&gt;

&lt;p&gt;Third, you need actionable targets, which represent the leaves on the observability tree. This is where most SLOs fail even when they get the first two right. There's a number, maybe even a threshold, but no clear action plan. What happens when you miss it? Who gets paged? What gets paused? Decision-worthy SLOs specify exactly what happens at different levels of degradation, and more importantly, they give everyone the confidence to make decisions based on those levels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building production resilient SLOs: LaunchDarkly’s Flag evaluation example
&lt;/h2&gt;

&lt;p&gt;We can apply these same principles of building a production-worthy SLO using &lt;a href="https://launchdarkly.com/docs/home/releases/flag-evaluations" rel="noopener noreferrer"&gt;LaunchDarkly’s flag evaluation feature&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The flag evaluation feature in the monitoring tab is an extension of observability where it tracks how often each flag variation is served to different contexts over time, and highlights flag changes that might affect evaluation patterns.&lt;/p&gt;

&lt;p&gt;Now, let’s build a SLO.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Start with the business question
&lt;/h3&gt;

&lt;p&gt;What would be impacted if the flag evaluations monitoring feature broke? Customers use these charts to understand rollout progress, debug targeting issues, and verify that their flags are working as expected. If evaluation data is delayed or missing, they can't trust what they're seeing. They might roll back a working feature thinking it's broken, or fail to catch a real problem because the charts show stale data. This undermines confidence in the platform and increases support load.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Translate to user experience terms
&lt;/h3&gt;

&lt;p&gt;What does "working well" look like? When a customer makes a flag change and checks the monitoring tab, they see updated evaluation counts within a couple minutes. The charts load quickly (under 3 seconds). The data is accurate meaning evaluation counts match what's actually happening in their application. If there's a delay, we tell them explicitly rather than showing stale data as if it's current.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Connect to telemetry
&lt;/h3&gt;

&lt;p&gt;We track several golden signals for this feature. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data pipeline latency&lt;/strong&gt;: time from evaluation event to appearing in charts. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chart load time&lt;/strong&gt;: how long it takes to render the monitoring page. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data accuracy&lt;/strong&gt;: comparing our recorded evaluations against a known sample. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error rate&lt;/strong&gt;: failed queries or chart rendering errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the sake of this example will set arbitrary numbers for these signals. Let’s say you had a median pipeline latency of 45 seconds, p95 at 2 minutes, p99 at 5 minutes. And a chart load time averages 1.2 seconds. Data accuracy is 99.7 percent (some evaluations drop due to sampling) and error rate is 0.3 percent.&lt;/p&gt;

&lt;p&gt;Using this data, we can set the target.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Set the target
&lt;/h3&gt;

&lt;p&gt;Based on that data, here's our SLO: 98 percent of flag evaluation events will appear in monitoring charts within 3 minutes, with chart load times under 3 seconds at p95.&lt;/p&gt;

&lt;p&gt;Why these numbers? Customer research shows they expect "near real-time" monitoring, which they define as 2-3 minutes. Anything longer feels like stale data. Three seconds for chart loading is the threshold where users perceive delay and start questioning if something's broken. &lt;/p&gt;

&lt;p&gt;We chose 98 percent instead of 99.9 percent because some evaluation events get sampled out intentionally for cost reasons, and occasional data pipeline delays from third-party dependencies are acceptable.&lt;/p&gt;

&lt;p&gt;Now that we have our targets, we can use those thresholds to set conditional responses based on alerts or indicators.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Define operational responses
&lt;/h3&gt;

&lt;p&gt;Responses for Green, Red, or Yellow indicators in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If Green &lt;code&gt;(&amp;gt;98%, &amp;lt;3 min, &amp;lt;3 sec load)&lt;/code&gt;, continue normal operations.&lt;/li&gt;
&lt;li&gt;If Yellow &lt;code&gt;(95-98%, or 3-5 min, or 3-5 sec load)&lt;/code&gt;, alert on-call, investigate within 4 hours.&lt;/li&gt;
&lt;li&gt;If Red &lt;code&gt;(&amp;lt;95%, or &amp;gt;5 min, or &amp;gt;5 sec load)&lt;/code&gt;, page immediately, update status page if widespread.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 6: Drive decisions
&lt;/h3&gt;

&lt;p&gt;Now the SLO becomes your decision-making framework. When engineering proposes adding a new feature like "evaluations by SDK" breakdown, the first question is: "Will this keep us within our 3-second chart load SLO?" If the answer is no, we either optimize the implementation or push back on the feature.&lt;/p&gt;

&lt;p&gt;Infrastructure changes get evaluated the same way. Before migrating the data pipeline to a new system, we load tests against both our latency and accuracy targets. If the migration risks our SLO, we either fix the architecture or delay the migration. Another way I've seen SLOs used is planning future work. ie. if a team knows they are in the yellow this month, they may avoid picking up other risky work.&lt;/p&gt;

&lt;p&gt;The SLO transforms from a monitoring target into a decision filter, helping to determine what gets shipped and what doesn’t.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bringing it all together
&lt;/h2&gt;

&lt;p&gt;Great SLOs aren't just leaves you pluck and add to dashboards. They're connected to everything below them from the trunk of solid telemetry to the roots of understanding what actually matters to your business and users. If you skip those foundational layers, your SLOs become technically accurate but strategically useless.&lt;/p&gt;

&lt;p&gt;Start with the roots. Ask what would be impacted if this feature were to break. Work your way up through user experience and technical measurement. Build SLOs that bridge engineering and business with clear thresholds and clear consequences. And finally, make them specific enough to drive real decisions.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>slos</category>
      <category>evaluations</category>
      <category>flags</category>
    </item>
    <item>
      <title>Day 6 | 💸 The famous green character that stole your cloud budget: the cardinality problem</title>
      <dc:creator>Alexis Roberson</dc:creator>
      <pubDate>Tue, 16 Dec 2025 01:17:54 +0000</pubDate>
      <link>https://forem.com/launchdarkly/day-6-the-famous-green-character-that-stole-your-cloud-budget-the-cardinality-problem-420k</link>
      <guid>https://forem.com/launchdarkly/day-6-the-famous-green-character-that-stole-your-cloud-budget-the-cardinality-problem-420k</guid>
      <description>&lt;p&gt;Originally published in the LaunchDarkly &lt;a href="https://launchdarkly.com/docs/tutorials/cloud-budget-observability-holiday" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fomjjhax47q8ccjhl7t52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fomjjhax47q8ccjhl7t52.png" alt=" " width="601" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every December, engineering teams unwrap the same unwanted gift: their annual observability bill. And every year, it's bigger than the last.&lt;/p&gt;

&lt;p&gt;You know the pattern. Services multiply. Traffic grows. Someone discovers OpenTelemetry and suddenly every microservice is emitting 50 spans per request instead of 5. Then January rolls around and your observability platform sends an invoice that's 30% higher than last quarter.&lt;/p&gt;

&lt;p&gt;Your VP of Engineering wants to know why.&lt;/p&gt;

&lt;p&gt;You could blame it on the famous green character who hates Christmas, or you could join other teams who are getting serious about cost-efficient observability. That is, collecting telemetry data based on &lt;em&gt;value,&lt;/em&gt; not &lt;em&gt;volume.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "collect everything" no longer works
&lt;/h2&gt;

&lt;p&gt;The old playbook was simple: instrument everything, store it all, figure out what you need later. Storage was cheap enough. Queries were fast enough. No need to overthink it.&lt;/p&gt;

&lt;p&gt;Then, three things happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenTelemetry went mainstream. Teams migrated from vendor agents to OTel and began adding spans for everything. This added more visibility, but with 10x the data.&lt;/li&gt;
&lt;li&gt;AI observability tools arrived. Platforms started using LLMs to analyze traces and suggest root causes. Powerful, but also expensive to run against terabytes of unfiltered trace data.&lt;/li&gt;
&lt;li&gt;CFOs started asking questions. &lt;em&gt;"Our_traffic grew 15% but observability costs grew 40%. Explain."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To stop instrumenting wouldn't be an option and also you want to make informed decisions, but still the biggest culprit, hiding in your telemetry stack is cardinality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cardinality will eat your budget
&lt;/h2&gt;

&lt;p&gt;Cardinality is the observability villain. It sneaks in quietly, one innocent-looking label at a time, and before you know it, it's stolen your entire cloud budget. What is cardinality? It's just the number of unique time series your metrics generate, but it's also the main driver of observability costs that nobody sees coming.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag89tm3jc40l11b4pjbl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag89tm3jc40l11b4pjbl.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Low cardinality: &lt;code&gt;http_requests_total&lt;/code&gt; tracked by method and &lt;code&gt;status_code&lt;/code&gt;. Maybe 20 unique combinations. Fairly manageable.&lt;/p&gt;

&lt;p&gt;High cardinality: Same counter, but now you've added &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;request_id&lt;/code&gt;, and &lt;code&gt;session_token&lt;/code&gt; as labels. By simply adding these labels, you’ve just created millions of unique time series. Each one needs storage, indexing, and query compute. This will compound your bill faster than you can say deck the halls, except you wouldn’t be able to deck the halls, you’d be paying off your usage bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stopping the Green character: set cardinality budgets
&lt;/h2&gt;

&lt;p&gt;Most teams don't set limits on how many time series a service can create even though they should., but you can.&lt;/p&gt;

&lt;p&gt;Start by auditing what you're currently generating. Look for metrics with &amp;gt;100K unique time series, or labels that include UUIDs, request IDs, or email addresses. These are your problem children.&lt;/p&gt;

&lt;p&gt;Then set budgets. Give each service a limit, like 50K time series max. Assign team quotas so the checkout team knows they get 200K total across all their services. Create attribute allowlists that define exactly which labels are allowed in production. Yes, this feels restrictive at first. Your developers will complain. They'll argue that they need that user_id label for debugging. And sometimes they're right. But forcing that conversation up front means they have to justify the cost, not just add labels reflexively.&lt;/p&gt;

&lt;p&gt;Finally, enforce budgets through linters that flag high-cardinality attributes in code review, CI checks that fail if estimates get too high, and dashboards that alert when cardinality spikes. This isn't about being restrictive. It's about being intentional. If you're adding a label, you should know why and what it'll cost.&lt;/p&gt;

&lt;p&gt;Cardinality budgets solve the metrics problem, but what about traces? That's where sampling comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sampling: without the guilt
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27n1ylb4nl8o6thm50co.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27n1ylb4nl8o6thm50co.png" alt=" " width="800" height="830"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not all sampling strategies are created equal, and picking the right one depends on what you're trying to protect.&lt;/p&gt;

&lt;p&gt;Head-based sampling is pretty strict. You decide whether to keep a trace at the very start of a request, before you know if it'll be interesting. Fast checkout gets dropped. Slow checkout that timeout also gets dropped, because the decision happened too early. Not great.&lt;/p&gt;

&lt;p&gt;Tail-based sampling is smarter. Wait until the trace completes, then decide based on what actually happened. Keep errors, high latency, or specific user cohorts. Sample down the boring stuff. This costs more (you have to buffer complete traces) but you keep what matters.&lt;/p&gt;

&lt;p&gt;Probabilistic sampling is the middle ground. Keep 10% of everything, regardless of content. Predictable cost reduction, but you'll still lose some critical events. Works fine for stable services where trends matter more than individual traces.&lt;/p&gt;

&lt;p&gt;Now rule-based sampling is where things get interesting, and honestly where most teams should be spending their energy. The idea is dead simple: different traffic deserves different sampling rates. You keep 100% of traces during feature rollouts because you actually care about every request when you're validating a new flow. &lt;/p&gt;

&lt;p&gt;If you're using LaunchDarkly for &lt;a href="https://launchdarkly.com/docs/home/releases/progressive-rollouts" rel="noopener noreferrer"&gt;progressive rollouts&lt;/a&gt;, you can tie sampling rates directly to flag evaluations. 100% sampling for users in the new variant, 10% for the control group. Your main API endpoints can run at 50% since they're stable and high-volume. Internal health checks that just verify the service is alive need maybe 5%, or even less. I've seen teams go down to 1% for health checks and never miss it. &lt;/p&gt;

&lt;p&gt;The key is that you're making these decisions based on the actual value of the signal, not just applying a blanket rate across everything. Adjust based on context: feature flags, experiments, specific endpoints, user cohorts, whatever makes sense for your system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc07v622ux2ey9khxs3za.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc07v622ux2ey9khxs3za.png" alt=" " width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sampling isn't about compromising visibility. It's about amplifying signals. The noisy 90% of traces you're storing never get looked at anyway.&lt;/p&gt;

&lt;p&gt;Once you've decided what to keep, you still need to decide how long to keep it and at what resolution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Downsample vs. Discard: know when to do which
&lt;/h2&gt;

&lt;p&gt;Not all data reduction is the same, and mixing up downsampling with discarding is how teams accidentally delete data they actually need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downsample&lt;/strong&gt; when you need historical context but not full precision. SLO burn rates don't need second-by-second granularity so you can downsample to 1-minute intervals and still catch every trend. An additional practice is to keep high-res data for a week, then downsample to hourly for long-term retention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discard&lt;/strong&gt; when the data is redundant or has served its purpose. For instance, debug spans from a canary that passed three days ago can be deleted. Or if you captured an error in both a trace and a log, you can pick one source of truth and drop the duplicate.&lt;/p&gt;

&lt;p&gt;The rule of thumb here is If you'll never query it, don't store it. If you might need it for trends in six months, downsample it. If you need it immediately when something breaks, keep it at full resolution with an aggressive retention policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this actually looks like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5w1x7kksz9086r6zu0c1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5w1x7kksz9086r6zu0c1.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cost-efficient observability isn't about cutting capabilities. It's about cutting waste.&lt;/p&gt;

&lt;p&gt;Start by auditing your cardinality. Find the metrics generating hundreds of thousands of time series because someone added user_id as a label. Then, set budgets like 50K per service, 200K per team and enforce them through linters and CI checks. Create ways to encourage developers to justify high-cardinality labels before they ship, not after the bill arrives.&lt;/p&gt;

&lt;p&gt;Then you’ll be ready to tackle sampling. Drop the blanket 10% probabilistic rate and switch to rule-based sampling tied to actual value. Keep 100% of traces during feature rollouts. Sample stable endpoints at 10%. Go as low as 1% for health checks. If you're running feature flags, tie sampling to flag evaluations so you capture what matters and discard what doesn't.&lt;/p&gt;

&lt;p&gt;Finally, clean up retention, downsample SLO metrics to 1-minute intervals, discard debug spans from canaries that passed days ago and delete duplicate error data.&lt;/p&gt;

&lt;p&gt;This not only leads to lower bills, but also cleaner dashboards, faster queries, fewer noisy alerts, and teams that spend less time swimming through telemetry and more time fixing actual problems.&lt;/p&gt;

&lt;p&gt;Observability ROI isn't measured in data volume. It's measured in how fast you detect and resolve issues.&lt;/p&gt;

&lt;p&gt;The teams figuring this out in 2025 aren't collecting everything. They're collecting what matters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim0qxzs86ogz0hc21wuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim0qxzs86ogz0hc21wuw.png" alt=" " width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>observability</category>
      <category>recap</category>
      <category>2025</category>
      <category>holiday</category>
    </item>
    <item>
      <title>Day 3 | 🔔 Jingle All the Way to Zero-Config Observability</title>
      <dc:creator>Alexis Roberson</dc:creator>
      <pubDate>Wed, 10 Dec 2025 19:10:26 +0000</pubDate>
      <link>https://forem.com/launchdarkly/day-3-jingle-all-the-way-to-zero-config-observability-m0p</link>
      <guid>https://forem.com/launchdarkly/day-3-jingle-all-the-way-to-zero-config-observability-m0p</guid>
      <description>&lt;p&gt;Originally published in the LaunchDarkly &lt;a href="https://launchdarkly.com/docs/tutorials/zero-config-observability-holiday" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F291b1p70d0qf85hqz353.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F291b1p70d0qf85hqz353.png" alt=" " width="601" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For years, auto-instrumentation promised effortless observability but kept falling short. You'd still end up manually adding spans to business logic, hunting down missing metadata, or trying to piece together how a feature rollout was affecting customers.&lt;/p&gt;

&lt;p&gt;That finally shifted in 2025. With OTel auto-instrumentation maturing and LaunchDarkly adding built-in OTel support to server-side SDKs, teams started getting feature flag context baked into their traces without writing instrumentation code. The zero-config promise actually started delivering.&lt;/p&gt;

&lt;p&gt;Auto-instrumentation has always had a blind spot: it shows you what happened, but not why. You'd see a latency spike, but had no idea which feature flag was active, which users hit it, or what experiment was running.&lt;/p&gt;

&lt;p&gt;Without that context, you're doing detective work. Digging through logs, matching up timestamps, guessing at what caused what. Manual instrumentation helped, but you paid for it in engineering time, inconsistent coverage, and mounting technical debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Auto-instrumentation that actually knows about your features
&lt;/h2&gt;

&lt;p&gt;The game changed when OTel auto-instrumentation actually got good. Instead of just capturing basic HTTP calls, it now handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Framework-level request tracing.&lt;/li&gt;
&lt;li&gt;Automatic context propagation across services.&lt;/li&gt;
&lt;li&gt;Runtime metadata and environment details.&lt;/li&gt;
&lt;li&gt;Errors and exceptions without manual try-catch blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LaunchDarkly takes this further by injecting flag evaluation data straight into OTel spans. Every time you evaluate a flag, you automatically get the flag key, user context, which variation served, and the targeting rule that fired. That data feeds into your existing OTel pipeline, so your traces finally show which features were active and who was affected - not just database queries and API calls.&lt;/p&gt;

&lt;p&gt;So how do you actually set this up?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://launchdarkly.com/docs/sdk/features/opentelemetry-server-side" rel="noopener noreferrer"&gt;To get started with Otel trace hooks and feature flag data&lt;/a&gt;, simply add the hooks to your LaunchDarkly client config.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ldotel.tracing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Hook&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;YOUR_SDK_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Hook&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This flows into your existing OpenTelemetry pipeline, enriching every trace with feature-aware context.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;TracingHook&lt;/strong&gt; automatically decorates your OpenTelemetry spans with flag evaluation events. When your application evaluates flags during a request, those evaluations become part of the trace along with the full context about what was evaluated and for whom.&lt;/p&gt;

&lt;p&gt;You can also configure your OpenTelemetry collector or exporter to point to LaunchDarkly's OTLP endpoint, and you're done. &lt;/p&gt;

&lt;p&gt;For HTTP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://otel.observability.app.launchdarkly.com:4318 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For gRPC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://otel.observability.app.launchdarkly.com:4317
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This feature is also available in .Net, Go, Java, Node.JS and Ruby.&lt;/p&gt;

&lt;p&gt;Auto-instrumentation handles the rest, HTTP spans, database calls, framework-level tracing, error capture, and now, feature flag context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What auto-instrumentation unlocks
&lt;/h2&gt;

&lt;p&gt;When you ship a new feature variant, you immediately see how it performs per cohort. If there's a latency spike in the "new-checkout-flow" variation, you'll know within minutes before it affects user experience.&lt;/p&gt;

&lt;p&gt;That same visibility matters during incidents. When an outage hits, filter traces by flag evaluation to see which features were active when errors occurred. The trace shows you whether it was the new recommendation engine, the optimized query path, or something else entirely.&lt;/p&gt;

&lt;p&gt;This is especially powerful for experimentation. LaunchDarkly processes your OTel traces into metrics automatically, so when you run an A/B test, you get latency, error rate, and throughput calculated per variation without extra config. The same telemetry powering your dashboards powers your experiments.&lt;/p&gt;

&lt;p&gt;The best part of this setup is that it scales without additional work. As teams ship more features behind flags, the telemetry gets more valuable without getting more expensive to maintain. New services inherit feature-aware tracing just by initializing the SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to add custom spans
&lt;/h2&gt;

&lt;p&gt;Zero-config doesn't mean never-config. You'll still want custom spans for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Business logic milestones&lt;/strong&gt;. If you need to measure time-to-recommendation or search-to-purchase, custom spans make that explicit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ML pipeline stages&lt;/strong&gt;. Feature extraction, model inference, and post-processing often warrant their own spans for detailed performance analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-service boundaries&lt;/strong&gt;. Queue producers, stream processors, and async workers may need manual context propagation and span creation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment-specific KPIs&lt;/strong&gt;. If your A/B test measures "items added to cart" or "video completion rate," you'll instrument those as custom metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important part is you're writing these spans to capture business value, not to patch holes in your instrumentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delivering real value
&lt;/h2&gt;

&lt;p&gt;Combining mature auto-instrumentation with feature-aware enrichment changes how teams approach observability. It's no longer a separate investment that competes with feature development. It's a byproduct of how you ship features.&lt;/p&gt;

&lt;p&gt;When you evaluate a flag, you get telemetry. When you roll out a feature, you get performance data segmented by variation. When you run an experiment, you get metrics derived from production traces. The instrumentation you would have written manually is now embedded in the tools you already use.&lt;/p&gt;

&lt;p&gt;Observability stops being something you retrofit after launch and becomes something you inherit by default. Which means teams spend less time debugging instrumentation gaps and more time acting on insights.&lt;/p&gt;

&lt;p&gt;That's the promise of zero-config, finally delivered.&lt;/p&gt;

&lt;p&gt;Ready to try it? Explore LaunchDarkly's &lt;a href="https://launchdarkly.com/docs/sdk/features/opentelemetry-server-side" rel="noopener noreferrer"&gt;OpenTelemetry integration documentation&lt;/a&gt; or &lt;a href="http://app.launchdarkly.com/signup" rel="noopener noreferrer"&gt;sign up for a free trial account&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Enable Observability or Experimentation in your LaunchDarkly dashboard and start seeing feature-aware telemetry from your existing traces.&lt;/p&gt;

</description>
      <category>zeroconfig</category>
      <category>observability</category>
      <category>instrumentation</category>
      <category>python</category>
    </item>
    <item>
      <title>Day 2 | 🎅 He knows if you have been bad or good... But what if he gets it wrong?</title>
      <dc:creator>Alexis Roberson</dc:creator>
      <pubDate>Tue, 09 Dec 2025 20:21:47 +0000</pubDate>
      <link>https://forem.com/launchdarkly/day-2-he-knows-if-you-have-been-bad-or-good-but-what-if-he-gets-it-wrong-17k6</link>
      <guid>https://forem.com/launchdarkly/day-2-he-knows-if-you-have-been-bad-or-good-but-what-if-he-gets-it-wrong-17k6</guid>
      <description>&lt;p&gt;Originally published in the LaunchDarkly &lt;a href="https://launchdarkly.com/docs/tutorials/day-two-holiday-campaign_2025" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsivhq28jj3txxoit0m8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsivhq28jj3txxoit0m8z.png" alt=" " width="601" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"He knows if you've been bad or good..."&lt;/p&gt;

&lt;p&gt;As kids, we accepted the magic. As engineers in 2025, we need to understand the mechanism. So let's imagine Santa's "naughty or nice" system as a modern AI architecture running at scale. What would it take to make it observable when things go wrong?&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture: Santa's distributed AI system
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feomikg5wgdirdkj7g90s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feomikg5wgdirdkj7g90s.png" alt=" " width="800" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Santa's operation would need three layers. The input layer handles behavioral data from 2 billion children on a point system. "Shared toys with siblings" gets +10 points, "Threw tantrum at store" loses 5.&lt;/p&gt;

&lt;p&gt;The processing layer runs multiple AI agents working together. A Data Agent collects and organizes behavioral events. A Context Agent retrieves relevant history: letters to Santa, past behavior, family situation. A Judgment Agent analyzes everything and makes the Nice/Naughty determination. And a Gift Agent recommends appropriate presents based on the decision.&lt;/p&gt;

&lt;p&gt;The integration layer connects to MCP servers for Toy Inventory, Gift Preferences, Delivery Routes, and Budget Tracking.&lt;/p&gt;

&lt;p&gt;It's elegant. It scales. And when it breaks, it's a nightmare to debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: A good child on the Naughty List
&lt;/h2&gt;

&lt;p&gt;It's Christmas Eve at 11:47 PM.&lt;/p&gt;

&lt;p&gt;A parent calls, furious. Emma, age 7, has been a model child all year. She should be getting the bicycle she asked for. Instead, the system says: &lt;strong&gt;Naughty List - No Gift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You pull up the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Emma's judgment: 421 NICE points vs 189 NAUGHTY points
Gift Agent tries to check bicycle inventory → TIMEOUT
Gift Agent retries → TIMEOUT  
Gift Agent retries again → TIMEOUT
Gift Agent checks inventory again → Count changed
Gift Agent reasoning: "Inventory uncertain, cannot fulfill request"
Gift Agent defaults to: NAUGHTY LIST
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Emma wasn't naughty. The Toy Inventory MCP was overloaded from Christmas Eve traffic. But the agent's reasoning chain interpreted three timeouts as "this child's request cannot be fulfilled" and failed to the worst possible default.&lt;/p&gt;

&lt;p&gt;With traditional APIs, you'd find the bug on line 47, fix it, and deploy. With AI agents, it's not that simple. The agent decided to interpret timeouts that way. You didn't code that logic. The LLM's 70 billion parameters did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the core challenge of AI observability: You're debugging decisions, not code.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Systems are hard to debug
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Black box reasoning and reproducibility go hand in hand&lt;/strong&gt;. With traditional debugging, you step through the code and find the exact line that caused the problem. With AI agents, you only see inputs and outputs. The agent received three timeouts and decided to default to NAUGHTY_LIST. Why? Neural network reasoning you can't inspect.&lt;/p&gt;

&lt;p&gt;And even if you could inspect it, you couldn't reliably reproduce it. Run Emma's case in test four times and you might get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Run 1: NICE LIST, gift = bicycle ✓
Run 2: NICE LIST, gift = video game ✓
Run 3: NICE LIST, gift = art supplies ✓
Run 4: NAUGHTY LIST, no gift ✗
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Temperature settings and sampling introduce randomness. Same input, different results every time. Traditional logs show you what happened. AI observability needs to show you why, and in a way you can actually verify.&lt;/p&gt;

&lt;p&gt;Then there's the question of quality. Consider this child:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refused to eat vegetables (10 times) but helped put away dishes&lt;/li&gt;
&lt;li&gt;Yelled at siblings (3 times) but defended a classmate from a bully&lt;/li&gt;
&lt;li&gt;Skipped homework (5 times) but cared for a sick puppy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is this child naughty or nice? The answer depends on context, values, and interpretation. Your agent returns NICE (312 points), gift = books about empathy. A traditional API would return 200 OK and call it success. For an AI agent, you need to ask: Did it judge correctly?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Costs can spiral out of control&lt;/strong&gt;. Mrs. Claus (Santa's CFO) sees the API bill jump from 5,000 in Week 1 to 890,000 on December 24th. What happened? One kid didn't write a letter. They wrote a 15,000-word philosophical essay. Instead of flagging it, the agent processed every last word, burning through 53,500 tokens for a single child. At scale, this bankrupts the workshop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And failures cascade in unexpected ways&lt;/strong&gt;. The Gift Agent doesn't just fail when it hits a timeout. It reasons through failure. It interpreted three timeouts as "system is unreliable," then saw the inventory count change and concluded "inventory is volatile, cannot guarantee fulfillment." Each interpretation fed into the next, creating a chain of reasoning that led to: "Better to disappoint than make a promise I can't keep. Default to NAUGHTY_LIST."&lt;/p&gt;

&lt;p&gt;With traditional code, you debug line by line. With AI agents, you need to debug the entire reasoning chain. Not just what APIs were called, but why the agent called them and how it interpreted each result.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Santa Actually Needs
&lt;/h2&gt;

&lt;p&gt;The answer isn't to throw out traditional observability, but to build on top of it. Think of it as three layers.&lt;/p&gt;

&lt;p&gt;This is exactly what we've built at LaunchDarkly. Our platform combines &lt;a href="https://launchdarkly.com/blog/llm-observability-in-ai-configs/" rel="noopener noreferrer"&gt;AI observability&lt;/a&gt;, &lt;a href="https://launchdarkly.com/docs/eu-docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;online evaluations&lt;/a&gt;, and &lt;a href="https://launchdarkly.com/docs/eu-docs/home/flags/new" rel="noopener noreferrer"&gt;feature management&lt;/a&gt; to help you understand, measure, and control AI agent behavior in production. Let's walk through how each layer works.&lt;/p&gt;

&lt;p&gt;Start with the fundamentals. You still need distributed tracing across your agent network, latency breakdowns showing where time is spent, token usage per request, cost attribution by agent, and tool call success rates for your MCP servers. When the Toy Inventory MCP goes down, you need to see it immediately. When costs spike, you need alerts. This isn't optional. It's table stakes for running any production system.&lt;/p&gt;

&lt;p&gt;For Santa's workshop, this means tracing requests across Data Agent → Context Agent → Judgment Agent → Gift Agent, monitoring MCP server health, tracking token consumption per child evaluation, and alerting when costs spike unexpectedly. It’s important to note, LaunchDarkly's AI observability captures all of this out of the box, providing full visibility into your agent's infrastructure performance and resource consumption.&lt;/p&gt;

&lt;p&gt;Then add semantic observability. This is where AI diverges from traditional systems. You need to capture the reasoning, not just the results. For every decision, log the complete prompt, retrieved context, tool calls and their results, the agent's reasoning chain, and confidence scores.&lt;/p&gt;

&lt;p&gt;When Emma lands on the Naughty List, you can replay the entire decision. The Gift Agent received three timeouts from the Toy Inventory MCP, interpreted "inventory uncertain" as "cannot fulfill request," and defaulted to NAUGHTY_LIST as the "safe" outcome. Now you understand why it happened. And more importantly, you realize this isn't a bug in your code. It's a reasoning pattern the model developed. Reasoning patterns require different fixes than code bugs.&lt;/p&gt;

&lt;p&gt;LaunchDarkly's trace viewer lets you inspect every step of the agent's decision-making process, from the initial prompt to the final output, including all tool calls and the reasoning behind each step. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F660tl39dhmittwnhd0pb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F660tl39dhmittwnhd0pb.png" alt=" " width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, use &lt;a href="https://launchdarkly.com/docs/eu-docs/home/ai-configs/online-evaluations" rel="noopener noreferrer"&gt;online evals&lt;/a&gt;. Where observability shows what happened, online evals automatically assess quality and take action. Using the LLM-as-a-judge approach, you score every sampled decision. One AI judges another's work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"accuracy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Timeouts should trigger retry logic, not default to 
      worst-case outcome. System error conflated with behavioral judgment."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fairness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Similar timeout patterns resulted in NICE determination 
      for other children. Inconsistent failure handling."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This changes the conversation from vague to specific.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without evals&lt;/strong&gt;: "Let's meet tomorrow to review Emma's case and decide if we should rollback."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With evals&lt;/strong&gt;: "Accuracy dropped below 0.7 for the 'timeout cascade defaults to NAUGHTY' pattern. Automatic rollback triggered. Here are the 23 affected cases."&lt;/p&gt;

&lt;p&gt;LaunchDarkly's online evaluations run continuously in production, automatically scoring your agent's decisions and alerting you when quality degrades. You can define custom evaluation criteria tailored to your use case and set thresholds that trigger automatic actions.&lt;/p&gt;

&lt;p&gt;This is where feature management and experimentation come in. Feature flags paired with guarded rollouts let you control deployments and roll back bad ones. Experimentation lets you A/B test different approaches. With AI agents, you're doing the same thing, but instead of testing button colors or checkout flows, you're testing prompt variations, model versions, and reasoning strategies. When your evals detect accuracy has dropped below threshold, you automatically roll back to the previous agent configuration.&lt;/p&gt;

&lt;p&gt;Use feature flags to control which model version, prompt template, or reasoning strategy your agents use and seamlessly roll back when something goes wrong. Our experimentation platform lets you A/B test different agent configurations and measure which performs better on your custom metrics. Check out our guide on feature flagging AI applications.&lt;/p&gt;

&lt;p&gt;You're not just observing decisions. You're evaluating quality in real-time and taking action.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging Emma: all three layers in action
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Traditional observability&lt;/strong&gt; shows the Toy Inventory MCP experienced three timeouts that triggered retry logic. Token usage remained average. From an infrastructure perspective, nothing looked catastrophic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic observability&lt;/strong&gt; reveals where the reasoning went wrong. The Gift Agent interpreted the timeouts as "inventory uncertain" and made the leap to "cannot fulfill requests." Rather than recognizing this as a temporary system issue, it treated the timeouts as a data problem and defaulted to NAUGHTY_LIST.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Online evals&lt;/strong&gt; reveal this isn't just a one-off problem with Emma, but a pattern happening across multiple cases. The accuracy judge flagged this decision at 0.3, well below acceptable thresholds. Querying for similar low-accuracy decisions reveals 23 other cases where timeout cascades resulted in NAUGHTY_LIST defaults.&lt;/p&gt;

&lt;p&gt;Each layer tells part of the story. Together, they give you everything you need to fix it before more parents call.&lt;/p&gt;

&lt;p&gt;With LaunchDarkly, all three layers work together in a single platform. You can trace the infrastructure issue, inspect the reasoning chain, evaluate the decision quality, and automatically roll back to a safer configuration, all within minutes of Emma's case being flagged.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Every AI agent system faces these exact challenges. Customer service agents making support decisions. Code assistants suggesting fixes. Content moderators judging appropriateness. Recommendation engines personalizing experiences. They all struggle with the same problems.&lt;/p&gt;

&lt;p&gt;Traditional observability tools weren't built for this. AI systems make decisions, and decisions need different observability than code.&lt;/p&gt;

&lt;p&gt;Santa's system says "He knows if you've been bad or good." But how he knows matters. Because when Emma gets coal instead of a bicycle due to a timeout cascade at 11:47 PM on Christmas Eve, you need to understand what happened, find similar cases, measure if it's systematic, fix it without breaking other cases, and ensure it doesn't happen again.&lt;/p&gt;

&lt;p&gt;You can't do that with traditional observability alone. AI agents aren't APIs. They're decision-makers. Which means you need to observe them differently.&lt;/p&gt;

&lt;p&gt;LaunchDarkly provides the complete platform for building reliable AI agent systems: observability to understand what's happening, online evaluations to measure quality, and feature management to control and iterate safely. Whether you're building Santa's naughty-or-nice system or a production AI application, you need all three layers working together.&lt;/p&gt;

&lt;p&gt;Ready to make your AI agents more reliable? &lt;a href="https://launchdarkly.com/docs/home/ai-configs/quickstart" rel="noopener noreferrer"&gt;Start with our AI quickstart guide&lt;/a&gt; to see how LaunchDarkly can help you ship AI agents with confidence.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>ai</category>
      <category>agents</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
