<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Elizabeth Fuentes L</title>
    <description>The latest articles on Forem by Elizabeth Fuentes L (@elizabethfuentes12).</description>
    <link>https://forem.com/elizabethfuentes12</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png</url>
      <title>Forem: Elizabeth Fuentes L</title>
      <link>https://forem.com/elizabethfuentes12</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/elizabethfuentes12"/>
    <language>en</language>
    <item>
      <title>AI Context Window Overflow: A Memory Pointer Solution</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Thu, 14 May 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/aws-espanol/desbordamiento-de-ventana-de-contexto-de-ia-solucion-con-puntero-de-memoria-1g76</link>
      <guid>https://forem.com/aws-espanol/desbordamiento-de-ventana-de-contexto-de-ia-solucion-con-puntero-de-memoria-1g76</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Context window overflow&lt;/strong&gt; occurs when an AI agent's tool outputs exceed the token limit that the large language model (LLM) can process at once. The agent does not fail outright: it silently truncates data, loses earlier context, or produces incomplete results. This post shows how the Memory Pointer Pattern fixes this, from a single agent to multi-agent coordination where 145KB of data never enters any LLM context.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This demo uses &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;. The Memory Pointer Pattern is framework-independent and can be applied with LangGraph, AutoGen, or any other agent framework that supports tool context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working code:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/01-context-overflow-demo" rel="noopener noreferrer"&gt;github.com/aws-samples/sample-why-agents-fail&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Series: Why AI Agents Fail
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Window Overflow&lt;/strong&gt; (this post): the Memory Pointer Pattern for large data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo" rel="noopener noreferrer"&gt;MCP Tools That Never Respond&lt;/a&gt;&lt;/strong&gt;: an async pattern for slow external APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/03-reasoning-loops-demo" rel="noopener noreferrer"&gt;Reasoning Loops in AI Agents&lt;/a&gt;&lt;/strong&gt;: detecting and blocking repeated tool calls&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Problem: Agents Cannot Handle Large Tool Outputs
&lt;/h2&gt;

&lt;p&gt;When an AI agent calls a tool that returns a large payload (server logs, database results, file contents), the response can overflow the LLM's context window. The agent does not fail with a clear error. It degrades silently: it truncates data, loses context, or fails to complete the task.&lt;/p&gt;

&lt;p&gt;IBM research (&lt;a href="https://arxiv.org/html/2511.22729v1" rel="noopener noreferrer"&gt;Solving Context Window Overflow in AI Agents, 2025&lt;/a&gt;) quantifies this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Materials Science workflows, tool outputs can reach &lt;strong&gt;more than 2 million elements&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The traditional approach consumed &lt;strong&gt;20,822,181 tokens&lt;/strong&gt; and failed&lt;/li&gt;
&lt;li&gt;The same workflow with memory pointers used &lt;strong&gt;1,234 tokens&lt;/strong&gt; and succeeded&lt;/li&gt;
&lt;li&gt;That is a reduction of more than &lt;strong&gt;16,000x&lt;/strong&gt; in this workflow&lt;/li&gt;
&lt;/ul&gt;
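
&lt;p&gt;As a quick sanity check, the reduction factor can be recomputed from the two token counts quoted above. The variable names are illustrative only; the numbers are the ones cited from the paper.&lt;/p&gt;

```python
# Token counts as quoted above from the IBM paper.
traditional_tokens = 20_822_181
pointer_tokens = 1_234

# Reduction factor: how many times fewer tokens the pointer workflow used.
reduction = traditional_tokens / pointer_tokens
print(f"Reduction factor: {reduction:,.0f}x")  # Reduction factor: 16,874x
```

&lt;p&gt;The result is consistent with the "more than 16,000x" figure in the list.&lt;/p&gt;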

&lt;p&gt;A community write-up (&lt;a href="https://airbyte.com/agentic-data/context-window-limit" rel="noopener noreferrer"&gt;Context Window Limits Explained, Airbyte 2025&lt;/a&gt;) confirms that teams discover these limits "the hard way" through silent failures. The agent appears to work but produces incomplete or incorrect results.&lt;/p&gt;

&lt;p&gt;The idea of passing references instead of raw data has also been validated in multi-agent setups. Amazon research (&lt;a href="https://arxiv.org/pdf/2412.05449" rel="noopener noreferrer"&gt;Towards Effective GenAI Multi-Agent Collaboration, 2024&lt;/a&gt;) introduces "payload referencing", where agents exchange pointers to shared data instead of embedding large payloads in messages. This improved performance on code-intensive tasks by 23% and achieved end-to-end goal success rates of 90% on enterprise benchmarks. This is exactly what we implement below with &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/swarm/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Swarm&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0su97tbog85m0q3srr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0su97tbog85m0q3srr5.png" alt="The agent loop: User Query flows to the LLM, then Tool Call, then Tool Output (214KB), then back to the LLM. The large tool output causes context overflow" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the tool output is small (a few KB), this loop works fine. But when a tool returns 200KB of server logs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The full output is injected into the conversation&lt;/li&gt;
&lt;li&gt;The LLM's context window fills up&lt;/li&gt;
&lt;li&gt;Older context (including the original question) is evicted&lt;/li&gt;
&lt;li&gt;The LLM cannot reason over the data because it cannot see all of it&lt;/li&gt;
&lt;li&gt;The agent fails or produces incomplete answers&lt;/li&gt;
&lt;/ol&gt;
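
&lt;p&gt;The steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the 32,000-token budget is arbitrary, and &lt;code&gt;fit_to_window&lt;/code&gt; is a hypothetical helper that drops the oldest messages first. Real frameworks and model providers manage the window in their own ways.&lt;/p&gt;

```python
from collections import deque

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count from the character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fit_to_window(messages: list[str], budget_tokens: int) -> list[str]:
    """Drop the oldest messages until the rest fit the token budget.

    If a single remaining message is still too large, it is truncated.
    """
    kept = deque(messages)
    while len(kept) > 1 and sum(estimate_tokens(m) for m in kept) > budget_tokens:
        kept.popleft()  # the oldest message is evicted first
    if kept and estimate_tokens(kept[0]) > budget_tokens:
        kept[0] = kept[0][: budget_tokens * CHARS_PER_TOKEN]
    return list(kept)


question = "Which service had the most errors in the last 24 hours?"
tool_output = "x" * 200_000  # stand-in for roughly 200KB of logs
window = fit_to_window([question, tool_output], budget_tokens=32_000)

print(question in window)  # False: the original question was evicted
print(len(window[0]))      # 128000: the log payload was truncated as well
```

&lt;p&gt;In this model the agent ends up with neither the question nor the complete data, which matches the silent degradation described above.&lt;/p&gt;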

&lt;h2&gt;
  
  
  Solution 1: Single Agent with Strands ToolContext
&lt;/h2&gt;

&lt;p&gt;The first approach uses &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/state/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;agent.state&lt;/code&gt;&lt;/a&gt;, a built-in key-value store scoped to each agent instance. Tools write large data there via &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/custom-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;ToolContext&lt;/code&gt;&lt;/a&gt; and return only a short pointer string to the context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;

&lt;span class="c1"&gt;# context=True injects ToolContext through the tool_context parameter; required to access agent.state
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_application_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch application logs. Returns a memory pointer for large datasets.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Could be 200KB+
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Threshold: store externally above 20KB
&lt;/span&gt;        &lt;span class="n"&gt;pointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="c1"&gt;# Store the full payload in agent.state; it never enters the LLM context
&lt;/span&gt;        &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pointer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Return only the pointer message (52 bytes); this is all the LLM sees
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data stored as pointer &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pointer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. Use the analysis tools to query it.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Small enough to return directly
&lt;/span&gt;
&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_error_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_pointer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze errors by resolving the pointer from agent.state.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Retrieve the full dataset from agent.state using the pointer key
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_pointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Guard against a pointer the model misspelled or invented
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No data found for pointer &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data_pointer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Return a summary (not raw data) to keep the response small
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; errors across &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; services&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM never sees the 200KB. It sees only &lt;code&gt;"Data stored as pointer 'logs-payment-service'"&lt;/code&gt; (52 bytes). The next tool reads the full data from &lt;code&gt;agent.state&lt;/code&gt; and returns a summary. Strands provides this capability natively: no global dictionaries, no hashlib, no external infrastructure.&lt;/p&gt;
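
&lt;p&gt;Because the pattern itself does not depend on a framework, the same idea can be sketched in plain Python with a dictionary standing in for &lt;code&gt;agent.state&lt;/code&gt;. Everything here is hypothetical and for illustration only: the &lt;code&gt;fetch_logs&lt;/code&gt; and &lt;code&gt;count_errors&lt;/code&gt; names, the module-level &lt;code&gt;store&lt;/code&gt;, and the synthetic logs. Only the 20,000-character threshold is taken from the tool above.&lt;/p&gt;

```python
import json

THRESHOLD_CHARS = 20_000  # same cutoff as the tool shown above

store: dict[str, list[dict]] = {}  # stands in for agent.state (assumption)


def fetch_logs(app_name: str, logs: list[dict]) -> str:
    """Return small payloads inline; park large ones behind a pointer."""
    serialized = json.dumps(logs)
    if len(serialized) > THRESHOLD_CHARS:
        pointer = f"logs-{app_name}"
        store[pointer] = logs
        return f"Data stored as pointer '{pointer}'."
    return serialized


def count_errors(pointer: str) -> str:
    """Resolve the pointer and return a short summary instead of raw data."""
    logs = store.get(pointer)
    if logs is None:
        return f"Unknown pointer '{pointer}'."
    errors = [entry for entry in logs if entry["level"] == "ERROR"]
    return f"Found {len(errors)} errors in {len(logs)} log entries."


# Synthetic logs: every fifth entry is an error.
logs = [
    {"level": "ERROR" if i % 5 == 0 else "INFO", "service": "payment-service", "msg": f"event {i}"}
    for i in range(2_000)
]
message = fetch_logs("payment-service", logs)
print(message)                               # Data stored as pointer 'logs-payment-service'.
print(count_errors("logs-payment-service"))  # Found 400 errors in 2000 log entries.
```

&lt;p&gt;The model-facing strings stay a few dozen bytes long regardless of how large the stored payload grows.&lt;/p&gt;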

&lt;h3&gt;
  
  
  Single-Agent Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Without Pointers&lt;/th&gt;
&lt;th&gt;With Memory Pointers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data in context&lt;/td&gt;
&lt;td&gt;214KB (full logs)&lt;/td&gt;
&lt;td&gt;52 bytes (pointer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent behavior&lt;/td&gt;
&lt;td&gt;Truncates/fails&lt;/td&gt;
&lt;td&gt;Processes all data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors detected&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Complete (all services)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9922q2u1b8wl6miejb79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9922q2u1b8wl6miejb79.png" alt="Bar chart comparing token usage with and without the Memory Pointer Pattern across four context management strategies" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution 2: Multi-Agent with Strands Swarm
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih2nb4e6m3tkocud75ol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih2nb4e6m3tkocud75ol.png" alt="Strands Swarm data flow: Collector, Analyzer, and Reporter agents sharing 145KB of data through invocation_state without it entering any LLM context window" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A single agent works for linear pipelines. But real-world incident response involves specialized roles: someone fetches the data, someone analyzes it, someone writes the report. &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/swarm/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Swarm&lt;/a&gt; coordinates multiple agents autonomously: you define agents with different tools, and the Swarm handles the handoffs.&lt;/p&gt;

&lt;p&gt;This is the same "payload referencing" pattern from &lt;a href="https://arxiv.org/pdf/2412.05449" rel="noopener noreferrer"&gt;Amazon's multi-agent collaboration paper&lt;/a&gt;. Agents exchange pointers to shared data instead of passing raw payloads. The difference is that Strands Swarm handles the coordination automatically and provides &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/multi-agent-patterns/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;invocation_state&lt;/code&gt;&lt;/a&gt; as the official API for sharing data between agents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.multiagent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Swarm&lt;/span&gt;

&lt;span class="c1"&gt;# invocation_state is a dict shared by all agents in the Swarm; it serves as the cross-agent store
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_application_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 145KB+
&lt;/span&gt;    &lt;span class="n"&gt;pointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Store in invocation_state so every downstream agent can access it without re-fetching
&lt;/span&gt;    &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pointer&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;
    &lt;span class="c1"&gt;# Only the pointer string travels through the LLM context to the next agent
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stored as &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pointer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. Hand off to analyzer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_error_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs_pointer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Resolve the pointer to the full dataset; no LLM context is consumed
&lt;/span&gt;    &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs_pointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;  &lt;span class="c1"&gt;# additional fields omitted for brevity
&lt;/span&gt;    &lt;span class="c1"&gt;# Store the analysis results under another pointer for the reporter agent
&lt;/span&gt;    &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Each agent has a focused role; the Swarm decides the handoff order autonomously
&lt;/span&gt;&lt;span class="n"&gt;collector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;collector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fetch_application_logs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;analyze_error_patterns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detect_latency_anomalies&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;reporter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reporter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;generate_incident_report&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;swarm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Swarm&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;collector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reporter&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;entry_point&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;swarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetch logs, analyze them, and generate an incident report.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Swarm runs the following sequence automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The collector runs first: it fetches 145KB of logs and stores them in &lt;code&gt;invocation_state&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The collector hands off to the analyzer with the pointer &lt;code&gt;"logs-payment-service"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The analyzer runs the error and latency analyses, stores the results in &lt;code&gt;invocation_state&lt;/code&gt;, and hands off to the reporter&lt;/li&gt;
&lt;li&gt;The reporter generates the final incident report&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No orchestration code or manual handoff logic is needed. Each agent has its own tools, and the Swarm determines the flow from the agent descriptions and the task. All data sharing happens via &lt;code&gt;tool_context.invocation_state&lt;/code&gt;, the same &lt;code&gt;ToolContext&lt;/code&gt; API used in the single-agent case, with a different store.&lt;/p&gt;
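
&lt;p&gt;To see how little data crosses between stages, the handoffs can be mimicked without any framework. This is a sketch, not how Strands Swarm works internally: a plain dictionary stands in for &lt;code&gt;invocation_state&lt;/code&gt;, the stage order is hard-coded rather than chosen by an LLM, and the &lt;code&gt;handoff&lt;/code&gt; helper, the stage functions, and the synthetic logs are all hypothetical.&lt;/p&gt;

```python
from typing import Any

shared_state: dict[str, Any] = {}  # stands in for invocation_state (assumption)
context_bytes = 0  # bytes that would have travelled through an LLM context


def handoff(message: str) -> str:
    """Record the size of a message passed between stages, then pass it on."""
    global context_bytes
    context_bytes += len(message.encode("utf-8"))
    return message


def collector(app_name: str) -> str:
    """Generate synthetic logs, store them, and hand off only the pointer."""
    logs = [
        {"level": "ERROR" if i % 10 == 0 else "INFO", "latency_ms": i % 500}
        for i in range(3_000)
    ]
    pointer = f"logs-{app_name}"
    shared_state[pointer] = logs
    return handoff(pointer)


def analyzer(logs_pointer: str) -> str:
    """Resolve the logs pointer, store a summary, and hand off its key."""
    logs = shared_state[logs_pointer]
    shared_state["error_analysis"] = {
        "total_errors": sum(1 for entry in logs if entry["level"] == "ERROR")
    }
    return handoff("error_analysis")


def reporter(analysis_pointer: str) -> str:
    """Build the final report from the stored analysis."""
    analysis = shared_state[analysis_pointer]
    return f"Incident report: {analysis['total_errors']} errors found."


report = reporter(analyzer(collector("payment-service")))
print(report)                                           # Incident report: 300 errors found.
print(f"Bytes passed between stages: {context_bytes}")  # Bytes passed between stages: 34
```

&lt;p&gt;Only the two pointer strings are counted as handoff traffic; the 3,000 log entries stay in the shared dictionary the whole time.&lt;/p&gt;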

&lt;h3&gt;
  
  
  Swarm Results
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Status: COMPLETED
Agents: collector → analyzer → reporter
Time: ~14s
Shared store:
  logs-payment-service: 145,310 bytes
  error_analysis: 135 bytes
  latency_analysis: 70 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three agents processed 145KB of logs. None of it ever entered any LLM context window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Follow-Up Investigation
&lt;/h3&gt;

&lt;p&gt;After the swarm completes, the data remains in the shared store. A separate investigator agent can dig into specific services without re-fetching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# El investigator reutiliza invocation_state poblado por el swarm — sin re-obtención de datos
&lt;/span&gt;&lt;span class="n"&gt;investigator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;investigator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_error_details&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_error_patterns&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cada pregunta resuelve el puntero desde invocation_state y ejecuta análisis en memoria
&lt;/span&gt;&lt;span class="nf"&gt;investigator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;¿Qué servicio tuvo más errores?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;investigator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Muéstrame los logs de error de cache-layer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;investigator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;¿Qué códigos de estado devuelven esos errores?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Todas las consultas leen de los mismos 145KB ya en invocation_state — sin re-obtención, sin desbordamiento de contexto
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When to Use Each Approach
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Single agent + &lt;code&gt;agent.state&lt;/code&gt;&lt;/strong&gt;: linear pipelines where one agent handles fetching, analysis, and reporting. Use &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/custom-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;ToolContext&lt;/code&gt;&lt;/a&gt; to access &lt;code&gt;tool_context.agent.state&lt;/code&gt; from tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Swarm + &lt;code&gt;invocation_state&lt;/code&gt;&lt;/strong&gt;: specialized roles, complex flows, or cases where you want autonomous coordination. Use &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/custom-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;ToolContext&lt;/code&gt;&lt;/a&gt; to access &lt;code&gt;tool_context.invocation_state&lt;/code&gt;, the official Strands API for &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/multi-agent-patterns/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;multi-agent data sharing&lt;/a&gt;. The &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/swarm/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Swarm&lt;/a&gt; manages handoffs, timeouts, and detection of repetitive handoffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Both&lt;/strong&gt;: use &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/conversation-management/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;SlidingWindowConversationManager&lt;/code&gt;&lt;/a&gt; as an additional safeguard. It automatically trims the conversation history and handles &lt;code&gt;ContextWindowOverflowException&lt;/code&gt; with a retry.&lt;/p&gt;
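&lt;p&gt;The trimming idea behind a sliding window can be illustrated with the standard library. This is only a conceptual sketch, not how &lt;code&gt;SlidingWindowConversationManager&lt;/code&gt; is implemented; &lt;code&gt;WINDOW_SIZE&lt;/code&gt; and the message strings are made up for the example.&lt;/p&gt;

```python
from collections import deque

# Conceptual sketch only; the window size and messages are hypothetical.
WINDOW_SIZE = 4

history = deque(maxlen=WINDOW_SIZE)
for turn in range(1, 7):
    # Once the deque is full, appending drops the oldest message automatically.
    history.append(f"message {turn}")

print(list(history))
```

&lt;p&gt;A window like this bounds the history size but discards old content, which is why it works as a safeguard alongside memory pointers rather than as a replacement for them.&lt;/p&gt;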

&lt;p&gt;These approaches are part of &lt;strong&gt;context engineering&lt;/strong&gt; for AI agents: the practice of deciding what information enters the LLM's context window, and when.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;You need &lt;a href="https://python.org/downloads" rel="noopener noreferrer"&gt;Python 3.9+&lt;/a&gt;, &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, and an &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/aws-samples/sample-why-agents-fail
&lt;span class="nb"&gt;cd &lt;/span&gt;sample-why-agents-fail/stop-ai-agents-wasting-tokens/01-context-overflow-demo
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"tu-clave-aquí"&lt;/span&gt;

uv run python test_context_overflow.py   &lt;span class="c"&gt;# Agente único: 4 escenarios&lt;/span&gt;
uv run python swarm_demo.py              &lt;span class="c"&gt;# Multi-agente: Collector → Analyzer → Reporter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or open &lt;code&gt;test_context_overflow.ipynb&lt;/code&gt; in &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, VS Code, or your preferred notebook environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context overflow is silent&lt;/strong&gt;: agents don't crash, they produce incorrect results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory pointers fix it&lt;/strong&gt;: store large data externally and pass references&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&amp;gt;16,000x token reduction&lt;/strong&gt;: validated by IBM Research on a Materials Science benchmark&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single agent uses &lt;code&gt;agent.state&lt;/code&gt;&lt;/strong&gt;: &lt;code&gt;@tool(context=True)&lt;/code&gt; + &lt;code&gt;ToolContext&lt;/code&gt; to store and retrieve data outside the context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent uses &lt;code&gt;invocation_state&lt;/code&gt;&lt;/strong&gt;: the same &lt;code&gt;ToolContext&lt;/code&gt; API, shared across all agents in the Swarm. No orchestration code is needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data persists for follow-up&lt;/strong&gt;: after the pipeline completes, the stored data is available for investigation without re-fetching&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why do AI agents run out of context?
&lt;/h3&gt;

&lt;p&gt;AI agents run out of context when tool responses are injected directly into the LLM's conversation history. Every response consumes tokens. When the accumulated tool outputs exceed the model's context window limit, the LLM loses earlier context, truncates data, or fails outright. This often happens silently: the agent appears to work but produces incomplete or incorrect results.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the Memory Pointer Pattern for AI agents?
&lt;/h3&gt;

&lt;p&gt;The Memory Pointer Pattern stores large tool outputs (logs, datasets, query results) in external state instead of in the LLM's context window. Tools return a short reference key (the "pointer") that subsequent tools use to retrieve the full data. IBM Research validated this pattern with a token reduction of more than 16,000x on a Materials Science benchmark.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does agent.state differ from invocation_state in Strands Agents?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;agent.state&lt;/code&gt; is scoped to a single agent instance. Use it for linear pipelines where one agent handles every step. &lt;code&gt;invocation_state&lt;/code&gt; is shared across all agents in a Strands Swarm. Use it when multiple specialized agents need to exchange data without passing large payloads through the LLM context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use the Memory Pointer Pattern with LangGraph or other frameworks?
&lt;/h3&gt;

&lt;p&gt;Yes. The pattern requires two capabilities: a shared key-value store accessible from tools, and the ability to pass short reference strings through the LLM context. LangGraph provides this through its state management, AutoGen through shared memory, and CrewAI through task context. The Strands implementation uses &lt;code&gt;ToolContext&lt;/code&gt; as its native API.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Research
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/html/2511.22729v1" rel="noopener noreferrer"&gt;Solving Context Window Overflow in AI Agents&lt;/a&gt; — IBM Research, Nov 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/pdf/2412.05449" rel="noopener noreferrer"&gt;Towards Effective GenAI Multi-Agent Collaboration&lt;/a&gt; — Amazon, Dec 2024&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://airbyte.com/agentic-data/context-window-limit" rel="noopener noreferrer"&gt;Context Window Limits Explained&lt;/a&gt; — Airbyte blog (observación comunitaria), Dec 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/html/2511.03728v1" rel="noopener noreferrer"&gt;Efficient On-Device Agents via Adaptive Context Management&lt;/a&gt; — Nov 2025&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/state/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agent State&lt;/a&gt; — ToolContext and agent.state&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/swarm/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Swarm&lt;/a&gt; — Multi-agent orchestration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/conversation-management/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Conversation Management&lt;/a&gt; — Sliding window and context overflow&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Have you hit context window limits in your agents? What strategies worked for you? Share them in the comments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next in this series:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo" rel="noopener noreferrer"&gt;MCP Tools That Never Respond&lt;/a&gt;: async patterns for slow external APIs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All the code in this series is open source under the MIT-0 License. &lt;a href="https://github.com/aws-samples/sample-why-agents-fail" rel="noopener noreferrer"&gt;Star the repository&lt;/a&gt; to follow updates.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Thanks!&lt;/p&gt;

&lt;p&gt;🇻🇪&lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; - &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt; - &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; - &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; - &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; - &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Built-in Token Counting: Telemetry for Production AI Agents</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Wed, 13 May 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/aws/built-in-token-counting-telemetry-for-production-ai-agents-26ic</link>
      <guid>https://forem.com/aws/built-in-token-counting-telemetry-for-production-ai-agents-26ic</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Strands Agents provides native telemetry and cost tracking out of the box. Stop writing custom token counters. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Building AI agents is easy. &lt;strong&gt;Deploying them to production&lt;/strong&gt; is where most teams hit a wall.&lt;/p&gt;

&lt;p&gt;One of the first questions from finance: &lt;em&gt;"How much will this cost per request?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most agent frameworks make you build your own token counter. Strands Agents gives you one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Custom Token Counting
&lt;/h2&gt;

&lt;p&gt;Every AI application needs cost monitoring. But tracking tokens across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple model calls&lt;/li&gt;
&lt;li&gt;Tool invocations&lt;/li&gt;
&lt;li&gt;Prompt caching&lt;/li&gt;
&lt;li&gt;Multi-agent workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...requires custom infrastructure most teams rebuild from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Native Telemetry in &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Strands Agents includes &lt;a href="https://strandsagents.com/docs/user-guide/observability-evaluation/metrics/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;production-grade telemetry&lt;/a&gt; by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="c1"&gt;# Create an agent with tools
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Invoke the agent with a prompt and get an AgentResult
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the square root of 144?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access metrics through the AgentResult
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execution time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cycle_durations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tools used: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cache metrics (when available)
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheReadInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cache read tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheReadInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheWriteInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cache write tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheWriteInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;No configuration. No custom code. It just works.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;p&gt;Every &lt;code&gt;AgentResult&lt;/code&gt; includes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens sent to the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;outputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens generated by the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;totalTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total tokens (input + output)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cacheReadInputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens read from cache (Bedrock prompt caching)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cacheWriteInputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens written to cache&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
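&lt;p&gt;To answer the cost-per-request question from these fields, one option is to multiply token counts by per-token prices. The sketch below assumes a usage mapping with the keys from the table above. The prices and sample numbers are hypothetical placeholders, and &lt;code&gt;estimate_cost&lt;/code&gt; is an illustrative helper, not part of Strands.&lt;/p&gt;

```python
# Hypothetical prices in USD per 1,000 tokens; replace with your provider's rates.
PRICE_PER_1K = {"inputTokens": 0.003, "outputTokens": 0.015}


def estimate_cost(usage):
    """Estimate request cost from a usage mapping keyed like the table above."""
    return sum(usage.get(key, 0) / 1000 * price for key, price in PRICE_PER_1K.items())


# Sample values for illustration only, not real measurements.
sample_usage = {"inputTokens": 2000, "outputTokens": 1000, "totalTokens": 3000}
cost = estimate_cost(sample_usage)
print(f"Estimated cost: ${cost:.4f}")
```

&lt;p&gt;Input and output tokens are priced separately here because providers commonly charge different rates for each, so &lt;code&gt;totalTokens&lt;/code&gt; alone is usually not enough for a cost figure.&lt;/p&gt;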
&lt;h2&gt;
  
  
  Multi-Agent Token Tracking
&lt;/h2&gt;

&lt;p&gt;For multi-agent systems (executor → validator → critic), aggregate metrics across all agents:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.multiagent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Swarm&lt;/span&gt;

&lt;span class="n"&gt;swarm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Swarm&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;swarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;
    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total cost across all agents: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
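&lt;p&gt;If you want a per-field breakdown rather than a single total, the same loop can sum every usage key. The sketch below uses plain dictionaries shaped like &lt;code&gt;accumulated_usage&lt;/code&gt; so it runs without a model call; the node names and numbers are invented for illustration.&lt;/p&gt;

```python
from collections import Counter

# Stand-in data shaped like accumulated_usage; names and numbers are made up.
node_usages = {
    "executor": {"inputTokens": 1200, "outputTokens": 300, "totalTokens": 1500},
    "validator": {"inputTokens": 800, "outputTokens": 100, "totalTokens": 900},
    "critic": {"inputTokens": 600, "outputTokens": 200, "totalTokens": 800},
}

totals = Counter()
for usage in node_usages.values():
    totals.update(usage)  # Counter.update adds the counts key by key

# Identify which agent consumed the most tokens.
busiest = max(node_usages, key=lambda name: node_usages[name]["totalTokens"])

print(dict(totals))
print(f"Most expensive agent: {busiest}")
```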

&lt;h2&gt;
  
  
  Per-Cycle Tracking
&lt;/h2&gt;

&lt;p&gt;For agents that run multiple reasoning cycles, track tokens per cycle:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# First invocation
&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 5 + 3?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Second invocation
&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the square root of 144?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access metrics for the latest invocation
&lt;/span&gt;&lt;span class="n"&gt;latest_invocation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;latest_agent_invocation&lt;/span&gt;
&lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;latest_invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;
&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;latest_invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;

&lt;span class="c1"&gt;# Or access all invocations
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent_invocations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invocation usage: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cycle&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Cycle &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cycle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_loop_cycle_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cycle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or print the summary (includes all invocations)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;For a complete list of attributes and their types, see the &lt;a href="https://strandsagents.com/docs/api/python/strands.telemetry.metrics/#EventLoopMetrics/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;EventLoopMetrics&lt;/a&gt; API reference.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost visibility&lt;/strong&gt; is the difference between a prototype and production AI.&lt;/p&gt;

&lt;p&gt;With Strands telemetry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Budget AI workloads before deployment&lt;/li&gt;
&lt;li&gt;✅ Identify expensive queries in production&lt;/li&gt;
&lt;li&gt;✅ Optimize prompts with real token data&lt;/li&gt;
&lt;li&gt;✅ Track prompt caching savings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without writing a single line of telemetry code.&lt;/p&gt;
&lt;h2&gt;
  
  
  Works with All Model Providers
&lt;/h2&gt;

&lt;p&gt;Token tracking works regardless of your model provider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Bedrock (Claude, Llama, Mistral)&lt;/li&gt;
&lt;li&gt;OpenAI (GPT-4, GPT-3.5)&lt;/li&gt;
&lt;li&gt;Anthropic API&lt;/li&gt;
&lt;li&gt;Ollama (local models)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same API, same metrics, zero config changes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;strands-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Full documentation: &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;strandsagents.com/docs/user-guide/concepts/agents/&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;Thanks!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__717518"&gt;
    &lt;a href="/elizabethfuentes12" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;Elizabeth Fuentes L&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;I help developers build production-ready AI applications through hands-on tutorials and open-source projects.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Monitor AI Agent Costs with Zero Configuration</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Wed, 13 May 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/aws-espanol/como-monitorear-costos-de-agentes-ia-sin-configuracion-1fca</link>
      <guid>https://forem.com/aws-espanol/como-monitorear-costos-de-agentes-ia-sin-configuracion-1fca</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Strands Agents provides native telemetry and cost tracking out of the box. Stop writing custom token counters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Building AI agents is easy. &lt;strong&gt;Deploying them to production&lt;/strong&gt; is where most teams hit a wall.&lt;/p&gt;

&lt;p&gt;One of the first questions from finance: &lt;em&gt;"How much will this cost per request?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most agent frameworks force you to build your own token counter. Strands Agents gives you one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Custom Token Counting
&lt;/h2&gt;

&lt;p&gt;Every AI application needs cost monitoring. But tracking tokens across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple model calls&lt;/li&gt;
&lt;li&gt;Tool invocations&lt;/li&gt;
&lt;li&gt;Prompt caching&lt;/li&gt;
&lt;li&gt;Multi-agent workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...requires custom infrastructure that most teams rebuild from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Native Telemetry in &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Strands Agents includes &lt;a href="https://strandsagents.com/docs/user-guide/observability-evaluation/metrics/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;production-grade telemetry&lt;/a&gt; by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="c1"&gt;# Create an agent with tools
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Invoke the agent with a prompt and get an AgentResult
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the square root of 144?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access metrics through the AgentResult
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execution time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cycle_durations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tools used: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cache metrics (when available)
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheReadInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokens read from cache: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheReadInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheWriteInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokens written to cache: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cacheWriteInputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;No configuration. No custom code. It just works.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;p&gt;Every &lt;code&gt;AgentResult&lt;/code&gt; includes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens sent to the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;outputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens generated by the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;totalTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total tokens (input + output)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cacheReadInputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens read from cache (Bedrock prompt caching)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cacheWriteInputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Tokens written to cache&lt;/td&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
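&lt;p&gt;To answer the "cost per request" question, token counts still have to be multiplied by a price. The sketch below is one minimal way to do that and is not part of Strands Agents: the &lt;code&gt;estimate_cost&lt;/code&gt; helper, the &lt;code&gt;PRICE_PER_1K&lt;/code&gt; table, and the sample numbers are all hypothetical. It only assumes a usage dict with the &lt;code&gt;inputTokens&lt;/code&gt; and &lt;code&gt;outputTokens&lt;/code&gt; keys listed in the table.&lt;/p&gt;

```python
# Hypothetical per-1,000-token prices in USD; replace with your provider's real rates.
PRICE_PER_1K = {"inputTokens": 0.003, "outputTokens": 0.015}


def estimate_cost(usage, prices=PRICE_PER_1K):
    """Return the estimated USD cost for one usage dict.

    Keys missing from `usage` are treated as zero tokens.
    """
    total = 0.0
    for key, price in prices.items():
        total += usage.get(key, 0) / 1000 * price
    return total


# Made-up usage dict shaped like result.metrics.accumulated_usage.
sample_usage = {"inputTokens": 2000, "outputTokens": 1000, "totalTokens": 3000}
print(f"Estimated cost: ${estimate_cost(sample_usage):.4f}")
```

&lt;p&gt;In a real run you would pass &lt;code&gt;result.metrics.accumulated_usage&lt;/code&gt; instead of &lt;code&gt;sample_usage&lt;/code&gt; and plug in your provider's published rates.&lt;/p&gt;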
&lt;h2&gt;
  
  
  Multi-Agent Token Tracking
&lt;/h2&gt;

&lt;p&gt;For multi-agent systems (executor → validator → critic), aggregate metrics across all agents:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.multiagent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Swarm&lt;/span&gt;

&lt;span class="c1"&gt;# executor, validator, and critic are Agent instances defined elsewhere
&lt;/span&gt;&lt;span class="n"&gt;swarm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Swarm&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;swarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;accumulated_usage&lt;/span&gt;
    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;totalTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total across all agents: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
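&lt;p&gt;A single grand total hides which agent is responsible for the spend. As a rough sketch of a per-agent breakdown, the snippet below sums every counter across agents and picks out the most expensive one. It runs on plain dicts with made-up numbers; &lt;code&gt;aggregate_usage&lt;/code&gt; and &lt;code&gt;usage_by_agent&lt;/code&gt; are hypothetical names, and the assumption is that you have already collected each node's &lt;code&gt;accumulated_usage&lt;/code&gt; dict under its agent name.&lt;/p&gt;

```python
from collections import Counter


def aggregate_usage(usage_by_agent):
    """Sum every token counter across agents; missing keys count as zero."""
    total = Counter()
    for usage in usage_by_agent.values():
        total.update(usage)  # Counter.update adds counts instead of replacing them
    return dict(total)


# Made-up numbers standing in for each node's accumulated_usage dict.
usage_by_agent = {
    "executor": {"inputTokens": 1200, "outputTokens": 300, "totalTokens": 1500},
    "validator": {"inputTokens": 800, "outputTokens": 100, "totalTokens": 900},
    "critic": {"inputTokens": 600, "outputTokens": 200, "totalTokens": 800},
}

totals = aggregate_usage(usage_by_agent)
most_expensive = max(usage_by_agent, key=lambda name: usage_by_agent[name]["totalTokens"])

print(f"Totals across agents: {totals}")
print(f"Most expensive agent: {most_expensive}")
```

&lt;p&gt;Keeping input and output tokens separate matters because most providers price them differently.&lt;/p&gt;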

&lt;h2&gt;
  
  
  Per-Cycle Tracking
&lt;/h2&gt;

&lt;p&gt;For agents that run multiple reasoning cycles, track tokens per cycle:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# First invocation
&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 5 + 3?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Second invocation
&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the square root of 144?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access metrics for the latest invocation
&lt;/span&gt;&lt;span class="n"&gt;latest_invocation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;latest_agent_invocation&lt;/span&gt;
&lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;latest_invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;
&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;latest_invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;

&lt;span class="c1"&gt;# Or access all invocations
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent_invocations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invocation usage: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cycle&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Cycle &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cycle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_loop_cycle_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cycle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or print the summary (includes all invocations)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;For a complete list of attributes and their types, see the &lt;a href="https://strandsagents.com/docs/api/python/strands.telemetry.metrics/#EventLoopMetrics/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;EventLoopMetrics&lt;/a&gt; API reference.&lt;/p&gt;
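&lt;p&gt;Once per-cycle numbers are available, a natural next step is ranking cycles by spend to see where an invocation burns its tokens. The sketch below shows the idea on made-up records. The &lt;code&gt;rank_cycles&lt;/code&gt; helper and the flat &lt;code&gt;cycle_id&lt;/code&gt;/&lt;code&gt;totalTokens&lt;/code&gt; dicts are hypothetical simplifications, not the actual shape of the Strands cycle objects, so you would first copy the real values into this form.&lt;/p&gt;

```python
# Made-up per-cycle records; the flat dict shape is an assumption for illustration.
cycles = [
    {"cycle_id": "cycle-1", "totalTokens": 450},
    {"cycle_id": "cycle-2", "totalTokens": 1350},
    {"cycle_id": "cycle-3", "totalTokens": 200},
]


def rank_cycles(cycles):
    """Return (cycle_id, tokens, share_of_total) tuples, most expensive first."""
    # Fall back to 1 so an empty or all-zero list does not divide by zero.
    grand_total = sum(c["totalTokens"] for c in cycles) or 1
    ranked = sorted(cycles, key=lambda c: c["totalTokens"], reverse=True)
    return [
        (c["cycle_id"], c["totalTokens"], c["totalTokens"] / grand_total)
        for c in ranked
    ]


for cycle_id, tokens, share in rank_cycles(cycles):
    print(f"{cycle_id}: {tokens} tokens ({share:.1%} of the invocation)")
```

&lt;p&gt;A cycle that dominates the total is usually the first place to look for an oversized tool output or a repeated call.&lt;/p&gt;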
&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost visibility&lt;/strong&gt; is the difference between a prototype and production AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38wu531cmi145pfs5vva.png%2F%3Ftrk%3D87c4c426-cddf-4799-a299-273337552ad8%26sc_channel%3Del" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38wu531cmi145pfs5vva.png%2F%3Ftrk%3D87c4c426-cddf-4799-a299-273337552ad8%26sc_channel%3Del" alt="Use Cases" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Strands telemetry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Budget AI workloads before deployment&lt;/li&gt;
&lt;li&gt;✅ Identify expensive queries in production&lt;/li&gt;
&lt;li&gt;✅ Optimize prompts with real token data&lt;/li&gt;
&lt;li&gt;✅ Track prompt caching savings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without writing a single line of telemetry code.&lt;/p&gt;
&lt;h2&gt;
  
  
  Works with All Model Providers
&lt;/h2&gt;

&lt;p&gt;Token tracking works regardless of your model provider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Bedrock (Claude, Llama, Mistral)&lt;/li&gt;
&lt;li&gt;OpenAI (GPT-4, GPT-3.5)&lt;/li&gt;
&lt;li&gt;Anthropic API&lt;/li&gt;
&lt;li&gt;Ollama (local models)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same API, same metrics, zero config changes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;strands-agents strands-agents-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Full documentation: &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;strandsagents.com/docs/user-guide/concepts/agents/&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;Thank you!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__717518"&gt;
    &lt;a href="/elizabethfuentes12" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;Elizabeth Fuentes L&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;I help developers build production-ready AI applications through hands-on tutorials and open-source projects.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Guide AI Assistants to Build Production-Ready Agents: 8 Essential Patterns</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Mon, 11 May 2026 18:54:52 +0000</pubDate>
      <link>https://forem.com/aws-espanol/como-guiar-asistentes-de-ia-para-construir-agentes-listos-para-produccion-8-patrones-esenciales-1ifd</link>
      <guid>https://forem.com/aws-espanol/como-guiar-asistentes-de-ia-para-construir-agentes-listos-para-produccion-8-patrones-esenciales-1ifd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;When you ask an AI assistant like &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; (AWS's AI assistant), Claude Code, or ChatGPT to "build an agent," you get working code. But you don't see the architecture decisions made behind the scenes. The agent answers queries, but it may waste tokens in reasoning loops, hallucinate answers from incomplete data, or freeze on slow APIs. These failures stay silent until they reach production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you ask AI assistants to build agents, they silently make architecture decisions: choosing retrieval strategies, validation approaches, and error-handling patterns. These 8 patterns give you the vocabulary to specify production-grade decisions in your prompts, preventing hallucinations and token waste before any code is generated.&lt;/p&gt;

&lt;p&gt;This post wraps up two series I wrote documenting the most costly agent failures in production: &lt;a href="https://dev.to/aws/stop-ai-agent-hallucinations-4-essential-techniques-2i94?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Stop AI Agent Hallucinations (5 techniques)&lt;/a&gt; and &lt;a href="https://dev.to/aws/why-ai-agents-fail-3-failure-modes-that-cost-you-tokens-and-time-1flb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Why AI Agents Fail (3 failure modes)&lt;/a&gt;. &lt;strong&gt;If you know these 8 patterns, you can guide AI assistants to avoid those failures from the start.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not a step-by-step implementation guide. It is a reference for knowing what exists, so you can recognize when to use each pattern for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working code for all 8 techniques:&lt;/strong&gt; Linked in each section&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;AI assistants generate agent code in seconds. &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, Claude Code, Cursor, and ChatGPT can scaffold tools, configure LLM calls, and wire up retrieval systems faster than coding by hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But speed creates a problem: you get working code without seeing the trade-offs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you write "build a booking agent with RAG," the assistant makes decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which retrieval strategy? (vector similarity, graph queries, hybrid)&lt;/li&gt;
&lt;li&gt;How to handle large outputs? (truncate, summarize, external storage)&lt;/li&gt;
&lt;li&gt;What validation runs before a tool is used? (none, prompts, framework hooks)&lt;/li&gt;
&lt;li&gt;How to handle slow APIs? (block, timeout, async patterns)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your prompt doesn't specify any of this. The assistant picks defaults. Those defaults create the failure modes this post documents.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 8 Failure Patterns (Quick Reference)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hallucination Failures (5 patterns):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GraphRAG&lt;/strong&gt; - Vector RAG fabricates statistics from incomplete chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Tool Selection&lt;/strong&gt; - Too many tools, so the agent picks the wrong ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neurosymbolic Guardrails&lt;/strong&gt; - The agent ignores business rules stated in prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime Guardrails (Steering)&lt;/strong&gt; - The agent violates rules and needs correction, not blocking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Validation&lt;/strong&gt; - A single agent claims success when operations fail&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Silent Token Waste (3 patterns):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Pointer Pattern&lt;/strong&gt; - Large data overflows the context and causes truncation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async HandleId Pattern&lt;/strong&gt; - Slow APIs block the agent indefinitely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DebounceHook + Explicit States&lt;/strong&gt; - The agent loops on the same call without making progress&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't implement all 8. You learn what each one solves, then specify the ones your use case needs when prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are These 8 Patterns?
&lt;/h2&gt;

&lt;p&gt;These patterns address the most costly production failures: hallucinations from incomplete data (GraphRAG, Semantic Tool Selection, Guardrails, Steering, Multi-Agent) and silent token waste (Memory Pointers, Async HandleId, DebounceHook). You learn what each one solves, then specify the ones your use case needs when prompting AI assistants. This keeps you from debugging black-box code in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measured Impact in Production
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GraphRAG&lt;/td&gt;
&lt;td&gt;Exact counts vs. fabricated approximations&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/rag-vs-graphrag-when-agents-hallucinate-answers-2mcb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;RAG vs GraphRAG&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Tool Selection&lt;/td&gt;
&lt;td&gt;86.4% fewer errors, 89% lower token costs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/reduce-agent-errors-and-token-costs-with-semantic-tool-selection-7mf?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Tool Selection&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Pointers&lt;/td&gt;
&lt;td&gt;20M tokens reduced to 1,234 tokens&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-22bk?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;IBM Materials Science study&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async HandleId&lt;/td&gt;
&lt;td&gt;18-second block eliminated, no 424 timeouts&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/fix-mcp-timeouts-async-handleid-pattern-8ek?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;MCP Timeouts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explicit States&lt;/td&gt;
&lt;td&gt;14 calls reduced to 2 (7x improvement)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Reasoning Loops&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Pattern 1: GraphRAG for Precise Queries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is GraphRAG?
&lt;/h3&gt;

&lt;p&gt;GraphRAG replaces vector similarity with graph database queries for structured data. When your agent needs exact counts, aggregations, or relationship traversal, GraphRAG translates natural language into Cypher queries that return precise results from structured data instead of statistics hallucinated from text chunks. Use it for structured queries, and keep vector RAG for semantic search.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Vector RAG fabricates statistics. You ask "How many hotels in Miami have a pool and breakfast?" and vector similarity retrieves 3 text chunks that mention pools and breakfast. The LLM sees incomplete data, extrapolates from those samples, and returns "approximately 120 hotels" (fabricated from 3 chunks out of 200 hotels).&lt;/p&gt;

&lt;p&gt;Out-of-domain queries return hallucinated answers instead of admitting that no data exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Replace vector retrieval with graph queries for structured data. Store hotels, amenities, and relationships in Neo4j. The LLM translates "hotels with pools and breakfast" into Cypher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;h:&lt;/span&gt;&lt;span class="n"&gt;Hotel&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:HAS_AMENITY&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;a:&lt;/span&gt;&lt;span class="n"&gt;Amenity&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;a.name&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'pool'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'breakfast'&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;
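&lt;span class="c1"&gt;// Keep only hotels linked to both amenities (IN alone matches either one)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;a.name&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;matched&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;matched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;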
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Result: 133 hotels (exact count from the database).&lt;/p&gt;

&lt;p&gt;Out-of-domain query: "No hotels found in Antarctica" instead of fabricated results.&lt;/p&gt;
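&lt;p&gt;As a rough illustration of the same idea without a database, here is a minimal sketch in plain Python. The &lt;code&gt;HOTELS&lt;/code&gt; dictionary and the &lt;code&gt;count_hotels&lt;/code&gt; helper are hypothetical stand-ins for the Neo4j graph and the generated Cypher; the data is invented. It shows an exact count that requires every requested amenity, and an explicit "no data" answer for an out-of-domain query.&lt;/p&gt;

```python
# Toy stand-in for a graph database: hotel name -> (city, set of amenities).
# All data here is made up for illustration.
HOTELS = {
    "Grand Plaza": ("Miami", {"pool", "breakfast", "gym"}),
    "Ocean View": ("Miami", {"pool"}),
    "Sunrise Inn": ("Miami", {"breakfast"}),
    "Palm Court": ("Miami", {"pool", "breakfast"}),
    "Alpine Lodge": ("Denver", {"pool", "breakfast"}),
}


def count_hotels(city: str, amenities: set[str]) -> str:
    """Return an exact count, or say plainly that no data exists."""
    in_city = [a for (c, a) in HOTELS.values() if c == city]
    if not in_city:
        # Out-of-domain query: admit there is no data instead of guessing.
        return f"No hotels found in {city}"
    # A hotel matches only if it has every requested amenity (AND, not OR).
    matches = sum(1 for a in in_city if amenities.issubset(a))
    return f"{matches} hotels in {city} have {', '.join(sorted(amenities))}"


print(count_hotels("Miami", {"pool", "breakfast"}))
print(count_hotels("Antarctica", {"pool"}))
```

&lt;p&gt;The count comes from filtering the full dataset, so it cannot be inflated or deflated by which text chunks happened to be retrieved.&lt;/p&gt;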

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcstm4df3sgeyh6d9zbkb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcstm4df3sgeyh6d9zbkb.png" alt="Vector RAG fabricates statistics from text chunks. GraphRAG returns exact counts from structured database queries" width="800" height="990"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a travel agent using GraphRAG with Neo4j. For structured 
queries (hotels, amenities, availability), translate to Cypher 
and execute against the graph. Only use vector RAG for unstructured 
descriptions. Return exact counts from graph traversal."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Structured data with relationships (products, inventory, locations)&lt;/li&gt;
&lt;li&gt;Queries that require counts, aggregations, or multi-hop traversal&lt;/li&gt;
&lt;li&gt;Domains where fabricated statistics create legal or financial risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/rag-vs-graphrag-when-agents-hallucinate-answers-2mcb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;RAG vs GraphRAG: When Agents Hallucinate Answers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn more:&lt;/strong&gt; &lt;a href="https://neo4j.com/docs/cypher-manual/current/" rel="noopener noreferrer"&gt;Neo4j Cypher documentation&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 2: Semantic Tool Selection
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is Semantic Tool Selection?
&lt;/h3&gt;

&lt;p&gt;Semantic tool selection uses vector embeddings to filter tools before the LLM sees them. When your agent has more than 10 tools, sending every description on each call increases error rates (the agent picks the wrong tools) and token costs (you pay for descriptions that go unused). Semantic filtering embeds the tool descriptions offline, then at runtime matches the query against them and passes along only the 5 most relevant tools, reducing errors by 86.4% and costs by 89%.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;With 50 tools, two failures occur: (1) the agent picks the wrong tools because descriptions overlap, and (2) token costs explode because all 50 tool descriptions are sent on every LLM call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt; Error rates rise with the number of tools, and token costs scale linearly.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Use vector embeddings to filter tools before the LLM sees them. Embed the tool descriptions offline. At runtime, embed the user query, compute similarity, and pass only the 5 most relevant tools to the agent.&lt;/p&gt;
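&lt;p&gt;The offline/runtime split can be sketched in a few lines. This is only an illustration: the &lt;code&gt;embed&lt;/code&gt; function below is a toy word-count embedding standing in for a real sentence-embedding model, the linear scan stands in for a vector index, and the tool catalog is hypothetical.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical tool catalog: name -> description.
TOOLS = {
    "search_hotels": "search hotels by city dates and amenities",
    "book_hotel": "book a hotel room for a guest",
    "search_flights": "search flights between two airports",
    "get_weather": "get the weather forecast for a city",
    "convert_currency": "convert an amount between currencies",
    "cancel_booking": "cancel an existing hotel booking",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a sentence model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Offline step: embed every tool description once.
TOOL_VECTORS = {name: embed(desc) for name, desc in TOOLS.items()}


def select_tools(query: str, k: int = 5) -> list[str]:
    """Runtime step: embed the query and keep the k most similar tools."""
    q = embed(query)
    ranked = sorted(
        TOOL_VECTORS, key=lambda name: cosine(q, TOOL_VECTORS[name]), reverse=True
    )
    return ranked[:k]


print(select_tools("find flights between airports", k=2))
```

&lt;p&gt;Only the names returned by &lt;code&gt;select_tools&lt;/code&gt; would be passed to the agent for that turn; the other descriptions never reach the LLM.&lt;/p&gt;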

&lt;p&gt;&lt;strong&gt;Production results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Errors reduced: 86.4%&lt;/li&gt;
&lt;li&gt;Token costs reduced: 89%&lt;/li&gt;
&lt;li&gt;Latency: &amp;lt;10ms for tool filtering&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a multi-tool agent with semantic tool selection. Use FAISS 
and SentenceTransformers to embed tool descriptions offline. At 
runtime, embed the query, retrieve the 5 most similar tools, and 
pass only those to the agent. Keep conversation memory, swap tools dynamically."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents with more than 10 tools&lt;/li&gt;
&lt;li&gt;Tools with overlapping descriptions&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/reduce-agent-errors-and-token-costs-with-semantic-tool-selection-7mf?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Reduce Agent Errors and Token Costs with Semantic Tool Selection&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 3: Neurosymbolic Guardrails (Blocking)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Are Neurosymbolic Guardrails?
&lt;/h3&gt;

&lt;p&gt;Neurosymbolic guardrails enforce business rules at the framework level, below the LLM's control. When prompts alone cannot enforce constraints (maximum guests, valid dates, budget limits), guardrails use pre-execution hooks to validate parameters and cancel invalid operations. The rules live in code, not in prompts, so the LLM cannot bypass them. Use blocking guardrails for hard constraints that must never be violated.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Prompts cannot enforce business rules. Even with clear docstrings ("max_guests must be ≤10"), the LLM passes &lt;code&gt;max_guests=15&lt;/code&gt; under pressure because prompts are suggestions, not constraints. The agent violates rules silently.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Use framework hooks to validate parameters before tool execution. If validation fails, cancel the tool call and return corrective guidance. The rules live in code at the framework level, below the LLM's control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt; Zero violations in a 100-query test (vs. 12 violations with prompts alone).&lt;/p&gt;
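&lt;p&gt;A framework-agnostic sketch of a pre-execution hook might look like the following. The &lt;code&gt;ToolCall&lt;/code&gt; class, &lt;code&gt;validate_booking&lt;/code&gt;, and &lt;code&gt;run_tool&lt;/code&gt; are hypothetical names used for illustration only; they are not the Strands Agents API, whose hook events you should take from its documentation. The three rules mirror the ones named in this section.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ToolCall:
    """Hypothetical stand-in for a framework's pre-execution event."""
    name: str
    params: dict
    cancel_message: Optional[str] = None  # set when the hook cancels the call


def validate_booking(call: ToolCall, today: date) -> None:
    """Pre-execution hook: cancel the call if any hard rule is violated."""
    p = call.params
    errors = []
    if p["max_guests"] > 10:
        errors.append("max_guests must be at most 10")
    if not p["check_in_date"] > today:
        errors.append("check_in_date must be after today")
    if not p["budget"] > 0:
        errors.append("budget must be greater than 0")
    if errors:
        call.cancel_message = "BLOCKED: " + "; ".join(errors)


def run_tool(call: ToolCall, today: date) -> str:
    """Run the hook first; the tool body executes only if nothing was cancelled."""
    validate_booking(call, today)
    if call.cancel_message:
        # The tool never runs; the model only sees the correction.
        return call.cancel_message
    return f"Booked for {call.params['max_guests']} guests"


today = date(2026, 5, 14)
bad = ToolCall(
    "book_hotel",
    {"max_guests": 15, "check_in_date": date(2026, 6, 1), "budget": 500},
)
print(run_tool(bad, today))
```

&lt;p&gt;Because the check runs in ordinary code before the tool body, no wording in the prompt can talk the agent past it.&lt;/p&gt;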
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a booking agent with guardrails using Strands Agents hooks. 
Create a BeforeToolCallEvent hook that validates:
- max_guests ≤ 10
- check_in_date &amp;gt; today
- budget &amp;gt; 0

If validation fails, cancel the tool call with event.cancel_tool() 
and return an error message. Do not rely on prompts for validation."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jlee9gupk90w1xg00hx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jlee9gupk90w1xg00hx.png" alt="Prompts can be ignored by LLM (top layer). Framework hooks enforce rules at code level (bottom layer, unbypassable)" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Business rules that must never be violated (compliance, legal, financial)&lt;/li&gt;
&lt;li&gt;Validation that requires computation (date math, inventory checks)&lt;/li&gt;
&lt;li&gt;Rules that change frequently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-agent-guardrails-rules-that-llms-cannot-bypass-596d?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Agent Guardrails: Rules That LLMs Cannot Bypass&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 4: Runtime Guardrails (Steer, Don't Block)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is Steering vs. Blocking?
&lt;/h3&gt;

&lt;p&gt;Steering guardrails return corrective guidance instead of blocking operations. When the agent violates a soft rule (formatting issues, parameter adjustments, data redaction), steering returns instructions via Guide() so the agent can self-correct and retry. This differs from blocking guardrails (Pattern 3), which stop workflows entirely. Use steering for rules the agent can fix on its own and blocking for hard constraints.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Hard guardrails (Pattern 3) block operations and stop workflows. For soft rules where the agent can self-correct (formatting issues, parameter adjustments, redacting sensitive data), blocking creates friction. The agent could fix the problem on its own if it were given guidance.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Use &lt;a href="https://github.com/agentcontrol/agent-control" rel="noopener noreferrer"&gt;Agent Control&lt;/a&gt; to return corrective guidance via &lt;code&gt;Guide()&lt;/code&gt; instead of blocking. When the agent violates a soft rule, the control plane returns instructions: "Set parameter X to Y and retry." The agent self-corrects and completes the task without human intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Difference from Pattern 3:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Block (Pattern 3):&lt;/strong&gt; Hard constraints, the workflow stops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steer (Pattern 4):&lt;/strong&gt; Soft rules, the agent self-corrects&lt;/li&gt;
&lt;/ul&gt;
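&lt;p&gt;To make the contrast concrete, here is a minimal steering sketch. The &lt;code&gt;Guide&lt;/code&gt; dataclass, &lt;code&gt;check_soft_rules&lt;/code&gt;, and &lt;code&gt;call_with_steering&lt;/code&gt; are hypothetical and do not reproduce the Agent Control API; the date-format rule is an invented example of a soft rule. In a real agent the LLM would read the instruction and retry; here the fix is applied directly to keep the example short.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Guide:
    """Hypothetical corrective message returned instead of a hard block."""
    instruction: str
    fix: dict  # parameter overrides to apply before retrying


def check_soft_rules(params: dict) -> Optional[Guide]:
    """Soft rule (invented for this sketch): dates must be YYYY-MM-DD."""
    value = params["check_in_date"]
    if "/" in value:
        month, day, year = value.split("/")
        iso = f"{year}-{month.zfill(2)}-{day.zfill(2)}"
        return Guide(f"Set check_in_date to {iso} and retry.", {"check_in_date": iso})
    return None


def call_with_steering(
    tool: Callable[[dict], str], params: dict, max_retries: int = 2
) -> str:
    """Apply guidance and retry instead of stopping the workflow."""
    for _ in range(max_retries + 1):
        guide = check_soft_rules(params)
        if guide is None:
            return tool(params)
        params = {**params, **guide.fix}
    return "FAILED: could not satisfy soft rules"


def book(params: dict) -> str:
    return f"SUCCESS: booked for {params['check_in_date']}"


print(call_with_steering(book, {"check_in_date": "6/1/2026"}))
```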
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a booking agent with Agent Control for soft rules. Connect 
to the Agent Control server. For soft rules (parameter formatting, date 
adjustments, data redaction), return Guide() with correction instructions 
instead of blocking. The agent must retry with the correction applied.

Use hard blocks (Pattern 3) only for compliance rules that cannot be 
violated under any circumstances."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Rules where the agent can self-correct (formatting, parameter adjustments)&lt;/li&gt;
&lt;li&gt;Workflows where blocking creates a poor user experience&lt;/li&gt;
&lt;li&gt;Rules managed centrally via API or dashboard (update without redeploying)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/runtime-guardrails-for-ai-agents-steer-dont-block-278n?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Runtime Guardrails for AI Agents: Steer, Don't Block&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 5: Multi-Agent Validation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is Multi-Agent Validation?
&lt;/h3&gt;

&lt;p&gt;Multi-agent validation deploys specialized agents with different roles (Executor, Validator, Critic) that cross-check each other's work. Single agents optimize for appearing successful, not for verifying results. Multiple agents with different optimization goals catch errors the others miss. The Executor performs tasks, the Validator checks them against ground truth, and the Critic provides a final review before returning to the user.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Single agents cannot validate themselves. When an agent books a hotel, it claims "Success: Booked Grand Plaza Hotel" even if the API returned an error or the hotel does not exist in the database. The agent optimizes for appearing successful, not for verifying results.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Deploy multiple agents with different roles: the Executor performs tasks, the Validator checks them against ground truth, and the Critic provides a final review. The agents share context and hand off control autonomously when their role is complete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt; Multi-agent validation catches errors that a single agent misses (e.g., booking nonexistent hotels).&lt;/p&gt;
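&lt;p&gt;The division of roles can be sketched with three plain functions. This is a simplification under stated assumptions: it is not Strands Swarm, there are no LLM calls or autonomous handoffs, and &lt;code&gt;HOTEL_DB&lt;/code&gt; is a hypothetical ground-truth set. It only shows why a claim should be checked by something other than the component that made it.&lt;/p&gt;

```python
# Hypothetical ground truth the Validator checks against.
HOTEL_DB = {"Grand Plaza Hotel", "Ocean View Resort"}


def executor(hotel: str) -> dict:
    """Performs the task and, like a single agent, always reports success."""
    return {"hotel": hotel, "claim": f"Success: Booked {hotel}"}


def validator(result: dict) -> dict:
    """Checks the claim against the database instead of trusting it."""
    result["verified"] = result["hotel"] in HOTEL_DB
    return result


def critic(result: dict) -> str:
    """Final review: only verified results reach the user."""
    if result["verified"]:
        return result["claim"]
    return f"FAILED: {result['hotel']} does not exist in the database"


def book_with_validation(hotel: str) -> str:
    """Pipeline: Executor, then Validator, then Critic."""
    return critic(validator(executor(hotel)))


print(book_with_validation("Grand Plaza Hotel"))
print(book_with_validation("Moonbase Suites"))
```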
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a multi-agent system using Strands Swarm with 3 agents:
1. Executor: Books hotels, searches flights
2. Validator: Cross-checks operations against the database
3. Critic: Final review before returning to the user

Agents share context via swarm.context. Use autonomous handoffs. 
Agents decide when to hand off based on task completion."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High-risk operations (financial, medical, legal)&lt;/li&gt;
&lt;li&gt;Tasks where "looks successful" differs from "actually successful"&lt;/li&gt;
&lt;li&gt;Complex workflows with multiple verification points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-stop-ai-agents-from-hallucinating-silently-with-multi-agent-validation-3f7e?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 6: Memory Pointer Pattern
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is the Memory Pointer Pattern?
&lt;/h3&gt;

&lt;p&gt;The Memory Pointer Pattern stores large data outside the LLM context and passes short references instead. When tools return logs larger than 200KB or 1000-row database results, passing them directly causes silent truncation. Memory pointers store the data in agent.state, return a pointer to the LLM, and provide separate tools that resolve pointers to access the full data. IBM went from 20M tokens to 1,234 tokens using this pattern.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Context window overflow occurs when tools return more data than the LLM can process (logs larger than 200KB, 1000-row database results). The agent does not crash. It silently truncates data, loses context, and produces incomplete answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real production case (IBM Materials Science):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: 20 million tokens, workflow failed&lt;/li&gt;
&lt;li&gt;After: 1,234 tokens, workflow succeeded&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Store large data in &lt;code&gt;agent.state&lt;/code&gt; and pass short references to the LLM. Tools return pointers such as &lt;code&gt;"logs-app-server"&lt;/code&gt;. Subsequent tools resolve the pointers to access the full data. The LLM only sees: "Data stored as logs-app-server. Use analyze_errors(pointer)."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data in context reduced:&lt;/strong&gt; 214KB → 52 bytes&lt;/p&gt;
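&lt;p&gt;A minimal sketch of the store-and-resolve flow, assuming a plain dictionary named &lt;code&gt;STATE&lt;/code&gt; as a stand-in for &lt;code&gt;agent.state&lt;/code&gt; and synthetic log lines generated in place of a real log source. The function names follow the ones used in this section, but their bodies are illustrative only.&lt;/p&gt;

```python
# Hypothetical stand-in for agent.state: a plain dict outside the LLM context.
STATE: dict[str, str] = {}


def fetch_logs(app: str) -> str:
    """Fetch a large payload, store it, and return only a short pointer."""
    # Synthetic logs: every 100th line is an error.
    logs = "\n".join(
        f"line {i}: {'ERROR timeout' if i % 100 == 0 else 'INFO ok'}"
        for i in range(5000)
    )
    pointer = f"logs-{app}"
    STATE[pointer] = logs
    return f"Data stored as {pointer}. Use analyze_errors(pointer)."


def analyze_errors(pointer: str) -> str:
    """Resolve the pointer and return a compact summary, not the raw data."""
    lines = STATE[pointer].splitlines()
    errors = [line for line in lines if "ERROR" in line]
    return f"{len(errors)} errors in {len(lines)} lines"


message = fetch_logs("app-server")
print(message)
print(len(STATE["logs-app-server"]), "characters stored;", len(message), "in context")
print(analyze_errors("logs-app-server"))
```

&lt;p&gt;The LLM only ever sees the short message and the summary; the full payload stays in the store and is touched only by tools.&lt;/p&gt;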

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbzqsm6ml5qo4e5lkkjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbzqsm6ml5qo4e5lkkjs.png" alt="Before: 20M tokens overflow context. After: Memory pointer reduces to 1,234 tokens, full data stored externally" width="800" height="1022"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a log analysis agent using the Memory Pointer Pattern. When 
fetch_logs returns more than 20KB:
1. Store it in agent.state with a unique pointer ID
2. Return to the LLM: 'Data stored as logs-{app}. Use analyze_logs(pointer).'
3. Implement analyze_logs(pointer) that resolves from agent.state

Never pass large data directly into the LLM context."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tools that return large outputs (logs, database queries, files)&lt;/li&gt;
&lt;li&gt;Workflows with multiple processing steps over the same large data&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-22bk?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Context Window Overflow: Memory Pointer Fix&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 7: Async HandleId Pattern
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is the Async HandleId Pattern?
&lt;/h3&gt;

&lt;p&gt;The async handleId pattern keeps slow external APIs from blocking your agent. When an API takes more than 30 seconds, synchronous calls freeze the whole agent. Async handleId returns a job ID immediately, letting the agent continue with other tasks. A separate check_status tool polls for results when they are ready. This eliminates 424 timeout errors and keeps agents responsive.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;External APIs that take more than 30 seconds block the agent indefinitely. No other tool can run. After ~7 seconds, many implementations return 424 timeout errors, freezing the workflow.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Tools return immediately with a job ID instead of waiting. The agent stores the handleId and continues. A separate &lt;code&gt;check_status(job_id)&lt;/code&gt; tool polls for results asynchronously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: 18-second API blocks the agent, 424 timeout&lt;/li&gt;
&lt;li&gt;After: Tool returns in under 1 second, agent polls for the result when ready&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build an agent with the async handleId pattern for slow APIs:

1. start_analysis(data): Submits the job, returns job_id immediately
2. check_status(job_id): Polls for results

The agent calls start_analysis, stores job_id, continues with other 
tasks, and calls check_status when ready. Do not implement blocking calls."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;External APIs with response times above 5 seconds&lt;/li&gt;
&lt;li&gt;Batch processing (video analysis, large transformations)&lt;/li&gt;
&lt;li&gt;Any system outside your control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/fix-mcp-timeouts-async-handleid-pattern-8ek?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Fix MCP Timeouts: Async HandleId Pattern&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 8: DebounceHook + Explicit States
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Prevents Reasoning Loops?
&lt;/h3&gt;

&lt;p&gt;Reasoning loops occur when ambiguous feedback ("more may be available") signals that retrying might help. Two fixes work together: explicit terminal states (return SUCCESS/FAILED so the LLM knows when to stop) and DebounceHook (a framework hook that blocks duplicate calls). Production tests showed that explicit states reduced calls from 14 to 2, while DebounceHook provides a safety net for edge cases.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Agents loop, calling the same tool repeatedly without making progress. Ambiguous feedback such as "Found 3 results. More may be available" signals that retrying might help. The agent loops indefinitely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real production case:&lt;/strong&gt; 847 reasoning steps at $47/minute, with no answer delivered.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution (Two Parts)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Part A: Explicit Terminal States&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Return clear SUCCESS or FAILED states. Change "More may be available" to "SUCCESS: Found all 3 matching flights."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part B: DebounceHook Safety Net&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A framework hook tracks recent tool calls. When the same (tool_name, input) pair has already appeared twice, it blocks the third attempt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact (travel booking demo):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ambiguous feedback: 14 calls&lt;/li&gt;
&lt;li&gt;Explicit SUCCESS: 2 calls (7x reduction)&lt;/li&gt;
&lt;li&gt;DebounceHook: 12 calls (2 blocked)&lt;/li&gt;
&lt;/ul&gt;
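&lt;p&gt;Both parts can be sketched together. The &lt;code&gt;DebounceHook&lt;/code&gt; class below is a hypothetical, framework-independent version of the idea (it is not the hook from the linked guide): it keeps a window of the last 3 calls and blocks a call whose (tool_name, input) pair already appears twice. The &lt;code&gt;search_flights&lt;/code&gt; tool is an invented example that returns an explicit terminal state.&lt;/p&gt;

```python
import json
from collections import deque


class DebounceHook:
    """Hypothetical hook: blocks a call once the same (tool, input) pair
    already appears twice in the recent-call window."""

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def before_tool_call(self, tool_name: str, tool_input: dict):
        # Serialize the input so equal dicts compare equal regardless of key order.
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        if self.recent.count(key) >= 2:
            return "BLOCKED: Duplicate detected"
        self.recent.append(key)
        return None


def search_flights(params: dict) -> str:
    # Explicit terminal state instead of "more may be available".
    return "SUCCESS: Found all 3 matching flights."


hook = DebounceHook()
for attempt in range(1, 4):
    blocked = hook.before_tool_call("search_flights", {"to": "MIA"})
    print(attempt, blocked or search_flights({"to": "MIA"}))
```

&lt;p&gt;The explicit SUCCESS message is what normally stops the loop; the hook only matters when the model retries anyway.&lt;/p&gt;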
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a travel agent with loop protection:

1. All tools return explicit states:
   - SUCCESS: [clear completion]
   - FAILED: [clear error]
   Never return 'more may be available'

2. Implement DebounceHook:
   - Track the last 3 tool calls as (tool_name, input)
   - If the same pair appears twice, block the third attempt
   - Return 'BLOCKED: Duplicate detected'

This prevents loops without manual retry limits."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents prone to retry loops (search, API aggregators)&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications where unbounded retries are expensive&lt;/li&gt;
&lt;li&gt;Production systems where infinite loops create availability risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;How to Prevent AI Agent Reasoning Loops from Wasting Tokens&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Mistake 1: Assuming Defaults Are Best Practices
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "Build a production agent" assumes the assistant knows what production means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Specify the patterns: "Use GraphRAG, guardrails, async patterns, etc."&lt;/p&gt;
&lt;h3&gt;
  
  
  Mistake 2: Relying Only on Prompts for Validation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "Make sure max_guests ≤ 10" in the system prompt is ignored under pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; "Implement a BeforeToolCallEvent hook that validates and cancels invalid calls."&lt;/p&gt;
&lt;h3&gt;
  
  
  Mistake 3: Not Recognizing When the Patterns Apply
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The agent works in the demo and breaks on edge cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Know the 8 patterns. When you see hallucinations, timeouts, or loops, you will recognize which pattern solves them.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Means for AI-Assisted Development
&lt;/h2&gt;

&lt;p&gt;AI assistants will keep getting better at generating working code. But working code and production-ready architecture remain different goals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gap is not the assistant's capability. It is the specificity of the prompt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you write "build a booking agent," the assistant optimizes for code that compiles and responds to queries.&lt;/p&gt;

&lt;p&gt;When you write "build a booking agent using GraphRAG for structured queries, guardrails for validation, and async patterns for booking APIs," the assistant optimizes for code that compiles, responds to queries, prevents hallucinations, enforces business rules, and handles slow APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These 8 patterns are the vocabulary for communicating production intent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don't implement all 8. You learn what each one solves. When you see hallucinations, you recognize that GraphRAG applies. When you see timeouts, you recognize that async handleId applies. When you see loops, you recognize that explicit states + DebounceHook apply.&lt;/p&gt;

&lt;p&gt;This knowledge changes how you prompt &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, Claude Code, Cursor, and ChatGPT. Instead of debugging black-box failures in production, you specify the patterns that prevent them at generation time.&lt;/p&gt;


&lt;h3&gt;
  
  
  Learn More (Complete Implementation Guides)
&lt;/h3&gt;

&lt;p&gt;Each pattern has a complete guide with working code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GraphRAG:&lt;/strong&gt; &lt;a href="https://dev.to/aws/rag-vs-graphrag-when-agents-hallucinate-answers-2mcb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;RAG vs GraphRAG: When Agents Hallucinate Answers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Tool Selection:&lt;/strong&gt; &lt;a href="https://dev.to/aws/reduce-agent-errors-and-token-costs-with-semantic-tool-selection-7mf?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Reduce Agent Errors and Token Costs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neurosymbolic Guardrails:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-agent-guardrails-rules-that-llms-cannot-bypass-596d?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Agent Guardrails: Rules That LLMs Cannot Bypass&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime Guardrails (Steering):&lt;/strong&gt; &lt;a href="https://dev.to/aws/runtime-guardrails-for-ai-agents-steer-dont-block-278n?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Runtime Guardrails for AI Agents: Steer, Don't Block&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Validation:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-stop-ai-agents-from-hallucinating-silently-with-multi-agent-validation-3f7e?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Stop AI Agents from Hallucinating Silently&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Pointers:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-22bk?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Context Window Overflow: Memory Pointer Fix&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async HandleId:&lt;/strong&gt; &lt;a href="https://dev.to/aws/fix-mcp-timeouts-async-handleid-pattern-8ek?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Fix MCP Timeouts: Async HandleId Pattern&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DebounceHook:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Prevent AI Agent Reasoning Loops&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Full series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws/stop-ai-agent-hallucinations-4-essential-techniques-2i94?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Stop AI Agent Hallucinations: 5 Essential Techniques&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws/why-ai-agents-fail-3-failure-modes-that-cost-you-tokens-and-time-1flb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Why AI Agents Fail: 3 Failure Modes That Cost You Tokens and Time&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Every pattern in this post exists because something broke in production. Agents that hallucinated statistics in customer demos. Loops that burned tokens at $47/minute. Context overflows that truncated critical data. Timeouts that froze workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now you know what breaks and how to prevent it by prompting correctly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you ask &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, Claude Code, or ChatGPT to build an agent, you can specify which patterns apply. That is the difference between prototypes that break and agents that scale.&lt;/p&gt;

&lt;p&gt;Use it.&lt;/p&gt;



&lt;p&gt;Thank you!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__717518"&gt;
    &lt;a href="/elizabethfuentes12" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;Elizabeth Fuentes L&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;I help developers build production-ready AI applications through hands-on tutorials and open-source projects.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Prompt AI Coding Assistants to Build Production-Ready Agents: 8 Essential Patterns</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Mon, 11 May 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/aws/prompt-ai-coding-assistants-to-build-production-ready-agents-8-essential-patterns-fm5</link>
      <guid>https://forem.com/aws/prompt-ai-coding-assistants-to-build-production-ready-agents-8-essential-patterns-fm5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;When you ask an AI assistant like &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; (AWS's AI coding assistant), Claude Code, or ChatGPT to "build me an agent," you get working code. But you don't see the architecture decisions happening behind the scenes. The agent responds to queries, but it might waste tokens in reasoning loops, hallucinate answers from incomplete data, or freeze on slow APIs. These failures are silent until production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you prompt AI coding assistants to build agents, they make architecture decisions silently—choosing retrieval strategies, validation approaches, and error handling patterns. These 8 patterns give you the vocabulary to specify production-grade decisions in your prompts, preventing hallucinations and token waste before code is generated.&lt;/p&gt;

&lt;p&gt;This post closes two series I wrote documenting the most expensive agent failures in production: &lt;a href="https://dev.to/aws/stop-ai-agent-hallucinations-4-essential-techniques-2i94?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Stop AI Agent Hallucinations (5 techniques)&lt;/a&gt; and &lt;a href="https://dev.to/aws/why-ai-agents-fail-3-failure-modes-that-cost-you-tokens-and-time-1flb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Why AI Agents Fail (3 failure modes)&lt;/a&gt;. &lt;strong&gt;If you know these 8 patterns, you can guide AI assistants to avoid those failures from the start.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a step-by-step implementation guide. It's a reference for knowing what exists so you can recognize when to use each pattern based on your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working code for all 8 techniques:&lt;/strong&gt; Linked in each section&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;AI coding assistants generate agent code in seconds. &lt;a href="https://kiro.dev?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, Claude Code, Cursor, and ChatGPT can scaffold tools, configure LLM calls, and wire up retrieval systems faster than manual coding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But speed creates a problem: you get working code without seeing the tradeoffs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you prompt "build a booking agent with RAG," the assistant makes decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which retrieval strategy? (vector similarity, graph queries, hybrid)&lt;/li&gt;
&lt;li&gt;How to handle large outputs? (truncate, summarize, external storage)&lt;/li&gt;
&lt;li&gt;What validation runs before tool execution? (none, prompts, framework hooks)&lt;/li&gt;
&lt;li&gt;How to handle slow APIs? (block, timeout, async patterns)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your prompt doesn't specify these. The assistant picks defaults. Those defaults create the failure modes this post documents.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 8 Failure Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hallucination Failures (5 patterns):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GraphRAG&lt;/strong&gt; - Vector RAG fabricates statistics from incomplete chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Tool Selection&lt;/strong&gt; - Too many tools, agent picks wrong ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neurosymbolic Guardrails&lt;/strong&gt; - Agent ignores business rules in prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime Guardrails (Steering)&lt;/strong&gt; - Agent violates rules, needs correction not blocking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Validation&lt;/strong&gt; - Single agent claims success when operations fail&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Silent Token Waste (3 patterns):&lt;/strong&gt;&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;
&lt;strong&gt;Memory Pointer Pattern&lt;/strong&gt; - Large data overflows context, causes truncation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async HandleId Pattern&lt;/strong&gt; - Slow APIs block agent indefinitely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DebounceHook + Explicit States&lt;/strong&gt; - Agent loops same tool call without progress&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't implement all 8. You learn what they solve, then specify the ones your use case needs when prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are These 8 Patterns?
&lt;/h2&gt;

&lt;p&gt;These patterns solve the most expensive production failures: hallucinations from incomplete data (GraphRAG, Semantic Tool Selection, Guardrails, Steering, Multi-Agent), and silent token waste (Memory Pointers, Async HandleId, DebounceHook). You learn what each solves, then specify the ones your use case needs when prompting AI assistants. This prevents debugging black-box code in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measured Impact from Production
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GraphRAG&lt;/td&gt;
&lt;td&gt;Exact counts vs fabricated approximations&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/rag-vs-graphrag-when-agents-hallucinate-answers-2mcb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;RAG vs GraphRAG&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Tool Selection&lt;/td&gt;
&lt;td&gt;86.4% fewer errors, 89% lower token costs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/reduce-agent-errors-and-token-costs-with-semantic-tool-selection-7mf?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Tool Selection&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Pointers&lt;/td&gt;
&lt;td&gt;20M tokens reduced to 1,234 tokens&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-22bk?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;IBM Materials Science study&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async HandleId&lt;/td&gt;
&lt;td&gt;18-second block eliminated, no 424 timeouts&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/fix-mcp-timeouts-async-handleid-pattern-8ek?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;MCP Timeouts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explicit States&lt;/td&gt;
&lt;td&gt;14 calls reduced to 2 (7x improvement)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Reasoning Loops&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Pattern 1: GraphRAG for Precise Queries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Is GraphRAG?
&lt;/h3&gt;

&lt;p&gt;GraphRAG replaces vector similarity with graph database queries for structured data. When your agent needs exact counts, aggregations, or relationship traversal, GraphRAG translates natural language to Cypher queries that return precise results from structured data instead of hallucinated statistics from incomplete text chunks. Use it for structured queries, keep vector RAG for semantic search.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Vector RAG fabricates statistics. Ask "How many hotels in Miami have pools and breakfast?" and vector similarity retrieves 3 text chunks mentioning Miami, pools and breakfast. The LLM sees incomplete data, calculates from samples, and returns "approximately 120 hotels" (fabricated from 3 chunks out of 200 hotels).&lt;/p&gt;

&lt;p&gt;Out-of-domain queries return hallucinated answers instead of admitting no data exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Replace vector retrieval with graph queries for structured data. Store hotels, amenities, and relationships in Neo4j. The LLM translates "hotels with pools and breakfast" into Cypher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;h:&lt;/span&gt;&lt;span class="n"&gt;Hotel&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:HAS_AMENITY&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;a:&lt;/span&gt;&lt;span class="n"&gt;Amenity&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;a.name&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'pool'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'breakfast'&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Result: 133 hotels (exact count from database).&lt;/p&gt;

&lt;p&gt;Out-of-domain query: "No hotels found in Antarctica" instead of fabricating results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcstm4df3sgeyh6d9zbkb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcstm4df3sgeyh6d9zbkb.png" alt="Vector RAG fabricates statistics from text chunks. GraphRAG returns exact counts from structured database queries" width="800" height="990"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a travel agent using GraphRAG. For structured 
queries (hotels, amenities, availability), translate to Cypher 
and execute against the graph. Only use vector RAG for unstructured descriptions. Return exact counts from graph traversal."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Structured data with relationships (products, inventory, locations)&lt;/li&gt;
&lt;li&gt;Queries requiring counts, aggregations, or multi-hop traversal&lt;/li&gt;
&lt;li&gt;Domains where fabricating statistics creates legal/financial risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/rag-vs-graphrag-when-agents-hallucinate-answers-2mcb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;RAG vs GraphRAG: When Agents Hallucinate Answers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn more:&lt;/strong&gt; &lt;a href="https://neo4j.com/docs/cypher-manual/current/" rel="noopener noreferrer"&gt;Neo4j Cypher Documentation&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 2: Semantic Tool Selection
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is Semantic Tool Selection?
&lt;/h3&gt;

&lt;p&gt;Semantic tool selection uses vector embeddings to filter tools before the LLM sees them. When your agent has 10+ tools, sending all descriptions on every call increases error rates (agent picks wrong tools) and token costs (paying for unused descriptions). Semantic filtering embeds tool descriptions offline, then at runtime matches the query to top-5 relevant tools, reducing errors by 86.4% and costs by 89%.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;With 50 tools, two failures occur: (1) agent picks wrong tools because descriptions overlap, and (2) token costs explode from sending all 50 tool descriptions on every LLM call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt; Error rates rise as the tool count grows, and token costs scale linearly with the number of tool descriptions sent on each call.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Use vector embeddings to filter tools before the LLM sees them. Embed tool descriptions offline. At runtime, embed the user query, compute similarity, pass only top-5 relevant tools to the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results from production:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Errors reduced: 86.4%&lt;/li&gt;
&lt;li&gt;Token costs reduced: 89%&lt;/li&gt;
&lt;li&gt;Latency: &amp;lt;10ms for tool filtering&lt;/li&gt;
&lt;/ul&gt;
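
&lt;p&gt;The filtering step can be sketched in a few lines. This is a minimal illustration, not the implementation from the linked post: the tool catalog is hypothetical, and &lt;code&gt;embed&lt;/code&gt; is a toy bag-of-words stand-in for a real embedding model. A real system would call an embedding model and likely keep the vectors in a vector index.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical tool catalog (name -> description), for illustration only.
TOOLS = {
    "search_hotels": "search hotels by city, dates and amenities",
    "book_hotel": "book a hotel room for a guest",
    "search_flights": "search flights between two airports",
    "get_weather": "get the weather forecast for a city",
    "convert_currency": "convert an amount between currencies",
}

STOPWORDS = {"a", "an", "the", "for", "by", "and", "in", "with", "between"}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    words = text.lower().replace(",", " ").split()
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Offline step: embed every tool description once.
TOOL_VECTORS = {name: embed(desc) for name, desc in TOOLS.items()}


def select_tools(query, k=5):
    """Runtime step: return the k tool names most similar to the query."""
    q = embed(query)
    ranked = sorted(
        TOOL_VECTORS,
        key=lambda name: cosine(q, TOOL_VECTORS[name]),
        reverse=True,
    )
    return ranked[:k]


# Only these tool names (and their descriptions) would be passed to the agent.
selected = select_tools("search hotels in a city with a pool", k=2)
```

&lt;p&gt;The value of &lt;code&gt;k&lt;/code&gt; is a tuning knob: the post uses top-5, and the sketch uses 2 only because the toy catalog has five tools.&lt;/p&gt;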
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a multi-tool agent with semantic tool selection At 
runtime, embed the query, retrieve top-5 similar tools, pass only 
those to the agent. Keep conversation memory, dynamically swap tools."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents with 10+ tools&lt;/li&gt;
&lt;li&gt;Tools with overlapping descriptions&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/reduce-agent-errors-and-token-costs-with-semantic-tool-selection-7mf?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Reduce Agent Errors and Token Costs with Semantic Tool Selection&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 3: Neurosymbolic Guardrails (Block)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Are Neurosymbolic Guardrails?
&lt;/h3&gt;

&lt;p&gt;Neurosymbolic guardrails enforce business rules at the framework level, below the LLM's control. When prompts alone cannot enforce constraints (max guests, valid dates, budget limits), guardrails use pre-execution hooks to validate parameters and cancel invalid operations. Rules live in code, not prompts, so the LLM cannot bypass them. Use blocking guardrails for hard constraints that cannot be violated.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Prompts cannot enforce business rules. Even with clear docstrings ("max_guests must be ≤10"), the LLM passes &lt;code&gt;max_guests=15&lt;/code&gt; under pressure because prompts are suggestions, not constraints. The agent violates rules silently.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Use framework hooks to validate parameters before tool execution. If validation fails, cancel the tool call and return corrective guidance. Rules live in code at the framework level, below the LLM's control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt; Zero violations in 100-query test (vs. 12 violations with prompts alone).&lt;/p&gt;
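
&lt;p&gt;As a framework-agnostic sketch of the idea, the validation can sit in a wrapper that runs before the tool does. The names &lt;code&gt;validate_booking&lt;/code&gt;, &lt;code&gt;guarded_call&lt;/code&gt; and &lt;code&gt;book_hotel&lt;/code&gt; are hypothetical and this is not the Strands Agents hook API. In a real agent the same check would be registered as a pre-execution hook in your framework.&lt;/p&gt;

```python
from datetime import date, timedelta


def book_hotel(max_guests, check_in_date, budget):
    """Hypothetical tool; a real one would call a booking API."""
    return f"SUCCESS: booked for {max_guests} guests on {check_in_date.isoformat()}"


def validate_booking(params, today):
    """Return a list of rule violations; an empty list means the call may run."""
    errors = []
    if params["max_guests"] > 10:
        errors.append("max_guests must be 10 or fewer")
    if not params["check_in_date"] > today:
        errors.append("check_in_date must be after today")
    if not params["budget"] > 0:
        errors.append("budget must be greater than 0")
    return errors


def guarded_call(tool, params, today):
    """Pre-execution check: the tool runs only if every rule passes."""
    errors = validate_booking(params, today)
    if errors:
        # Cancel the call and return corrective guidance instead of executing.
        return "CANCELLED: " + "; ".join(errors)
    return tool(**params)


today = date(2026, 5, 11)
check_in = today + timedelta(days=3)

# The LLM asked for 15 guests: the rule lives in code, so the call never runs.
blocked = guarded_call(
    book_hotel, {"max_guests": 15, "check_in_date": check_in, "budget": 500}, today
)
# A valid request passes through to the tool unchanged.
allowed = guarded_call(
    book_hotel, {"max_guests": 4, "check_in_date": check_in, "budget": 500}, today
)
```

&lt;p&gt;Because the check runs in code, rewording the prompt cannot change the outcome for &lt;code&gt;max_guests=15&lt;/code&gt;.&lt;/p&gt;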
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a booking agent with guardrails using Strands Agents hooks. 
Create a BeforeToolCallEvent hook that validates:
- max_guests ≤ 10
- check_in_date &amp;gt; today
- budget &amp;gt; 0

If validation fails, cancel the tool call with event.cancel_tool() 
and return error message. Do not rely on prompts for validation."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jlee9gupk90w1xg00hx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jlee9gupk90w1xg00hx.png" alt="Prompts can be ignored by LLM (top layer). Framework hooks enforce rules at code level (bottom layer, unbypassable)" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Business rules that cannot be violated (compliance, legal, financial)&lt;/li&gt;
&lt;li&gt;Validation requiring computation (date math, inventory checks)&lt;/li&gt;
&lt;li&gt;Rules that change frequently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-agent-guardrails-rules-that-llms-cannot-bypass-596d?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Agent Guardrails: Rules That LLMs Cannot Bypass&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 4: Runtime Guardrails (Steer, Don't Block)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is Steering vs Blocking?
&lt;/h3&gt;

&lt;p&gt;Steering guardrails return corrective guidance instead of blocking operations. When the agent violates a soft rule (format issues, parameter adjustments, data redaction), steering returns instructions via Guide() so the agent self-corrects and retries. This differs from blocking guardrails (Pattern 3) which stop workflows entirely. Use steering for rules where the agent can fix itself, blocking for hard constraints.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Hard guardrails (Pattern 3) block operations and stop workflows. For soft rules where the agent can self-correct (format issues, parameter adjustments, redacting sensitive data), blocking creates friction. The agent could fix the problem itself if given guidance.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Use &lt;a href="https://github.com/agentcontrol/agent-control" rel="noopener noreferrer"&gt;Agent Control&lt;/a&gt; to return corrective guidance via &lt;code&gt;Guide()&lt;/code&gt; instead of blocking. When the agent violates a soft rule, the control plane returns instructions: "Adjust parameter X to Y and retry." The agent self-corrects and completes the task without human intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Difference from Pattern 3:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Block (Pattern 3):&lt;/strong&gt; Hard constraints, workflow stops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steer (Pattern 4):&lt;/strong&gt; Soft rules, agent self-corrects&lt;/li&gt;
&lt;/ul&gt;
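
&lt;p&gt;The steer-and-retry loop might look like the sketch below. It assumes a date-format soft rule, and the &lt;code&gt;Guide&lt;/code&gt; dataclass here is a hypothetical stand-in, not the Agent Control API. The sketch applies the suggested value directly to stay self-contained. In a real agent, the LLM would read the guidance message and adjust its own tool call.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Guide:
    """Hypothetical steering response: which field to fix and how."""
    field: str
    suggestion: str
    message: str


def check_soft_rules(params):
    """Return a Guide when a soft rule is violated, or None when the call is fine."""
    date_text = params["check_in"]
    if "/" in date_text:
        # Soft rule: dates must be ISO formatted. This is fixable, so steer.
        month, day, year = date_text.split("/")
        iso = f"{year}-{int(month):02d}-{int(day):02d}"
        return Guide("check_in", iso, f"Set check_in to {iso} and retry.")
    return None


def run_with_steering(tool, params, max_retries=2):
    """Apply guidance and retry instead of stopping the workflow."""
    attempts = 0
    guide = check_soft_rules(params)
    while guide is not None and attempts != max_retries:
        # Stand-in for the agent reading guide.message and correcting itself.
        params = {**params, guide.field: guide.suggestion}
        attempts += 1
        guide = check_soft_rules(params)
    if guide is not None:
        return "FAILED: " + guide.message
    return tool(**params)


def book(check_in, guests):
    """Hypothetical booking tool."""
    return f"SUCCESS: booked {guests} guests from {check_in}"


result = run_with_steering(book, {"check_in": "5/20/2026", "guests": 2})
```

&lt;p&gt;The retry cap keeps steering from turning into a loop of its own: if guidance does not resolve the violation within &lt;code&gt;max_retries&lt;/code&gt; attempts, the call fails explicitly.&lt;/p&gt;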
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a booking agent with Agent Control for soft rules. Connect 
to Agent Control server. For soft rules (parameter formatting, 
date adjustments, data redaction), return Guide() with correction 
instructions instead of blocking. Agent should retry with fix applied.

Use hard blocks (Pattern 3) only for compliance rules that cannot 
be violated under any circumstance."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Rules where agent can self-correct (format, adjust parameters)&lt;/li&gt;
&lt;li&gt;Workflows where blocking creates poor UX&lt;/li&gt;
&lt;li&gt;Rules managed centrally via API/dashboard (update without redeploying)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/runtime-guardrails-for-ai-agents-steer-dont-block-278n?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Runtime Guardrails for AI Agents: Steer, Don't Block&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 5: Multi-Agent Validation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is Multi-Agent Validation?
&lt;/h3&gt;

&lt;p&gt;Multi-agent validation deploys specialized agents with different roles (Executor, Validator, Critic) that cross-check each other's work. Single agents optimize for appearing successful, not verifying outcomes. Multiple agents with different optimization functions catch errors the others miss. Executor performs tasks, Validator cross-checks against ground truth, Critic provides final review before returning to the user.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Single agents cannot self-validate. When an agent books a hotel, it claims "Success: Booked Grand Plaza Hotel" even if the API returned an error or the hotel doesn't exist in the database. The agent optimizes for appearing successful, not verifying outcomes.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Deploy multiple agents with different roles: Executor performs tasks, Validator cross-checks against ground truth, Critic provides final review. Agents share context and hand off control autonomously when their role completes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt; Multi-agent catches errors single agent misses (e.g., booking non-existent hotels).&lt;/p&gt;
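
&lt;p&gt;To make the roles concrete, here is a minimal sketch with plain functions standing in for agents. It assumes a fixed Executor, Validator, Critic order rather than autonomous handoffs, and &lt;code&gt;HOTEL_DB&lt;/code&gt; and every function name are hypothetical. The point it illustrates is that the Validator checks the claim against an independent source instead of trusting the Executor's own report.&lt;/p&gt;

```python
# Hypothetical ground truth the Validator can query independently of the Executor.
HOTEL_DB = {
    "Grand Plaza Hotel": {"rooms_available": 0},
    "Sea View Inn": {"rooms_available": 4},
}


def executor(hotel):
    """Performs the task and reports what it believes happened."""
    return {"hotel": hotel, "claim": f"Booked {hotel}"}


def validator(report):
    """Cross-checks the claim against the database, not against the claim itself."""
    record = HOTEL_DB.get(report["hotel"])
    if record is None:
        return {**report, "valid": False, "reason": "hotel not found in database"}
    if record["rooms_available"] == 0:
        return {**report, "valid": False, "reason": "no rooms available"}
    return {**report, "valid": True, "reason": "booking matches database"}


def critic(report):
    """Final review: only a validated claim reaches the user as a success."""
    if report["valid"]:
        return "SUCCESS: " + report["claim"]
    return "FAILED: " + report["reason"]


def run_pipeline(hotel):
    """Fixed handoff order; a swarm would let agents decide when to hand off."""
    return critic(validator(executor(hotel)))


ok = run_pipeline("Sea View Inn")
missing = run_pipeline("Imaginary Resort")
```

&lt;p&gt;A single agent would have returned the Executor's claim for "Imaginary Resort" as is. Here the unverified claim never reaches the user.&lt;/p&gt;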
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a multi-agent system using Strands Swarm with 3 agents:
1. Executor: Books hotels, searches flights
2. Validator: Cross-checks operations against database
3. Critic: Final review before returning to user

Agents share context via swarm.context. Use autonomous handoffs. 
Agents decide when to hand off based on task completion."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High-stakes operations (financial, medical, legal)&lt;/li&gt;
&lt;li&gt;Tasks where "appears successful" differs from "actually successful"&lt;/li&gt;
&lt;li&gt;Complex workflows with multiple verification points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-stop-ai-agents-from-hallucinating-silently-with-multi-agent-validation-3f7e?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;How to Stop AI Agents from Hallucinating Silently with Multi-Agent Validation&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 6: Memory Pointer Pattern
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is the Memory Pointer Pattern?
&lt;/h3&gt;

&lt;p&gt;The Memory Pointer Pattern stores large data outside the LLM context and passes short references instead. When tools return 200KB+ logs or 1000-row database results, passing them directly causes silent truncation. Memory pointers store data in agent.state, return a pointer to the LLM, and provide separate tools that resolve pointers to access full data. IBM reduced 20M tokens to 1,234 tokens using this pattern.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Context window overflow occurs when tools return more data than the LLM can process (200KB+ logs, 1000-row database results). The agent doesn't crash. It silently truncates data, loses context, and produces incomplete answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real production case (IBM Materials Science):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: 20 million tokens, workflow failed&lt;/li&gt;
&lt;li&gt;After: 1,234 tokens, workflow succeeded&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Store large data in &lt;code&gt;agent.state&lt;/code&gt;, pass short references to the LLM. Tools return pointers like &lt;code&gt;"logs-app-server"&lt;/code&gt;. Subsequent tools resolve pointers to access full data. LLM only sees: "Data stored as logs-app-server. Use analyze_errors(pointer)."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data in context reduced:&lt;/strong&gt; 214KB → 52 bytes&lt;/p&gt;
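
&lt;p&gt;A minimal sketch of the pointer flow, assuming a plain dictionary named &lt;code&gt;STATE&lt;/code&gt; as a stand-in for &lt;code&gt;agent.state&lt;/code&gt; and synthetic log lines in place of a real log source. The tool names follow the example above, but the bodies are illustrative only.&lt;/p&gt;

```python
# Stand-in for agent.state: lives outside the LLM context.
STATE = {}


def fetch_logs(app):
    """Store the full payload in state and return only a short pointer message."""
    # Synthetic data for the sketch; a real tool would read from a log service.
    lines = []
    for i in range(5000):
        level = "ERROR" if i % 50 == 0 else "INFO"
        lines.append(f"2026-05-11 {level} request {i}")
    pointer = f"logs-{app}"
    STATE[pointer] = "\n".join(lines)
    # This short string is all the LLM ever sees.
    return f"Data stored as {pointer}. Use analyze_errors(pointer)."


def analyze_errors(pointer):
    """Resolve the pointer and work on the full data outside the LLM context."""
    if pointer not in STATE:
        return f"FAILED: unknown pointer {pointer}"
    raw = STATE[pointer]
    error_count = sum(1 for line in raw.splitlines() if " ERROR " in line)
    return f"SUCCESS: {error_count} ERROR lines in {pointer}"


message = fetch_logs("app-server")
summary = analyze_errors("logs-app-server")
```

&lt;p&gt;The full log text stays in &lt;code&gt;STATE&lt;/code&gt;. Only &lt;code&gt;message&lt;/code&gt; and &lt;code&gt;summary&lt;/code&gt;, each a single short line, would enter the conversation.&lt;/p&gt;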

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbzqsm6ml5qo4e5lkkjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbzqsm6ml5qo4e5lkkjs.png" alt="Before: 20M tokens overflow context. After: Memory pointer reduces to 1,234 tokens, full data stored externally" width="800" height="1022"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a log analysis agent using Memory Pointer Pattern. When 
fetch_logs returns &amp;gt;20KB:
1. Store in agent.state with unique pointer ID
2. Return to LLM: 'Data stored as logs-{app}. Use analyze_logs(pointer).'
3. Implement analyze_logs(pointer) that resolves from agent.state

Never pass large data directly to LLM context."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tools returning large outputs (logs, database queries, files)&lt;/li&gt;
&lt;li&gt;Workflows with multiple processing steps on same large data&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-22bk?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Context Window Overflow: Memory Pointer Fix&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 7: Async HandleId Pattern
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is the Async HandleId Pattern?
&lt;/h3&gt;

&lt;p&gt;The async handleId pattern prevents slow external APIs from blocking your agent. When an API takes 30+ seconds, synchronous calls freeze the entire agent. Async handleId returns a job ID immediately, letting the agent continue with other tasks. A separate check_status tool polls for results when ready. This eliminates 424 timeout errors and keeps agents responsive.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;External APIs that take 30+ seconds block the agent indefinitely. No other tools can run. After ~7 seconds, many implementations return 424 timeout errors, freezing the workflow.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Tools return immediately with a job ID instead of waiting. Agent stores handleId and continues. Separate &lt;code&gt;check_status(job_id)&lt;/code&gt; tool polls for results asynchronously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before: 18-second API blocks agent, 424 timeout&lt;/li&gt;
&lt;li&gt;After: Tool returns &amp;lt;1 second, agent polls when ready&lt;/li&gt;
&lt;/ul&gt;
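
&lt;p&gt;One way to sketch the two tools uses a thread pool from the Python standard library. The 0.2-second sleep stands in for a slow external API, and &lt;code&gt;_slow_analysis&lt;/code&gt; and the in-memory &lt;code&gt;_jobs&lt;/code&gt; registry are assumptions for illustration. A production version would likely keep job state in a durable store.&lt;/p&gt;

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)
_jobs = {}  # job_id -> Future


def _slow_analysis(data):
    """Stand-in for an external API call that takes many seconds."""
    time.sleep(0.2)
    return f"analysis of {len(data)} items complete"


def start_analysis(data):
    """Submit the job and return a handle immediately instead of waiting."""
    job_id = uuid.uuid4().hex[:8]
    _jobs[job_id] = _executor.submit(_slow_analysis, data)
    return job_id


def check_status(job_id):
    """Report an explicit state so the agent knows whether to check again."""
    future = _jobs.get(job_id)
    if future is None:
        return "FAILED: unknown job_id"
    if not future.done():
        return "PENDING: still running, check again later"
    if future.exception() is not None:
        return f"FAILED: {future.exception()}"
    return "SUCCESS: " + future.result()


job_id = start_analysis([1, 2, 3])   # returns right away
first = check_status(job_id)         # usually PENDING at this point
time.sleep(0.5)                      # the agent would do other work here
final = check_status(job_id)
```

&lt;p&gt;The three states (PENDING, SUCCESS, FAILED) matter as much as the handle: they tell the agent whether polling again is useful.&lt;/p&gt;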
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build an agent with async handleId pattern for slow APIs:

1. start_analysis(data): Submit job, return job_id immediately
2. check_status(job_id): Poll for results

Agent calls start_analysis, stores job_id, continues with other 
tasks, calls check_status when ready. Do not implement blocking calls."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;External APIs with &amp;gt;5 second response times&lt;/li&gt;
&lt;li&gt;Batch processing (video analysis, large transforms)&lt;/li&gt;
&lt;li&gt;Any system outside your control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/fix-mcp-timeouts-async-handleid-pattern-8ek?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Fix MCP Timeouts: Async HandleId Pattern&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Pattern 8: DebounceHook + Explicit States
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Prevents Reasoning Loops?
&lt;/h3&gt;

&lt;p&gt;Reasoning loops occur when ambiguous tool feedback ("more may be available") signals that retrying might help. Two fixes work together: explicit terminal states (return SUCCESS/FAILED so the LLM knows when to stop) and DebounceHook (a framework hook that blocks duplicate calls). In a travel booking demo, explicit states reduced calls from 14 to 2, while DebounceHook provides a safety net for edge cases.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Breaks
&lt;/h3&gt;

&lt;p&gt;Agents loop calling the same tool repeatedly without progress. Ambiguous feedback like "Found 3 results. More may be available" signals that retrying might help. The agent loops indefinitely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real production case:&lt;/strong&gt; 847 reasoning steps at $47/minute, no answer delivered.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix (Two Parts)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Part A: Explicit Terminal States&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Return clear SUCCESS or FAILED states. Change "More may be available" to "SUCCESS: Found all 3 matching flights."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part B: DebounceHook Safety Net&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Framework hook tracks recent tool calls. When same (tool_name, input) appears twice, block third attempt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Measured impact (travel booking demo):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ambiguous feedback: 14 calls&lt;/li&gt;
&lt;li&gt;Explicit SUCCESS: 2 calls (7x reduction)&lt;/li&gt;
&lt;li&gt;DebounceHook: 12 calls (2 blocked)&lt;/li&gt;
&lt;/ul&gt;
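
&lt;p&gt;Both parts can be sketched together. This is a framework-agnostic illustration: the &lt;code&gt;DebounceHook&lt;/code&gt; class, its &lt;code&gt;before_tool_call&lt;/code&gt; method and the &lt;code&gt;call_tool&lt;/code&gt; wrapper are hypothetical names, not a specific framework's hook API. The tool returns an explicit terminal state, and the hook blocks a call once the same (tool_name, input) pair has already appeared twice in the recent window.&lt;/p&gt;

```python
import json
from collections import deque


class DebounceHook:
    """Blocks a tool call when the same (tool_name, input) was seen twice recently."""

    def __init__(self, window=3, max_repeats=2):
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def before_tool_call(self, tool_name, tool_input):
        # Serialize the input so dictionaries can be compared reliably.
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        if self.recent.count(key) >= self.max_repeats:
            return "BLOCKED: Duplicate detected"
        self.recent.append(key)
        return None


def search_flights(destination):
    """Part A: an explicit terminal state instead of 'more may be available'."""
    flights = ["AA100", "UA200", "DL300"]  # hypothetical results
    return f"SUCCESS: Found all {len(flights)} matching flights."


def call_tool(hook, tool, **tool_input):
    """Part B: run the hook first; execute the tool only if it is not blocked."""
    blocked = hook.before_tool_call(tool.__name__, tool_input)
    if blocked is not None:
        return blocked
    return tool(**tool_input)


hook = DebounceHook()
# Simulate an agent that keeps repeating the same call.
results = [call_tool(hook, search_flights, destination="MIA") for _ in range(4)]
```

&lt;p&gt;In this sketch the first two identical calls run and the third and fourth are blocked, while a call with a different input still goes through.&lt;/p&gt;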
&lt;h3&gt;
  
  
  What to Tell Your AI Assistant
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a travel agent with anti-loop protection:

1. All tools return explicit states:
   - SUCCESS: [clear completion]
   - FAILED: [clear error]
   Never return 'more may be available'

2. Implement DebounceHook:
   - Track last 3 tool calls as (tool_name, input)
   - If same pair appears twice, block third attempt
   - Return 'BLOCKED: Duplicate detected'

This prevents loops without manual retry limits."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents prone to retry loops (search, API aggregators)&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications where unbounded retries are expensive&lt;/li&gt;
&lt;li&gt;Production systems where infinite loops create availability risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full details:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;How to Prevent AI Agent Reasoning Loops from Wasting Tokens&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Example: Generic vs Informed Prompting
&lt;/h2&gt;
&lt;h3&gt;
  
  
  ❌ Generic Prompt
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a customer support agent that searches our knowledge base 
and books appointments"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector RAG (may hallucinate on structured queries)&lt;/li&gt;
&lt;li&gt;Synchronous booking API (may timeout)&lt;/li&gt;
&lt;li&gt;No validation (can book invalid times)&lt;/li&gt;
&lt;li&gt;Single agent (claims success even when booking fails)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Works in demo, fails in production.&lt;/p&gt;


&lt;h3&gt;
  
  
  ✅ Informed Prompt
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Build a customer support agent:

Knowledge Base:
- Use Neo4j GraphRAG for structured queries (pricing, features)
- Use vector RAG only for semantic search (descriptions)

Booking:
- Validate appointment_time &amp;gt; now() before booking
- Use async handleId for booking API (10+ seconds)
- Return explicit states: SUCCESS / FAILED

Validation:
- Multi-agent: Executor (search/book), Validator (cross-check), 
  Critic (final review)
- Use Strands Swarm for autonomous handoffs

Loop Prevention:
- DebounceHook blocks duplicate calls
- All tools return terminal states"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GraphRAG prevents hallucinations&lt;/li&gt;
&lt;li&gt;Async prevents timeouts&lt;/li&gt;
&lt;li&gt;Guardrails prevent invalid bookings&lt;/li&gt;
&lt;li&gt;Multi-agent catches false successes&lt;/li&gt;
&lt;li&gt;DebounceHook prevents loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Production-ready agent.&lt;/p&gt;
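
&lt;p&gt;To make the "async handleId" line of the informed prompt concrete, here is a rough sketch of the idea: the tool hands back a handle at once and a second tool reports progress. Everything named here is an assumption for illustration (the thread pool, &lt;code&gt;slow_booking_api&lt;/code&gt;, &lt;code&gt;start_booking&lt;/code&gt;, &lt;code&gt;check_booking&lt;/code&gt;, the &lt;code&gt;PENDING&lt;/code&gt; state); the linked guide may implement it differently.&lt;/p&gt;

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=2)
JOBS = {}  # handle_id -> Future for the running booking


def slow_booking_api(slot):
    """Stand-in for a slow external booking API (hypothetical)."""
    time.sleep(0.2)
    return f"Appointment confirmed for {slot}"


def start_booking(slot):
    """Start the booking in the background and return a handle right away."""
    handle_id = uuid.uuid4().hex[:8]
    JOBS[handle_id] = EXECUTOR.submit(slow_booking_api, slot)
    return f"PENDING: Booking started, handleId={handle_id}"


def check_booking(handle_id):
    """Report the state of a booking without blocking the agent."""
    future = JOBS.get(handle_id)
    if future is None:
        return f"FAILED: Unknown handleId {handle_id}"
    if not future.done():
        return "PENDING: Booking still in progress"
    return f"SUCCESS: {future.result()}"
```

&lt;p&gt;The agent would call &lt;code&gt;start_booking&lt;/code&gt; once, then poll &lt;code&gt;check_booking&lt;/code&gt; with the returned handle until it sees a terminal &lt;code&gt;SUCCESS&lt;/code&gt; or &lt;code&gt;FAILED&lt;/code&gt; state.&lt;/p&gt;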


&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Mistake 1: Assuming Defaults Are Best Practices
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "Build a production agent" assumes the assistant knows what production means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Specify patterns: "Use GraphRAG, guardrails, async patterns."&lt;/p&gt;
&lt;h3&gt;
  
  
  Mistake 2: Relying Only on Prompts for Validation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; "Make sure max_guests &amp;lt; 10" in the system prompt gets ignored under pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; "Implement BeforeToolCallEvent hook that validates and cancels invalid calls."&lt;/p&gt;
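
&lt;p&gt;A minimal sketch of that fix, written without any framework so it does not depend on a particular hook signature. The names &lt;code&gt;guests_rule&lt;/code&gt;, &lt;code&gt;guarded_call&lt;/code&gt;, and &lt;code&gt;book_table&lt;/code&gt; are hypothetical; with Strands, the same check would presumably live inside the before-tool-call hook mentioned above.&lt;/p&gt;

```python
MAX_GUESTS = 10  # exclusive upper bound, taken from the rule quoted above


def guests_rule(tool_name, tool_input):
    """Return a reason string when the call breaks the rule, else None."""
    if tool_name == "book_table" and tool_input.get("guests", 0) >= MAX_GUESTS:
        return f"guests must be below {MAX_GUESTS}"
    return None


def guarded_call(tool, tool_name, tool_input, rules):
    """Run every rule before the tool; cancel the call on the first violation."""
    for rule in rules:
        reason = rule(tool_name, tool_input)
        if reason:
            return f"FAILED: Call cancelled, {reason}"
    return tool(**tool_input)


def book_table(guests):
    """Hypothetical tool; a real one would call a booking backend."""
    return f"SUCCESS: Table booked for {guests} guests"


ok = guarded_call(book_table, "book_table", {"guests": 4}, [guests_rule])
rejected = guarded_call(book_table, "book_table", {"guests": 12}, [guests_rule])
```

&lt;p&gt;Because the rule runs in code before the tool executes, the model cannot talk its way past it the way it can with a prompt-only instruction.&lt;/p&gt;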
&lt;h3&gt;
  
  
  Mistake 3: Not Recognizing When Patterns Apply
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Agent works in demo, breaks on edge cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Know the 8 patterns. When you see hallucinations, timeouts, or loops, you'll recognize which pattern solves each one.&lt;/p&gt;


&lt;h2&gt;
  
  
  My Thoughts
&lt;/h2&gt;

&lt;p&gt;AI coding assistants will keep improving at generating working code. But working code and production-ready architecture remain different targets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gap isn't the assistant's capability. It's the prompt's specificity.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;
&lt;h3&gt;
  
  
  If You're Building a New Agent
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Identify which patterns apply (use symptom checklist)&lt;/li&gt;
&lt;li&gt;Specify patterns in your prompt&lt;/li&gt;
&lt;li&gt;Verify generated code implements them&lt;/li&gt;
&lt;li&gt;Test failure modes (timeouts, invalid inputs, non-existent data)&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  If You're Debugging an Existing Agent
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Identify the symptom (hallucinations, loops, timeouts, rule violations)&lt;/li&gt;
&lt;li&gt;Map symptom to pattern (see Step 1: Recognize the Symptom)&lt;/li&gt;
&lt;li&gt;Prompt your assistant to add the pattern: "Add DebounceHook to prevent loops"&lt;/li&gt;
&lt;li&gt;Verify fix with targeted tests&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Learn More (Full Implementation Guides)
&lt;/h3&gt;

&lt;p&gt;Each pattern has a complete guide with working code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GraphRAG:&lt;/strong&gt; &lt;a href="https://dev.to/aws/rag-vs-graphrag-when-agents-hallucinate-answers-2mcb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;RAG vs GraphRAG: When Agents Hallucinate Answers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Tool Selection:&lt;/strong&gt; &lt;a href="https://dev.to/aws/reduce-agent-errors-and-token-costs-with-semantic-tool-selection-7mf?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Reduce Agent Errors and Token Costs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neurosymbolic Guardrails:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-agent-guardrails-rules-that-llms-cannot-bypass-596d?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Agent Guardrails: Rules That LLMs Cannot Bypass&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime Guardrails (Steering):&lt;/strong&gt; &lt;a href="https://dev.to/aws/runtime-guardrails-for-ai-agents-steer-dont-block-278n?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Runtime Guardrails for AI Agents: Steer, Don't Block&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Agent Validation:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-stop-ai-agents-from-hallucinating-silently-with-multi-agent-validation-3f7e?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Stop AI Agents from Hallucinating Silently&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Pointers:&lt;/strong&gt; &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-22bk?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;AI Context Window Overflow: Memory Pointer Fix&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async HandleId:&lt;/strong&gt; &lt;a href="https://dev.to/aws/fix-mcp-timeouts-async-handleid-pattern-8ek?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Fix MCP Timeouts: Async HandleId Pattern&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DebounceHook:&lt;/strong&gt; &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Prevent AI Agent Reasoning Loops&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Complete series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws/stop-ai-agent-hallucinations-4-essential-techniques-2i94?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Stop AI Agent Hallucinations: 5 Essential Techniques&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws/why-ai-agents-fail-3-failure-modes-that-cost-you-tokens-and-time-1flb?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el"&gt;Why AI Agents Fail: 3 Failure Modes That Cost You Tokens and Time&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;Thanks!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__717518"&gt;
    &lt;a href="/elizabethfuentes12" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;Elizabeth Fuentes L&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;I help developers build production-ready AI applications through hands-on tutorials and open-source projects.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Sat, 09 May 2026 00:58:52 +0000</pubDate>
      <link>https://forem.com/elizabethfuentes12/-f98</link>
      <guid>https://forem.com/elizabethfuentes12/-f98</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652" class="crayons-story__hidden-navigation-link"&gt;How to Prevent AI Agent Reasoning Loops from Wasting Tokens&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/aws"&gt;
            &lt;img alt="AWS logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1726%2F2a73f1e6-7995-4348-ae37-44b064274c59.png" class="crayons-logo__image" width="320" height="320"&gt;
          &lt;/a&gt;

          &lt;a href="/elizabethfuentes12" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 profile" class="crayons-avatar__image" width="420" height="420"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/elizabethfuentes12" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Elizabeth Fuentes L
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Elizabeth Fuentes L
                
              
              &lt;div id="story-author-preview-content-3593545" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/elizabethfuentes12" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" class="crayons-avatar__image" alt="" width="420" height="420"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Elizabeth Fuentes L&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/aws" class="crayons-story__secondary fw-medium"&gt;AWS&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 4&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652" id="article-link-3593545"&gt;
          How to Prevent AI Agent Reasoning Loops from Wasting Tokens
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/aws"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;aws&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;13&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              2&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            8 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>agents</category>
      <category>ai</category>
      <category>aws</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why AI Agents Fail: 3 Failure Modes That Cost You Tokens and Time</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Fri, 08 May 2026 23:19:28 +0000</pubDate>
      <link>https://forem.com/aws-espanol/por-que-fallan-los-agentes-de-ia-3-modos-de-fallo-que-cuestan-tokens-y-tiempo-20b</link>
      <guid>https://forem.com/aws-espanol/por-que-fallan-los-agentes-de-ia-3-modos-de-fallo-que-cuestan-tokens-y-tiempo-20b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;AI agents don't fail like traditional software: they don't crash with a stack trace. They fail silently: they return incomplete answers, freeze on slow APIs, or burn tokens calling the same tool over and over. The agent appears to work, but the output is wrong, late, or expensive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This series covers the three most common failure modes with research-backed fixes. Each technique comes with a runnable demo that measures the before/after difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working code:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens" rel="noopener noreferrer"&gt;github.com/aws-samples/sample-why-agents-fail&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The demos use &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; with &lt;a href="https://platform.openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; (GPT-4o-mini). The patterns are framework-independent: they apply to &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, &lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;AutoGen&lt;/a&gt;, &lt;a href="https://github.com/crewAIInc/crewAI" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, or any framework that supports tool calls and lifecycle hooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  This Series: 3 Essential Fixes
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Window Overflow:&lt;/strong&gt; Memory Pointer Pattern for large data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tools That Never Respond:&lt;/strong&gt; async handleId pattern for slow external APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning Loops in AI Agents:&lt;/strong&gt; DebounceHook plus clear tool states to block repeated calls&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What Happens When Tool Outputs Overflow the Context Window?
&lt;/h2&gt;

&lt;p&gt;Context window overflow happens when a tool returns more data than the LLM can process: server logs, database results, or file contents that exceed the token limit. The agent does not fail with an error. It degrades silently: it truncates data, loses context, or produces incomplete answers.&lt;/p&gt;

&lt;p&gt;Research from &lt;a href="https://arxiv.org/html/2511.22729v1" rel="noopener noreferrer"&gt;IBM&lt;/a&gt; quantifies this: a Materials Science workflow consumed &lt;strong&gt;20 million tokens and failed&lt;/strong&gt;. The same workflow with memory pointers used &lt;strong&gt;1,234 tokens and succeeded&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz197w37u0rmuq1w3y3n9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz197w37u0rmuq1w3y3n9.jpg" alt="Comparison of an AI agent without the Memory Pointer Pattern versus with it, showing how large data stays outside the context window" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix (Memory Pointer Pattern):&lt;/strong&gt; Store large data in &lt;code&gt;agent.state&lt;/code&gt; and return a short pointer to the context. The next tool resolves the pointer to access the full data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_application_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetches logs. Stores large data behind a pointer to avoid context overflow.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Could be 200KB+
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pointer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Datos almacenados como puntero &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pointer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. Usa herramientas de análisis para consultarlo.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_error_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_pointer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyzes errors by resolving the pointer from agent.state.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_pointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Se encontraron &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; errores en &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; servicios&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The LLM never sees the 200KB: it only sees &lt;code&gt;"Datos almacenados como puntero 'logs-payment-service'"&lt;/code&gt; (52 bytes).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;?&lt;/strong&gt; The &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/custom-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;ToolContext&lt;/code&gt;&lt;/a&gt; API provides &lt;code&gt;agent.state&lt;/code&gt; as a native key-value store scoped to each agent: no global dictionaries, no external infrastructure. For multi-agent workflows, &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/multi-agent-patterns/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;invocation_state&lt;/code&gt;&lt;/a&gt; shares data between agents in a &lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/swarm/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Swarm&lt;/a&gt; with the same API.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Without pointers&lt;/th&gt;
&lt;th&gt;With Memory Pointers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data in context&lt;/td&gt;
&lt;td&gt;214KB (full logs)&lt;/td&gt;
&lt;td&gt;52 bytes (pointer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent behavior&lt;/td&gt;
&lt;td&gt;Truncates or fails&lt;/td&gt;
&lt;td&gt;Processes all the data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors detected&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Complete&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vsn5ydchfhy6lu831d0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vsn5ydchfhy6lu831d0.png" alt="Bar chart showing token usage across different context management strategies" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full demo:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/01-context-overflow-demo" rel="noopener noreferrer"&gt;01-context-overflow-demo&lt;/a&gt;, with single-agent and multi-agent (Swarm) implementations and notebooks.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Do AI Agents Freeze When Calling External APIs?
&lt;/h2&gt;

&lt;p&gt;Los agentes de IA se congelan cuando las herramientas MCP llaman a APIs externas lentas o que no responden. El agente se bloquea en la llamada a la herramienta, el usuario no ve progreso, y después de 7 segundos muchas implementaciones devuelven un &lt;a href="https://community.openai.com/t/call-remote-mcp-server-tool-timed-out-resulting-in-error-424/1364167" rel="noopener noreferrer"&gt;error 424&lt;/a&gt;. &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/mcp-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt; les da a los agentes la capacidad de llamar herramientas externas, pero no maneja timeout o reintentos por defecto.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlvwpo26agw84bvi620c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlvwpo26agw84bvi620c.jpg" alt="Llamada síncrona a herramienta MCP mostrando agente bloqueado mientras espera API lenta" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: async handleId pattern.&lt;/strong&gt; The tool returns a job ID immediately, and the agent polls a separate &lt;code&gt;check_job_status&lt;/code&gt; tool:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout-demo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;JOBS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_long_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Devuelve handle inmediatamente — previene timeout.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;JOBS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_process_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Trabajo en segundo plano
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trabajo iniciado. Handle: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Usa check_job_status para consultar.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_job_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Consulta estado del trabajo — devuelve &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; o &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; con resultado.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JOBS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAILED: Trabajo &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; no encontrado&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Todavía procesando...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Response time&lt;/th&gt;
&lt;th&gt;UX&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fast API (1s)&lt;/td&gt;
&lt;td&gt;3s total&lt;/td&gt;
&lt;td&gt;OK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow API (15s)&lt;/td&gt;
&lt;td&gt;18s blocked&lt;/td&gt;
&lt;td&gt;Agent frozen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failed API&lt;/td&gt;
&lt;td&gt;424 error after 7s&lt;/td&gt;
&lt;td&gt;Agent fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async handleId&lt;/td&gt;
&lt;td&gt;~4s (immediate return + poll)&lt;/td&gt;
&lt;td&gt;Agent responds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1ef742j0hk1dcvuna54.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1ef742j0hk1dcvuna54.jpg" alt="Visualización de línea de tiempo mostrando cuatro patrones de respuesta MCP" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;¿Por qué &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;?&lt;/strong&gt; El &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/mcp-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;MCPClient&lt;/code&gt;&lt;/a&gt; se conecta a cualquier servidor MCP. El agente descubre herramientas en tiempo de ejecución vía &lt;code&gt;list_tools_sync()&lt;/code&gt;: sin lista de herramientas codificada. Cuando el servidor MCP implementa el patrón asíncrono, el agente consulta automáticamente sin código de orquestación adicional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo completa:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo" rel="noopener noreferrer"&gt;02-mcp-timeout-demo&lt;/a&gt; — servidor MCP local con los 4 escenarios y notebook.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Do AI Agents Repeat the Same Tool Call?
&lt;/h2&gt;

&lt;p&gt;Reasoning loops occur when an AI agent calls the same tool repeatedly with identical parameters and makes no progress. The root cause is ambiguous tool feedback: responses such as "more results may be available" lead the agent to believe that another call will produce better results. &lt;a href="https://the-decoder.com/language-models-can-overthink-and-get-stuck-in-endless-thought-loops/" rel="noopener noreferrer"&gt;Research shows&lt;/a&gt; that agents can loop hundreds of times without delivering an answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76bs0qj49uby5z0t06np.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76bs0qj49uby5z0t06np.jpg" alt="Diagrama mostrando cómo la retroalimentación ambigua de herramientas causa loops versus cómo estados claros y DebounceHook los previenen" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 1: clear terminal states.&lt;/strong&gt; Tools return an explicit &lt;code&gt;SUCCESS&lt;/code&gt; or &lt;code&gt;FAILED&lt;/code&gt; instead of ambiguous messages:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Ambiguo (causa loops)
&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vuelos encontrados: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Puede haber más resultados disponibles.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Claro (el agente se detiene)
&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUCCESS: Vuelo &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conf_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; reservado para &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;passenger&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Confirmación enviada.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Fix 2: DebounceHook.&lt;/strong&gt; It detects and blocks duplicate tool calls at the framework level:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.hooks.registry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HookRegistry&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.hooks.events&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DebounceHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Bloquea llamadas duplicadas a herramientas en una ventana deslizante.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HookRegistry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;check_duplicate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cancel_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOCKED: Llamada duplicada a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Tool calls&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ambiguous feedback (baseline)&lt;/td&gt;
&lt;td&gt;14 calls&lt;/td&gt;
&lt;td&gt;No definitive answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DebounceHook&lt;/td&gt;
&lt;td&gt;12 calls (2 blocked)&lt;/td&gt;
&lt;td&gt;Completes with blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clear SUCCESS states&lt;/td&gt;
&lt;td&gt;2 calls&lt;/td&gt;
&lt;td&gt;Immediate completion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwgxbs8mu8tl83f1s9s2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwgxbs8mu8tl83f1s9s2.png" alt="Gráfico de barras mostrando llamadas a herramientas en diferentes estrategias" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;¿Por qué &lt;a href="https://strandsagents.com/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;?&lt;/strong&gt; La API de &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;&lt;code&gt;HookProvider&lt;/code&gt;&lt;/a&gt; intercepta llamadas a herramientas vía &lt;code&gt;BeforeToolCallEvent&lt;/code&gt; antes de que se ejecuten. Establecer &lt;code&gt;event.cancel_tool&lt;/code&gt; bloquea la ejecución a nivel de framework: el LLM no puede omitirlo. Esto hace que los hooks sean componibles para apilar DebounceHook, LimitToolCounts y validadores personalizados en el mismo agente.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo completa:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/03-reasoning-loops-demo" rel="noopener noreferrer"&gt;03-reasoning-loops-demo&lt;/a&gt; — los 4 escenarios con hooks y notebook.&lt;/p&gt;


&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You need &lt;a href="https://python.org/downloads" rel="noopener noreferrer"&gt;Python 3.9+&lt;/a&gt;, &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; (a fast Python package manager), and an &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/aws-samples/sample-why-agents-fail
&lt;span class="nb"&gt;cd &lt;/span&gt;sample-why-agents-fail/stop-ai-agents-wasting-tokens

&lt;span class="c"&gt;# Elige cualquier demo&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;01-context-overflow-demo   &lt;span class="c"&gt;# o 02-mcp-timeout-demo, 03-reasoning-loops-demo&lt;/span&gt;
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"tu-clave-aquí"&lt;/span&gt;

uv run python test_&lt;span class="k"&gt;*&lt;/span&gt;.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each demo is self-contained, with its own dependencies, test script, and &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter&lt;/a&gt; notebook.&lt;/p&gt;


&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What are the most common failure modes in AI agents?
&lt;/h3&gt;

&lt;p&gt;The three most common failure modes are context window overflow (a tool returns more data than the LLM can process), MCP tool timeouts (external APIs block the agent indefinitely), and reasoning loops (the agent repeats the same tool call without making progress). Each failure mode wastes tokens and degrades response quality.&lt;/p&gt;
&lt;h3&gt;
  
  
  How do I reduce the token costs of an AI agent?
&lt;/h3&gt;

&lt;p&gt;The two most effective techniques are memory pointers and clear tool states. The Memory Pointer Pattern stores large tool outputs in external state and passes short references into the LLM context, shrinking what enters the context from more than 200KB to under 100 bytes per tool call. Clear terminal states (&lt;code&gt;SUCCESS&lt;/code&gt;/&lt;code&gt;FAILED&lt;/code&gt;) in tool responses keep the agent from retrying completed operations, which can cut tool calls from 14 to 2.&lt;/p&gt;
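&lt;p&gt;As a rough illustration of the pointer idea, the sketch below keeps a large payload in a plain dictionary and hands back only a short reference. Everything here is an assumption made for the example: the &lt;code&gt;STORE&lt;/code&gt; dictionary stands in for external agent state, and &lt;code&gt;fetch_logs&lt;/code&gt;, &lt;code&gt;count_errors&lt;/code&gt;, and the &lt;code&gt;mem://&lt;/code&gt; prefix are hypothetical.&lt;/p&gt;

```python
import json
import uuid

STORE = {}  # hypothetical stand-in for external state shared between tools


def fetch_logs():
    """Produce a large payload, store it, and return only a short pointer message."""
    rows = [{"id": i, "level": "ERROR" if i % 10 == 0 else "INFO"} for i in range(5000)]
    pointer = f"mem://{uuid.uuid4().hex[:8]}"
    STORE[pointer] = rows
    return f"SUCCESS: stored {len(rows)} rows at {pointer}"


def count_errors(pointer):
    """Resolve the pointer outside the LLM context and return a compact summary."""
    rows = STORE[pointer]
    errors = sum(1 for row in rows if row["level"] == "ERROR")
    return f"SUCCESS: {errors} errors"


def demo():
    message = fetch_logs()
    pointer = message.rsplit(" ", 1)[-1]
    payload_bytes = len(json.dumps(STORE[pointer]))
    # Only `message` and the summary would ever reach the model.
    return len(message), payload_bytes, count_errors(pointer)
```

&lt;p&gt;The model only ever sees the short message and the summary; the payload stays in the store, where the next tool can read it by reference.&lt;/p&gt;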
&lt;h3&gt;
  
  
  Can I use these patterns with frameworks other than Strands Agents?
&lt;/h3&gt;

&lt;p&gt;Yes. The Memory Pointer Pattern works with any framework that supports tool context (passing state between tools). The async handleId pattern is an MCP server design pattern, so it works with any MCP-compatible agent. DebounceHook requires lifecycle hooks, which are available in &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, &lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;AutoGen&lt;/a&gt;, and &lt;a href="https://github.com/crewAIInc/crewAI" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt; through different APIs.&lt;/p&gt;


&lt;h2&gt;
  
  
  References
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Research
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/html/2511.22729v1" rel="noopener noreferrer"&gt;Solving Context Window Overflow in AI Agents&lt;/a&gt; — IBM Research, Nov 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/pdf/2412.05449" rel="noopener noreferrer"&gt;Towards Effective GenAI Multi-Agent Collaboration&lt;/a&gt; — Amazon, Dec 2024&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://octopus.com/blog/mcp-timeout-retry" rel="noopener noreferrer"&gt;Resilient AI Agents With MCP&lt;/a&gt; — Octopus, May 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://the-decoder.com/language-models-can-overthink-and-get-stuck-in-endless-thought-loops/" rel="noopener noreferrer"&gt;Language models can overthink&lt;/a&gt; — The Decoder, Jan 2025&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/state/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agent State&lt;/a&gt; — ToolContext and agent.state&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/mcp-tools/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands MCP Tools&lt;/a&gt; — Connect any MCP server&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/?trk=87c4c426-cddf-4799-a299-273337552ad8&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Hooks&lt;/a&gt; — Lifecycle events and tool cancellation&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;Which failure mode have you run into with your agents? Share it in the comments.&lt;/p&gt;



&lt;p&gt;Thank you!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__717518"&gt;
    &lt;a href="/elizabethfuentes12" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;Elizabeth Fuentes L&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;I help developers build production-ready AI applications through hands-on tutorials and open-source projects.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Prevent AI Agent Reasoning Loops from Wasting Tokens</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Mon, 04 May 2026 23:00:00 +0000</pubDate>
      <link>https://forem.com/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652</link>
      <guid>https://forem.com/aws/how-to-prevent-ai-agent-reasoning-loops-from-wasting-tokens-2652</guid>
      <description>&lt;p&gt;&lt;strong&gt;AI agent reasoning loops&lt;/strong&gt; occur when an agent calls the same tool repeatedly without making progress, convinced that one more attempt will produce the perfect answer. The agent wastes tokens, time, and money without delivering a result. This post shows how to detect and block repeated calls, validated with a demo where ambiguous tools caused 14 calls vs clear SUCCESS states that stopped in 2.&lt;/p&gt;

&lt;p&gt;This demo uses &lt;a href="https://strandsagents.com/docs/" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt;. The patterns — debounce hooks, clear tool states, and call limits — are framework-agnostic and apply to any agent that supports lifecycle hooks, including LangGraph, AutoGen, and CrewAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working code:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/03-reasoning-loops-demo" rel="noopener noreferrer"&gt;github.com/aws-samples/sample-why-agents-fail&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Series: Why AI Agents Fail
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/01-context-overflow-demo" rel="noopener noreferrer"&gt;Context Window Overflow&lt;/a&gt;&lt;/strong&gt; — Memory Pointer Pattern for large data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo" rel="noopener noreferrer"&gt;MCP Tools That Never Respond&lt;/a&gt;&lt;/strong&gt; — Async pattern for slow external APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent Reasoning Loops&lt;/strong&gt; (this post) — Detect and block repeated tool calls&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Problem: Agents That Overthink
&lt;/h2&gt;

&lt;p&gt;AI agents don't just fail by giving wrong answers; they also fail by never finishing. Research shows agents get trapped in reasoning loops where they call the same tool repeatedly, convinced that "one more step" will produce the perfect answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://the-decoder.com/language-models-can-overthink-and-get-stuck-in-endless-thought-loops/" rel="noopener noreferrer"&gt;The Decoder (Jan 2025)&lt;/a&gt; found that even with unlimited computing power, overthinking leads to poor decisions. Incomplete understanding of the world causes compounding errors. Each additional reasoning step makes things worse, not better.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://particula.tech/blog/ai-agent-loops-reasoning-steps-optimization" rel="noopener noreferrer"&gt;Particula (Jul 2025)&lt;/a&gt; (community observation) documented an extreme case: an agent executed &lt;strong&gt;847 reasoning steps&lt;/strong&gt; at &lt;strong&gt;$47 per minute&lt;/strong&gt; and never delivered a final answer. It kept refining logic, questioning conclusions, and requesting more data in an endless cycle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codieshub.com/for-ai/prevent-agent-loops-costs" rel="noopener noreferrer"&gt;CodiesHub (Dec 2025)&lt;/a&gt; (community observation) identifies the root causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unclear goals&lt;/strong&gt; — agent doesn't know when the task is complete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguous tool feedback&lt;/strong&gt; — tools don't return clear success/failure states&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No stopping criteria&lt;/strong&gt; — no hard limits on iterations or time&lt;/li&gt;
&lt;/ul&gt;
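&lt;p&gt;The second and third causes can be made concrete with a small driver loop. This is a sketch under stated assumptions, not code from the demo: &lt;code&gt;run_agent_loop&lt;/code&gt;, &lt;code&gt;ambiguous_step&lt;/code&gt;, and &lt;code&gt;make_clear_step&lt;/code&gt; are hypothetical names, and the loop treats any result starting with &lt;code&gt;SUCCESS&lt;/code&gt; or &lt;code&gt;FAILED&lt;/code&gt; as terminal.&lt;/p&gt;

```python
import time


def run_agent_loop(step, max_iterations=10, max_seconds=30.0):
    """Call step() until it reports a terminal state, or stop at a hard limit."""
    deadline = time.monotonic() + max_seconds
    for iteration in range(1, max_iterations + 1):
        if time.monotonic() > deadline:
            return f"FAILED: time budget exceeded after {iteration - 1} steps"
        result = step()
        # Explicit terminal states end the loop immediately.
        if result.startswith(("SUCCESS", "FAILED")):
            return result
    return f"FAILED: stopped after {max_iterations} steps without a terminal state"


def ambiguous_step():
    """A tool that never signals completion."""
    return "Found 3 flights. More results may be available."


def make_clear_step():
    """Build a tool that reports SUCCESS on its second call."""
    calls = {"n": 0}

    def step():
        calls["n"] += 1
        return "SUCCESS: flight booked" if calls["n"] >= 2 else "Searching..."

    return step
```

&lt;p&gt;With the ambiguous tool the loop only ends because of the iteration cap; with the clear tool it ends on the second call.&lt;/p&gt;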

&lt;h2&gt;
  
  
  Why Loops Happen: Ambiguous Tool Feedback
&lt;/h2&gt;

&lt;p&gt;Ambiguous tool feedback occurs when tools return partial results or suggest "more data may be available" without a clear terminal state, which prompts the agent to retry the same call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
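&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;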
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_flights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search for flights under a max price.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="n"&gt;matching&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;max_price&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# The problem: "More results may be available" signals the LLM to retry
&lt;/span&gt;    &lt;span class="c1"&gt;# The agent interprets this as "I should search again to find a better deal"
&lt;/span&gt;    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; flights under $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_price&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(out of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; checked). &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Note: More results may be available. Prices change frequently.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That "Note: More results may be available" triggers the loop. The agent sees it and thinks: "Maybe if I search again, I'll find a better deal." It retries with the same parameters, gets similar results, and the cycle continues.&lt;/p&gt;
&lt;h2&gt;
  
  
  Solution 1: Debounce Hook with Strands
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/" rel="noopener noreferrer"&gt;Strands Hooks&lt;/a&gt; intercept the agent lifecycle at any point. A Debounce Hook uses &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/" rel="noopener noreferrer"&gt;&lt;code&gt;BeforeToolCallEvent&lt;/code&gt;&lt;/a&gt; to detect duplicate calls before they execute:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BeforeInvocationEvent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DebounceHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;       &lt;span class="c1"&gt;# Tracks (tool_name, input) pairs
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;window_size&lt;/span&gt;  &lt;span class="c1"&gt;# Sliding window size for duplicate detection
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# BeforeInvocationEvent fires once at the start of each agent invocation
&lt;/span&gt;        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeInvocationEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# BeforeToolCallEvent fires before every tool execution — this is where we intercept
&lt;/span&gt;        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;check_duplicate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Clear history at the start of each invocation so limits don't bleed across calls
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Build a fingerprint from tool name + exact inputs
&lt;/span&gt;        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# cancel_tool is a native Strands API: blocks execution and returns this message to the LLM
&lt;/span&gt;            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cancel_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOCKED: Duplicate call detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blocked_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_flights&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;DebounceHook&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The hook inspects the last 3 recorded tool calls. If the same tool with the same parameters already appears twice in that window, the next attempt is blocked via &lt;code&gt;event.cancel_tool&lt;/code&gt;, a native Strands API that prevents tool execution and returns an error message to the LLM. Blocked attempts are not added to the history.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4v3joj5k8bwwcdovbgc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4v3joj5k8bwwcdovbgc.jpg" alt="Flow diagram showing how DebounceHook intercepts tool calls, checks a sliding window for duplicates, and blocks repeated calls" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
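&lt;p&gt;The sliding-window check can also be exercised without an agent. The sketch below is a framework-free approximation of the same logic; &lt;code&gt;DebounceWindow&lt;/code&gt;, &lt;code&gt;allow&lt;/code&gt;, and &lt;code&gt;max_repeats&lt;/code&gt; are hypothetical names for illustration, not Strands APIs. It makes one change to the hook above: the fingerprint uses &lt;code&gt;json.dumps(..., sort_keys=True)&lt;/code&gt; instead of &lt;code&gt;str(...)&lt;/code&gt;, on the assumption that two inputs with the same keys in a different order should count as the same call.&lt;/p&gt;

```python
import json


class DebounceWindow:
    """Framework-free version of the duplicate check (illustrative, not a Strands API)."""

    def __init__(self, window_size=3, max_repeats=2):
        self.window_size = window_size
        self.max_repeats = max_repeats
        self.call_history = []
        self.blocked_count = 0

    def allow(self, tool_name, tool_input):
        # Sort keys so the same arguments in a different order share one fingerprint
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        recent = self.call_history[-self.window_size:]
        if recent.count(key) >= self.max_repeats:
            self.blocked_count += 1
            return False  # blocked calls are not recorded, matching the hook above
        self.call_history.append(key)
        return True


window = DebounceWindow()
same_search = {"origin": "SCL", "destination": "JFK", "max_price": 500}
reordered = {"max_price": 500, "destination": "JFK", "origin": "SCL"}

decisions = [
    window.allow("search_flights", same_search),  # first call
    window.allow("search_flights", reordered),    # same call, keys reordered
    window.allow("search_flights", same_search),  # third identical call
]
print(decisions)  # prints [True, True, False]
```

&lt;p&gt;Only exact matches are caught. A call with a slightly different &lt;code&gt;max_price&lt;/code&gt; produces a new fingerprint and passes through, which is why the debounce check works best alongside a hard limit.&lt;/p&gt;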
&lt;h2&gt;
  
  
  Solution 2: Clear SUCCESS/FAILED States
&lt;/h2&gt;

&lt;p&gt;Tools that return explicit terminal states help agents know when to stop:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
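&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;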
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;book_hotel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hotel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;guest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nights&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Book a hotel room. Returns clear SUCCESS or FAILED.

    Returns:
        SUCCESS: Booking confirmed with ID
        FAILED: Booking failed with reason
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HT&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;99999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;350&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUCCESS: Booking &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; confirmed — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;guest&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hotel&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;nights&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; nights, $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;nights&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAILED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hotel&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; fully booked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When the agent receives &lt;code&gt;"SUCCESS: Booking HT79265 confirmed"&lt;/code&gt;, it knows the task is done. No ambiguity, no extra calls.&lt;/p&gt;
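&lt;p&gt;The same idea can be applied to the flight search from earlier. The following is a minimal sketch, not the code from the demo repository: it is written as a plain function so it runs without Strands (in an agent it would carry the &lt;code&gt;@tool&lt;/code&gt; decorator), and the &lt;code&gt;rng&lt;/code&gt; parameter and the exact message wording are assumptions added for illustration.&lt;/p&gt;

```python
import random


def search_flights(origin, destination, max_price, rng=None):
    """Search once and report a terminal state instead of hinting at more results."""
    if rng is None:
        rng = random.Random()
    prices = [rng.randint(200, 800) for _ in range(3)]
    matching = sorted(p for p in prices if max_price >= p)
    if matching:
        return (
            f"SUCCESS: {len(matching)} flights {origin} to {destination} "
            f"under ${max_price}, cheapest ${matching[0]}. Search complete."
        )
    # Tell the agent that repeating the same call will not help
    return (
        f"FAILED: no flights {origin} to {destination} under ${max_price}. "
        "Search complete. Raise max_price to get different results."
    )


print(search_flights("SCL", "JFK", 500, rng=random.Random(7)))
```

&lt;p&gt;Both branches end with an explicit statement that the search is complete, and the failure branch names the one parameter worth changing, so the agent has no textual cue to repeat an identical call.&lt;/p&gt;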
&lt;h2&gt;
  
  
  Solution 3: Hard Limits with LimitToolCounts
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://codieshub.com/for-ai/prevent-agent-loops-costs" rel="noopener noreferrer"&gt;CodiesHub&lt;/a&gt; recommends: "Iterations, tokens, time, spend are non-negotiable." Strands provides &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/" rel="noopener noreferrer"&gt;&lt;code&gt;LimitToolCounts&lt;/code&gt;&lt;/a&gt; in the Hooks Cookbook — a hook that caps tool calls per invocation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BeforeInvocationEvent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Lock&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LimitToolCounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Limits tool calls per invocation. From Strands Hooks Cookbook.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tool_counts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="c1"&gt;# Per-tool call budgets: {"search_flights": 2} means max 2 searches per invocation
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tool_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_tool_counts&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Thread-safe for concurrent tool calls in Swarm scenarios
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeInvocationEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reset_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intercept_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Reset per invocation so limits apply per task, not per agent lifetime
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;intercept_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;max_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tool_counts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_counts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;max_count&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Hard ceiling: block the call and tell the LLM explicitly to stop
&lt;/span&gt;                &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cancel_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; limit reached. DO NOT CALL ANYMORE.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Enforce a hard limit of 2 flight searches per booking task — prevents runaway costs
&lt;/span&gt;&lt;span class="n"&gt;limit_hook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LimitToolCounts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tool_counts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_flights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_flights&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;limit_hook&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Even if the agent wants to search 10 times, it's capped at 2. Hard ceiling, predictable costs.&lt;/p&gt;
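&lt;p&gt;The counting logic is small enough to check in isolation. The sketch below is a framework-free approximation with hypothetical names (&lt;code&gt;ToolBudget&lt;/code&gt;, &lt;code&gt;allow&lt;/code&gt;, and the &lt;code&gt;delete_booking&lt;/code&gt; tool are not part of Strands or the demo). It differs from the hook above in one detail: the hook tests &lt;code&gt;if max_count and ...&lt;/code&gt;, so a budget of 0 is falsy and behaves like no limit, whereas this version compares against &lt;code&gt;None&lt;/code&gt; so that a budget of 0 disables the tool.&lt;/p&gt;

```python
class ToolBudget:
    """Framework-free per-tool call counter (illustrative names, not a Strands API)."""

    def __init__(self, max_tool_counts):
        self.max_tool_counts = max_tool_counts
        self.tool_counts = {}

    def allow(self, tool_name):
        # Count every attempt, including the ones that end up blocked
        count = self.tool_counts.get(tool_name, 0) + 1
        self.tool_counts[tool_name] = count
        max_count = self.max_tool_counts.get(tool_name)
        # Compare against None so a budget of 0 blocks the tool instead of lifting its limit
        return max_count is None or max_count >= count


budget = ToolBudget({"search_flights": 2, "delete_booking": 0})
searches = [budget.allow("search_flights") for _ in range(4)]
print(searches)                        # prints [True, True, False, False]
print(budget.allow("delete_booking"))  # prints False
print(budget.allow("book_hotel"))      # prints True (no budget configured)
```

&lt;p&gt;Tools without an entry in the dictionary stay unlimited in both versions, so a production setup would likely list every tool explicitly.&lt;/p&gt;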
&lt;h2&gt;
  
  
  Demo Results
&lt;/h2&gt;

&lt;p&gt;We tested with a travel booking agent that searches for flights and hotels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Tool Calls&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ambiguous Feedback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;21s&lt;/td&gt;
&lt;td&gt;Agent retried organically; the "Prices change frequently" note caused loops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DebounceHook&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;15s&lt;/td&gt;
&lt;td&gt;Fewer retries, but calls with slightly varied parameters still got through&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Clear SUCCESS States&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4s&lt;/td&gt;
&lt;td&gt;Agent stopped immediately after SUCCESS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LimitToolCounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6 (2 blocked)&lt;/td&gt;
&lt;td&gt;6s&lt;/td&gt;
&lt;td&gt;Hard ceiling enforced — no runaway&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The contrast is dramatic: &lt;strong&gt;14 calls with ambiguous tools vs 2 calls with clear SUCCESS states&lt;/strong&gt;. That is a 7x difference in this demo, driven by tool feedback design.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F549teet1ds9cr7ipp1z2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F549teet1ds9cr7ipp1z2.png" alt="Bar chart comparing tool calls across ambiguous feedback, DebounceHook, clear SUCCESS states, and LimitToolCounts strategies" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;
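&lt;p&gt;For readers who want the relative savings for every row, the arithmetic can be reproduced from the tool-call column of the table. This is a small sketch; the &lt;code&gt;summarize&lt;/code&gt; helper is a hypothetical name, and the numbers are copied from the table rather than measured again.&lt;/p&gt;

```python
def summarize(baseline, calls):
    """Return (percent fewer calls, fold reduction) relative to the baseline."""
    percent_fewer = 100 * (baseline - calls) / baseline
    return round(percent_fewer), round(baseline / calls, 1)


baseline = 14  # "Ambiguous Feedback" row
table = {"DebounceHook": 12, "Clear SUCCESS States": 2, "LimitToolCounts": 6}

for name, calls in table.items():
    percent, fold = summarize(baseline, calls)
    print(f"{name}: {percent}% fewer calls ({fold}x)")
# prints:
# DebounceHook: 14% fewer calls (1.2x)
# Clear SUCCESS States: 86% fewer calls (7.0x)
# LimitToolCounts: 57% fewer calls (2.3x)
```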
&lt;h2&gt;
  
  
  When to Use Each Solution
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DebounceHook&lt;/strong&gt; — prevents duplicate calls with identical parameters. Use when tools are idempotent and retrying with the same input is wasteful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear SUCCESS/FAILED states&lt;/strong&gt; — the simplest solution. Design tools to return explicit terminal states. The agent knows when to stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LimitToolCounts&lt;/strong&gt; — hard ceiling on tool calls per invocation. Use in production to prevent runaway costs regardless of tool design. From the &lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/" rel="noopener noreferrer"&gt;Strands Hooks Cookbook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All three together&lt;/strong&gt; — defense in depth. Clear states prevent most loops, debounce catches duplicates, and hard limits guarantee bounded execution.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;You need &lt;a href="https://python.org/downloads" rel="noopener noreferrer"&gt;Python 3.9+&lt;/a&gt;, &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, and an &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/aws-samples/sample-why-agents-fail
&lt;span class="nb"&gt;cd &lt;/span&gt;sample-why-agents-fail/stop-ai-agents-wasting-tokens/03-reasoning-loops-demo
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key-here"&lt;/span&gt;

uv run python test_reasoning_loops.py   &lt;span class="c"&gt;# Runs all 4 scenarios&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Or open &lt;code&gt;test_reasoning_loops.ipynb&lt;/code&gt; in &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter&lt;/a&gt;, &lt;a href="https://jupyterlab.readthedocs.io/" rel="noopener noreferrer"&gt;JupyterLab&lt;/a&gt;, VS Code, or your preferred notebook environment.&lt;/p&gt;
&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguous tool feedback causes organic loops&lt;/strong&gt; — "more results may be available" makes agents retry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;14 calls vs 2 calls&lt;/strong&gt; — clear SUCCESS states reduce calls by 7x in our demo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks intercept before execution&lt;/strong&gt; — &lt;code&gt;BeforeToolCallEvent.cancel_tool&lt;/code&gt; blocks the call before the tool runs. The &lt;code&gt;DebounceHook&lt;/code&gt; is ~30 lines of code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard limits are mandatory&lt;/strong&gt; — every agent needs caps on iterations, time, and spend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;847 steps at $47/min was documented&lt;/strong&gt; (Particula, community observation) — unbounded agents burn money without delivering answers&lt;/li&gt;
&lt;/ol&gt;
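&lt;p&gt;Takeaway 4 can be sketched as a small budget object that the agent loop charges once per step. This is a minimal illustration, not part of Strands Agents: &lt;code&gt;RunBudget&lt;/code&gt;, its field names, and the default caps are all hypothetical.&lt;/p&gt;

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunBudget:
    """Hypothetical guard: caps on iterations, wall-clock time, and spend."""
    max_steps: int = 20
    max_seconds: float = 60.0
    max_usd: float = 1.00
    steps: int = 0
    spent_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, usd):
        """Record one agent step and its cost; raise once any cap is exceeded."""
        self.steps += 1
        self.spent_usd += usd
        if self.steps > self.max_steps:
            raise RuntimeError("step limit exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("time limit exceeded")
        if self.spent_usd > self.max_usd:
            raise RuntimeError("spend limit exceeded")


budget = RunBudget(max_steps=3)
stopped_by = None
try:
    while True:              # stands in for an unbounded agent loop
        budget.charge(0.01)  # assumed cost per step, for illustration only
except RuntimeError as err:
    stopped_by = str(err)
```

&lt;p&gt;With &lt;code&gt;max_steps=3&lt;/code&gt;, the fourth step raises and the loop ends, so a runaway agent stops at a known cost instead of continuing until someone notices.&lt;/p&gt;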
&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Why do AI agents repeat the same tool call?
&lt;/h3&gt;

&lt;p&gt;Agents repeat tool calls when tool responses contain ambiguous feedback such as "more results may be available" or "prices change frequently." The LLM interprets these signals as a reason to retry, expecting different or better results. Without clear terminal states (SUCCESS/FAILED), the agent has no way to know the task is complete.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is a DebounceHook and how does it prevent reasoning loops?
&lt;/h3&gt;

&lt;p&gt;A DebounceHook tracks recent tool calls in a sliding window. When the same tool is called with identical parameters more than a set threshold (typically 2 times within a window of 3), the hook blocks the call using &lt;code&gt;event.cancel_tool&lt;/code&gt; before the tool executes. The LLM receives a "BLOCKED: Duplicate call" message and must try a different approach. In Strands Agents, this is about 30 lines of code using the &lt;code&gt;HookProvider&lt;/code&gt; API.&lt;/p&gt;
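&lt;p&gt;The duplicate check itself does not depend on any framework. The sketch below shows one way to read the rule, assuming a window of the last 3 calls and a block once the same call already appears 2 times in that window. &lt;code&gt;DebounceTracker&lt;/code&gt; and &lt;code&gt;should_block&lt;/code&gt; are hypothetical names and are not the Strands &lt;code&gt;HookProvider&lt;/code&gt; API.&lt;/p&gt;

```python
import json
from collections import deque


class DebounceTracker:
    """Framework-independent sketch of the duplicate-call check (hypothetical class)."""

    def __init__(self, window=3, threshold=2):
        self.recent = deque(maxlen=window)  # sliding window of recent call signatures
        self.threshold = threshold

    def should_block(self, tool_name, params):
        # Identical tool name + identical parameters produce the same signature
        signature = (tool_name, json.dumps(params, sort_keys=True))
        duplicates = self.recent.count(signature)
        # Blocked attempts are recorded too, so repeated retries stay blocked
        self.recent.append(signature)
        return duplicates >= self.threshold


tracker = DebounceTracker()
decisions = [tracker.should_block("search_flights", {"to": "SCL"}) for _ in range(4)]
other = tracker.should_block("search_flights", {"to": "CCS"})
```

&lt;p&gt;The first two identical calls pass and the third and fourth are blocked, while a call with different parameters goes through. Inside a hook, a &lt;code&gt;True&lt;/code&gt; result would be the point at which the call is cancelled and the "BLOCKED: Duplicate call" message is returned.&lt;/p&gt;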
&lt;h3&gt;
  
  
  How do clear SUCCESS/FAILED states reduce tool calls?
&lt;/h3&gt;

&lt;p&gt;When a tool returns "SUCCESS: Booking HT79265 confirmed," the LLM recognizes the task is complete and stops calling that tool. Ambiguous responses such as "Found 2 flights, more may be available" lack this signal, causing the agent to retry. In our demo, clear states reduced tool calls from 14 to 2, a 7x improvement.&lt;/p&gt;
&lt;h2&gt;
  
  
  References
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Research
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://the-decoder.com/language-models-can-overthink-and-get-stuck-in-endless-thought-loops/" rel="noopener noreferrer"&gt;Language models can overthink&lt;/a&gt; — The Decoder, Jan 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://particula.tech/blog/ai-agent-loops-reasoning-steps-optimization" rel="noopener noreferrer"&gt;How many reasoning steps do AI agents need&lt;/a&gt; — Particula (community observation), Jul 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://codieshub.com/for-ai/prevent-agent-loops-costs" rel="noopener noreferrer"&gt;How to Prevent Infinite Loops and Spiraling Costs&lt;/a&gt; — CodiesHub (community observation), Dec 2025&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/agents/hooks/" rel="noopener noreferrer"&gt;Strands Hooks&lt;/a&gt; — Lifecycle event interception and tool cancellation&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;Thanks!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__717518"&gt;
    &lt;a href="/elizabethfuentes12" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;Elizabeth Fuentes L&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/elizabethfuentes12"&gt;I help developers build production-ready AI applications through hands-on tutorials and open-source projects.&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>aws</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Sat, 02 May 2026 20:00:22 +0000</pubDate>
      <link>https://forem.com/elizabethfuentes12/-3lai</link>
      <guid>https://forem.com/elizabethfuentes12/-3lai</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/fmarchena/de-bloquear-a-autocorregir-una-demo-practica-de-guardrails-para-agentes-de-ia-con-laravel-grok-y-2djp" class="crayons-story__hidden-navigation-link"&gt;De bloquear a autocorregir: una demo práctica de guardrails para agentes de IA con Laravel, Grok y OpenSpec&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/fmarchena" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F745267%2Fadc94710-89ef-48db-9679-3af59be4f3f6.jpeg" alt="fmarchena profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/fmarchena" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Francisco
            &lt;/a&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/fmarchena/de-bloquear-a-autocorregir-una-demo-practica-de-guardrails-para-agentes-de-ia-con-laravel-grok-y-2djp" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 1&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/fmarchena/de-bloquear-a-autocorregir-una-demo-practica-de-guardrails-para-agentes-de-ia-con-laravel-grok-y-2djp" id="article-link-3597897"&gt;
          De bloquear a autocorregir: una demo práctica de guardrails para agentes de IA con Laravel, Grok y OpenSpec
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/laravel"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;laravel&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/openspec"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;openspec&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/guardrails"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;guardrails&lt;/a&gt;
        &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Fix MCP Timeouts: Async HandleId Pattern</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Thu, 30 Apr 2026 19:28:53 +0000</pubDate>
      <link>https://forem.com/aws/fix-mcp-timeouts-async-handleid-pattern-8ek</link>
      <guid>https://forem.com/aws/fix-mcp-timeouts-async-handleid-pattern-8ek</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;MCP tools freeze AI agents when external APIs are slow, causing 424 errors. The async handleId pattern returns immediately with a job ID and polls for results without blocking.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;MCP tool timeout&lt;/strong&gt; occurs when an AI agent calls a Model Context Protocol (MCP) tool that depends on a slow external API. The tool blocks the agent indefinitely instead of returning an error. The result is a 424 (Failed Dependency) error or a frozen workflow with no user feedback. This post shows the problem with real scenarios and how the async handleId pattern provides immediate responses.&lt;/p&gt;

&lt;p&gt;This demo uses &lt;a href="https://strandsagents.com" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; with &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/mcp-tools/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt;. The async pattern is framework-agnostic and applies to any agent that calls external APIs through MCP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working code:&lt;/strong&gt; &lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo" rel="noopener noreferrer"&gt;github.com/aws-samples/sample-why-agents-fail&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Series: Why AI Agents Fail
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/01-context-overflow-demo" rel="noopener noreferrer"&gt;Context Window Overflow&lt;/a&gt;&lt;/strong&gt; — Memory Pointer Pattern for large data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tools That Never Respond&lt;/strong&gt; (this post) — Async pattern for slow external APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/aws-samples/sample-why-agents-fail/tree/main/stop-ai-agents-wasting-tokens/03-reasoning-loops-demo" rel="noopener noreferrer"&gt;AI Agent Reasoning Loops&lt;/a&gt;&lt;/strong&gt; — Detect and block repeated tool calls&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Problem: MCP Tools That Never Respond
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) enables AI agents to call external tools. But when those tools depend on slow APIs, the entire agent workflow freezes. The agent waits. The user waits. Nothing happens.&lt;/p&gt;

&lt;p&gt;Community observation from Octopus (&lt;a href="https://octopus.com/blog/mcp-timeout-retry" rel="noopener noreferrer"&gt;Resilient AI Agents With MCP, 2025&lt;/a&gt;) identifies the core issue: as external system integrations increase, so does the likelihood of failure. Systems become unavailable, slow to respond, or return errors. Agents have no built-in strategy to handle this.&lt;/p&gt;

&lt;p&gt;OpenAI Community reports confirm the real-world impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://community.openai.com/t/call-remote-mcp-server-tool-timed-out-resulting-in-error-424/1364167" rel="noopener noreferrer"&gt;424 errors&lt;/a&gt; when MCP tools take too long&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://community.openai.com/t/mcp-tool-hangs-indefinitely/1369341" rel="noopener noreferrer"&gt;Unresponsive states&lt;/a&gt; where requests neither succeed nor fail&lt;/li&gt;
&lt;li&gt;Tools that pass handshake validation but time out during execution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;MCP expects tools to respond quickly. When a tool calls a slow external API, the tool call blocks until that API answers, and the agent blocks with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhpqoi10dvwofqbm2wu9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhpqoi10dvwofqbm2wu9.jpg" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The MCP protocol has implicit timeout expectations. If the tool doesn't respond within ~7-10 seconds, the connection may drop with a 424 (Failed Dependency) error. The agent receives an error instead of data, and the user gets no useful response.&lt;/p&gt;

&lt;p&gt;Three failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Slow API&lt;/strong&gt; — Tool waits 15+ seconds, poor UX but eventually responds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failing API&lt;/strong&gt; — External service unavailable, 424 error after timeout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unresponsive state&lt;/strong&gt; — Request accepted but never returns, requires session restart&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Demo: Simulating Real Timeout Scenarios
&lt;/h2&gt;

&lt;p&gt;We built an MCP server that simulates these real-world scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="c1"&gt;# FastMCP is a lightweight MCP server framework — tools are registered with @mcp.tool()
&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Timeout Demo Server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Baseline: responds in 1s, well within MCP's implicit timeout threshold (~7-10s)
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fast API - responds in 1 second&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fast_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fast result for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Problem case: 15s delay exceeds MCP timeout — the agent freezes waiting for this
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slow API - responds in 15 seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;slow_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulates a slow external service (data pipeline, batch job)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slow result for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Failure case: waits 7s, then raises an exception that simulates a Failed Dependency (424)
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failing API - returns 424 after delay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;failing_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed Dependency: External service unavailable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  The Async HandleId Solution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgtgq97p79untwdxl1ec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpgtgq97p79untwdxl1ec.png" alt="Comparison of synchronous MCP tool call blocked for 17.2 seconds versus async handleId pattern completing in 1.7 seconds" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of waiting for slow operations, return immediately with a tracking ID:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="c1"&gt;# In-memory job store: maps job_id → {status, query, result}
# For production, replace with a persistent store (Redis, DynamoDB) for durability across restarts
&lt;/span&gt;&lt;span class="n"&gt;JOBS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c1"&gt;# The handleId pattern: return a tracking ID immediately instead of blocking
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start a long-running job, returns immediately with job ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_async_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Short ID the LLM can pass in follow-up calls
&lt;/span&gt;    &lt;span class="n"&gt;JOBS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Fire-and-forget: slow work runs in background, tool returns before it finishes
&lt;/span&gt;    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;do_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# The agent receives this in &amp;lt; 1s — no timeout, no frozen UI
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Job started: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Use check_job_status to poll for results.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Polling endpoint: the agent calls this repeatedly until status is "completed"
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check status of a running job&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_job_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;JOBS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Job &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPLETED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Return the actual result to the agent
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PROCESSING: Job &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; still running&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Agent polls again after a short wait
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
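&lt;p&gt;The snippet above calls &lt;code&gt;do_work&lt;/code&gt; without showing it. A minimal sketch of what such a worker could look like follows, assuming the same in-memory &lt;code&gt;JOBS&lt;/code&gt; store. The &lt;code&gt;launch&lt;/code&gt; helper, the &lt;code&gt;BACKGROUND_TASKS&lt;/code&gt; set, the &lt;code&gt;"failed"&lt;/code&gt; status, and the short delay are assumptions for illustration, not the demo's actual code. The set is there because the asyncio documentation recommends keeping a reference to tasks created with &lt;code&gt;create_task&lt;/code&gt; so they are not garbage-collected mid-execution.&lt;/p&gt;

```python
import asyncio

JOBS = {}                 # same shape as the in-memory job store above
BACKGROUND_TASKS = set()  # hypothetical: strong references keep tasks alive


async def do_work(job_id, query, delay=0.05):
    """Hypothetical worker: simulate the slow call, then record the outcome."""
    try:
        await asyncio.sleep(delay)  # stands in for the slow external API
        JOBS[job_id].update(status="completed", result=f"Result for: {query}")
    except Exception as exc:
        # Record failures so a poller can report them instead of waiting forever
        JOBS[job_id].update(status="failed", result=str(exc))


def launch(job_id, query):
    """Register the job and start the worker without awaiting it."""
    JOBS[job_id] = {"status": "processing", "query": query}
    task = asyncio.create_task(do_work(job_id, query))
    BACKGROUND_TASKS.add(task)
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return task


async def demo():
    task = launch("abc12345", "quarterly report")
    status_at_start = JOBS["abc12345"]["status"]  # read before the worker runs
    await task                                    # only the demo waits here
    return status_at_start, JOBS["abc12345"]["status"]


before, after = asyncio.run(demo())
```

&lt;p&gt;If the worker records a &lt;code&gt;"failed"&lt;/code&gt; status as sketched here, the polling tool would also need a branch for it. Otherwise a failed job keeps reporting as still running.&lt;/p&gt;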

&lt;h2&gt;
  
  
  Demo Results
&lt;/h2&gt;

&lt;p&gt;We tested all four scenarios with a Strands Agent connected to the MCP server:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Response Time&lt;/th&gt;
&lt;th&gt;User Experience&lt;/th&gt;
&lt;th&gt;Research Finding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Fast API&lt;/strong&gt; (1s delay)&lt;/td&gt;
&lt;td&gt;3.2s total&lt;/td&gt;
&lt;td&gt;✅ Good UX&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Slow API&lt;/strong&gt; (15s delay)&lt;/td&gt;
&lt;td&gt;17.8s total&lt;/td&gt;
&lt;td&gt;❌ Poor UX — agent waits&lt;/td&gt;
&lt;td&gt;Octopus: "agent waits indefinitely"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Failing API&lt;/strong&gt; (424)&lt;/td&gt;
&lt;td&gt;7.7s total&lt;/td&gt;
&lt;td&gt;❌ Error after wait&lt;/td&gt;
&lt;td&gt;OpenAI Community: 424 errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Async pattern&lt;/strong&gt; (handleId)&lt;/td&gt;
&lt;td&gt;3.7s total&lt;/td&gt;
&lt;td&gt;✅ Immediate response&lt;/td&gt;
&lt;td&gt;Solution: "respond ASAP with handleId"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo65kxfg90v71xspu1lfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo65kxfg90v71xspu1lfu.png" alt="Bar chart comparing MCP tool response times across fast API, slow API, failing API, and async handleId scenarios" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The async pattern transforms a 17.8s wait into a 3.7s immediate response. The agent tells the user "job started" and can check status later, with no frozen UI and no timeout errors.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Strands Agents for MCP Integration?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/mcp-tools/" rel="noopener noreferrer"&gt;&lt;code&gt;MCPClient&lt;/code&gt;&lt;/a&gt; connects to any MCP server in two lines. The agent discovers available tools at runtime through &lt;code&gt;list_tools_sync()&lt;/code&gt;, so you don't maintain a hardcoded tool list. When the MCP server implements the async handleId pattern, the agent polls automatically without extra orchestration code.&lt;/p&gt;

&lt;p&gt;Strands supports multiple &lt;a href="https://strandsagents.com/docs/user-guide/concepts/model-providers/" rel="noopener noreferrer"&gt;model providers&lt;/a&gt; (OpenAI, Amazon Bedrock, Anthropic, Ollama). The MCP timeout patterns shown here work identically across all providers.&lt;/p&gt;
&lt;h2&gt;
  
  
  When to Use Each Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Direct call&lt;/strong&gt; (fast tools &amp;lt; 5s):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lookups, calculations, small API calls&lt;/li&gt;
&lt;li&gt;No timeout risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Async handleId&lt;/strong&gt; (slow tools &amp;gt; 5s):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;External API calls with unpredictable latency&lt;/li&gt;
&lt;li&gt;Data processing, report generation&lt;/li&gt;
&lt;li&gt;Any operation that might exceed the MCP timeout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Retry with backoff&lt;/strong&gt; (intermittent failures):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Services that occasionally fail but recover&lt;/li&gt;
&lt;li&gt;Network-dependent operations&lt;/li&gt;
&lt;/ul&gt;
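&lt;p&gt;The retry-with-backoff option can be sketched in a few lines of standard-library Python. The function name &lt;code&gt;retry_with_backoff&lt;/code&gt;, its default delays, and the choice of retryable exceptions are assumptions for illustration; tune them to the errors your tool actually raises.&lt;/p&gt;

```python
import random
import time


def retry_with_backoff(call, max_attempts=4, base_delay_s=0.5, max_delay_s=8.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call `call()` until it succeeds, waiting longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Exponential backoff capped at max_delay_s, with full jitter.
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

&lt;p&gt;Jitter spreads retries out so several agents hitting the same recovering service do not retry in lockstep. The &lt;code&gt;sleep&lt;/code&gt; parameter exists only so tests can run without real delays.&lt;/p&gt;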
&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;You need &lt;a href="https://python.org/downloads" rel="noopener noreferrer"&gt;Python 3.9+&lt;/a&gt;, &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, and an &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;. The MCP server runs locally as a subprocess, so no external services beyond the OpenAI API are needed.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/aws-samples/sample-why-agents-fail
&lt;span class="nb"&gt;cd &lt;/span&gt;sample-why-agents-fail/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key-here"&lt;/span&gt;

uv run python test_mcp_timeout.py   &lt;span class="c"&gt;# Runs all 4 scenarios&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Or open &lt;code&gt;test_mcp_timeout.ipynb&lt;/code&gt; in &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter&lt;/a&gt;, &lt;a href="https://jupyterlab.readthedocs.io/" rel="noopener noreferrer"&gt;JupyterLab&lt;/a&gt;, VS Code, or your preferred notebook environment.&lt;/p&gt;
&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;MCP tools time out silently&lt;/strong&gt;: 424 errors with no recovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow APIs freeze the entire agent&lt;/strong&gt;: 17.8s wait with no feedback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The async handleId pattern solves it&lt;/strong&gt;: immediate response, then poll for results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for failure&lt;/strong&gt;: every external call can time out, so plan accordingly&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What causes 424 errors in MCP tool calls?
&lt;/h3&gt;

&lt;p&gt;A 424 (Failed Dependency) error occurs when an MCP tool takes longer to respond than the implicit timeout threshold (typically 7 to 10 seconds). The protocol expects tools to return quickly. When an external API blocks the tool beyond this threshold, the connection drops and the agent receives a 424 error instead of data.&lt;/p&gt;
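&lt;p&gt;One way to avoid hitting that threshold blindly is to give each tool call an explicit time budget and return a structured error when it is exceeded. The sketch below uses &lt;code&gt;asyncio.wait_for&lt;/code&gt; from the standard library; &lt;code&gt;call_with_budget&lt;/code&gt; and &lt;code&gt;slow_tool&lt;/code&gt; are hypothetical names, and this is a guard you would add yourself, not something the MCP protocol or Strands is confirmed to do for you.&lt;/p&gt;

```python
import asyncio


async def call_with_budget(tool_coro, budget_s):
    """Await a tool call, but give up once the time budget is spent."""
    try:
        result = await asyncio.wait_for(tool_coro, timeout=budget_s)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising.
        return {"ok": False, "error": f"tool exceeded {budget_s}s budget"}


async def slow_tool(delay_s):
    """Hypothetical tool that waits on an external API for delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "data"
```

&lt;p&gt;For example, &lt;code&gt;asyncio.run(call_with_budget(slow_tool(15), budget_s=5))&lt;/code&gt; returns an error dictionary after about 5 seconds instead of leaving the agent waiting for the full 15.&lt;/p&gt;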
&lt;h3&gt;
  
  
  When should I use the async handleId pattern instead of a direct MCP tool call?
&lt;/h3&gt;

&lt;p&gt;Use the async handleId pattern for any tool that calls an external API with unpredictable latency: data processing, report generation, third-party service calls, or any operation that might exceed 5 seconds. For fast lookups, calculations, and small API calls under 5 seconds, direct calls work fine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Does the async handleId pattern work with any MCP client, not only Strands?
&lt;/h3&gt;

&lt;p&gt;Yes. The async handleId pattern is an MCP server design pattern, not a framework feature. Any MCP-compatible agent can call &lt;code&gt;start_long_job&lt;/code&gt; and &lt;code&gt;check_job_status&lt;/code&gt; tools. The pattern works with OpenAI Agents, LangChain MCP integrations, and any client that supports the Model Context Protocol.&lt;/p&gt;
&lt;h2&gt;
  
  
  References
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Research
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://octopus.com/blog/mcp-timeout-retry" rel="noopener noreferrer"&gt;Resilient AI Agents With MCP: Timeout And Retry Strategies&lt;/a&gt; — Octopus blog (community observation), May 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://community.openai.com/t/call-remote-mcp-server-tool-timed-out-resulting-in-error-424/1364167" rel="noopener noreferrer"&gt;Call remote MCP server tool timed out, error 424&lt;/a&gt; — OpenAI Community (community forum)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://community.openai.com/t/handling-timeouts-with-long-running-mcp-connectors-vertex-ai-agent/1369341" rel="noopener noreferrer"&gt;Handling Timeouts with Long-Running MCP Connectors&lt;/a&gt; — OpenAI Community (community forum), Dec 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.arsturn.com/blog/no-more-timeouts-how-to-build-long-running-mcp-tools-that-actually-finish-the-job" rel="noopener noreferrer"&gt;Build Timeout-Proof MCP Tools&lt;/a&gt; — Arsturn (community observation)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/tools/mcp-tools/" rel="noopener noreferrer"&gt;Strands MCP Tools&lt;/a&gt; — Connect any MCP server&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://strandsagents.com/docs/user-guide/concepts/model-providers/" rel="noopener noreferrer"&gt;Strands Model Providers&lt;/a&gt; — Swap to Amazon Bedrock, Anthropic, Ollama&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;Thanks!&lt;/p&gt;

&lt;p&gt;🇻🇪🇨🇱 &lt;a href="https://dev.to/elizabethfuentes12"&gt;Dev.to&lt;/a&gt; &lt;a href="https://www.linkedin.com/in/lizfue/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; &lt;a href="https://github.com/elizabethfuentes12/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;a href="https://twitter.com/elizabethfue12" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; &lt;a href="https://www.instagram.com/elifue.tech" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; &lt;a href="https://www.youtube.com/channel/UCr0Gnc-t30m4xyrvsQpNp2Q" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
      <category>llm</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Sat, 18 Apr 2026 05:49:59 +0000</pubDate>
      <link>https://forem.com/elizabethfuentes12/-39pg</link>
      <guid>https://forem.com/elizabethfuentes12/-39pg</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/aws/5-techniques-to-stop-ai-agent-hallucinations-in-production-oik" class="crayons-story__hidden-navigation-link"&gt;5 Techniques to Stop AI Agent Hallucinations in Production&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/aws"&gt;
            &lt;img alt="AWS logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1726%2F2a73f1e6-7995-4348-ae37-44b064274c59.png" class="crayons-logo__image" width="320" height="320"&gt;
          &lt;/a&gt;

          &lt;a href="/elizabethfuentes12" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 profile" class="crayons-avatar__image" width="420" height="420"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/elizabethfuentes12" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Elizabeth Fuentes L
            &lt;/a&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/aws" class="crayons-story__secondary fw-medium"&gt;AWS&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/aws/5-techniques-to-stop-ai-agent-hallucinations-in-production-oik" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 30&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/aws/5-techniques-to-stop-ai-agent-hallucinations-in-production-oik" id="article-link-3433117"&gt;
          5 Techniques to Stop AI Agent Hallucinations in Production
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/aws"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;aws&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/machinelearning"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;machinelearning&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/aws/5-techniques-to-stop-ai-agent-hallucinations-in-production-oik" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;45&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/aws/5-techniques-to-stop-ai-agent-hallucinations-in-production-oik#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              4&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            16 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Elizabeth Fuentes L</dc:creator>
      <pubDate>Thu, 16 Apr 2026 17:34:31 +0000</pubDate>
      <link>https://forem.com/elizabethfuentes12/-h65</link>
      <guid>https://forem.com/elizabethfuentes12/-h65</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-3akc" class="crayons-story__hidden-navigation-link"&gt;AI Context Window Overflow: Memory Pointer Fix&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;
          &lt;a class="crayons-logo crayons-logo--l" href="/aws"&gt;
            &lt;img alt="AWS logo" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1726%2F2a73f1e6-7995-4348-ae37-44b064274c59.png" class="crayons-logo__image" width="320" height="320"&gt;
          &lt;/a&gt;

          &lt;a href="/elizabethfuentes12" class="crayons-avatar  crayons-avatar--s absolute -right-2 -bottom-2 border-solid border-2 border-base-inverted  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F717518%2Fb550b165-b8b9-405d-acfb-e5dc846765b0.png" alt="elizabethfuentes12 profile" class="crayons-avatar__image" width="420" height="420"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/elizabethfuentes12" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Elizabeth Fuentes L
            &lt;/a&gt;

            &lt;span&gt;
              &lt;span class="crayons-story__tertiary fw-normal"&gt; for &lt;/span&gt;&lt;a href="/aws" class="crayons-story__secondary fw-medium"&gt;AWS&lt;/a&gt;
            &lt;/span&gt;
          &lt;/div&gt;
          &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-3akc" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 13&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-3akc" id="article-link-3496579"&gt;
          AI Context Window Overflow: Memory Pointer Fix
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/aws"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;aws&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-3akc" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;34&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/aws/ai-context-window-overflow-memory-pointer-fix-3akc#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              1&lt;span class="hidden s:inline"&gt; comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
  </channel>
</rss>
