<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Geri Máté</title>
    <description>The latest articles on Forem by Geri Máté (@gerimate).</description>
    <link>https://forem.com/gerimate</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1079609%2Feb9cc87f-0fcf-4df9-bf8e-5ce57cfc4060.png</url>
      <title>Forem: Geri Máté</title>
      <link>https://forem.com/gerimate</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gerimate"/>
    <language>en</language>
    <item>
      <title>We're running our first hackathon: Build with VectorAI DB, win Claude subscriptions</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Thu, 09 Apr 2026 09:39:39 +0000</pubDate>
      <link>https://forem.com/gerimate/were-running-our-first-hackathon-build-with-vectorai-db-win-claude-subscriptions-2f0c</link>
      <guid>https://forem.com/gerimate/were-running-our-first-hackathon-build-with-vectorai-db-win-claude-subscriptions-2f0c</guid>
      <description>&lt;p&gt;The Actian VectorAI DB Build Challenge is our first community hackathon, and we want to see what you build. Solo or team, beginner or experienced, local or cloud. If you've been looking for a reason to actually ship something with a vector database, this is it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;April 13-18, 2026 | Virtual | &lt;a href="https://dorahacks.io/hackathon/2097/detail" rel="noopener noreferrer"&gt;Register on DoraHacks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What you're building
&lt;/h3&gt;

&lt;p&gt;An AI application that solves a real, tangible problem using Actian VectorAI DB. It can run on your laptop, on a server, in the cloud, wherever. The only rule: VectorAI DB has to be a core part of your stack, not something you bolted on at the end.&lt;/p&gt;

&lt;p&gt;Your project also needs to go beyond basic similarity search. Pick at least one of these:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Fusion&lt;/strong&gt; - combine multiple search signals into one ranked result. Not just meaning, not just keywords. Both, fused together.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What that looks like in practice:&lt;/em&gt; A job board that ranks candidates by semantic fit ("backend engineer who gets distributed systems") AND keyword match ("Golang, Kubernetes") merged into one list using RRF or DBSF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filtered Search&lt;/strong&gt; - pair vector search with structured filters on your data so results are actually useful, not just semantically close.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What that looks like in practice:&lt;/em&gt; A campus event finder that understands what you're looking for but also filters by date, location, and student org. So you're finding events you can go to, not just events that sound similar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Named Vectors / Multimodal&lt;/strong&gt; - store and search across different data types in the same collection. Text, images, audio, whatever fits your idea.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What that looks like in practice:&lt;/em&gt; A study tool where you search your notes by typing a question or uploading a diagram. Both hit the same knowledge base, just through different vector spaces.&lt;/p&gt;

&lt;p&gt;Bonus points for running locally, on ARM, or offline. No fixed weight, judges' call.&lt;/p&gt;




&lt;h3&gt;
  
  
  Not sure what to build?
&lt;/h3&gt;

&lt;p&gt;Some starting points, but don't let these limit you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A RAG app over any dataset you actually care about (research papers, course notes, documentation, news)&lt;/li&gt;
&lt;li&gt;A semantic search tool with smart filters (campus events, job listings, study materials)&lt;/li&gt;
&lt;li&gt;A recommendation engine that combines meaning and metadata&lt;/li&gt;
&lt;li&gt;An anomaly detection or monitoring system&lt;/li&gt;
&lt;li&gt;An AI agent with vector-powered memory&lt;/li&gt;
&lt;li&gt;A multimodal search tool across text and images&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;The database runs in Docker and works natively on Mac (including Apple Silicon), Linux, and Windows. No Rosetta, no platform flags needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo and start the database&lt;/span&gt;
docker compose up

&lt;span class="c"&gt;# Install the Python client&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;actian-vectorai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not sure where to begin? Start with the featured RAG example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; examples/rag/requirements.txt
python examples/rag/rag_example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It walks you through building a complete retrieval-augmented generation app from scratch. You'll have something running in under 10 minutes.&lt;/p&gt;

&lt;p&gt;VectorAI DB handles storage and search. You bring your own embedding model. A good default to start with is &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt;, fast, lightweight, and works well for most text use cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the full API docs and more examples, check the repo README linked in Discord.&lt;/p&gt;




&lt;h3&gt;
  
  
  Prizes
&lt;/h3&gt;

&lt;p&gt;🥇 1st place team: Claude Max 5x, 3 months per person&lt;/p&gt;

&lt;p&gt;🥈 2nd place team: Claude Max 5x, 1 month per person&lt;/p&gt;

&lt;p&gt;🥉 3rd place team: Claude Pro, 1 month per person&lt;/p&gt;

&lt;p&gt;Teams of up to 4. Solo submissions welcome.&lt;/p&gt;




&lt;h3&gt;
  
  
  How we judge
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use of Actian VectorAI DB (30%):&lt;/strong&gt; Is VectorAI DB doing real work in this app? Does the team know why they used it the way they did?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-world impact (25%):&lt;/strong&gt; Does it solve something people actually care about? Would someone use this?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical execution (25%):&lt;/strong&gt; Does it work? Is the code coherent and the architecture thought through?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo and presentation (20%):&lt;/strong&gt; Can you explain what you built and why it matters?&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  How to submit
&lt;/h3&gt;

&lt;p&gt;All submissions go through DoraHacks. You'll need a public GitHub or GitLab repo with a README, a working demo (video, Loom, or live link), and a short write-up covering what you built, why, and which technical requirement you used.&lt;/p&gt;

&lt;p&gt;Results announced April 20 on Discord.&lt;/p&gt;




&lt;h3&gt;
  
  
  Join us
&lt;/h3&gt;

&lt;p&gt;Register: &lt;a href="https://dorahacks.io/hackathon/2097/detail" rel="noopener noreferrer"&gt;dorahacks.io/hackathon/2097/detail&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Discord for support, team formation, and progress sharing: &lt;a href="https://discord.gg/432A2M63Py" rel="noopener noreferrer"&gt;discord.gg/432A2M63Py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drop a comment if you're in. See you April 13.&lt;/p&gt;

</description>
      <category>hackathon</category>
      <category>vectordatabase</category>
      <category>database</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building Your First AI Agent Without Frameworks</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Fri, 13 Jun 2025 10:50:56 +0000</pubDate>
      <link>https://forem.com/gerimate/building-your-first-ai-agent-without-frameworks-l5p</link>
      <guid>https://forem.com/gerimate/building-your-first-ai-agent-without-frameworks-l5p</guid>
      <description>&lt;p&gt;&lt;strong&gt;Want to understand how AI agents actually work? Let's build one from scratch before jumping into frameworks.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Most AI agent tutorials start with &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; or &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, which are great tools, but they can make it hard to understand what's happening underneath. &lt;/p&gt;

&lt;p&gt;An agent is really just a language model that can call functions. Once you understand that, frameworks make way more sense.&lt;/p&gt;

&lt;p&gt;Today we're building a customer support system using &lt;a href="https://platform.openai.com/docs/api-reference" rel="noopener noreferrer"&gt;OpenAI's API&lt;/a&gt; and Python. This will give you the fundamentals that make any agent framework easier to use and debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we're building:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A routing system that decides which "specialist" handles each query&lt;/li&gt;
&lt;li&gt;Function-calling agents that can search FAQs and analyze sentiment
&lt;/li&gt;
&lt;li&gt;Simple state management to track conversations&lt;/li&gt;
&lt;li&gt;Logic to escalate to humans when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll understand how agents work under the hood, making you much more effective when you do use frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Agent is Just an LLM with Tools
&lt;/h2&gt;

&lt;p&gt;Seriously, that's all there is to it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Language model&lt;/strong&gt; with a specific job&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functions&lt;/strong&gt; it can call &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logic&lt;/strong&gt; to decide when to use them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else is just orchestration.&lt;/p&gt;

&lt;p&gt;Let's start with the simplest possible agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="c1"&gt;# Set up OpenAI (get your API key from https://platform.openai.com/api-keys)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Create tool descriptions for the model
&lt;/span&gt;        &lt;span class="n"&gt;tool_descriptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;tool_descriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__doc__&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The input query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Call OpenAI with function calling
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_descriptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Let the model decide when to use tools
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle function calls
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
            &lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Execute the function
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Regular response if no function call
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Test it out
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_faq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the FAQ database for answers&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;faqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipping&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Standard shipping takes 3-5 business days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refunds processed within 5-7 business days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;return&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Returns accepted within 30 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;faqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No FAQ found for that topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Create an FAQ agent
&lt;/span&gt;&lt;span class="n"&gt;faq_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAQ Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a helpful FAQ assistant. Use the search_faq function to find answers to customer questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_faq&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Test it
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long does shipping take?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# FAQ Assistant: Standard shipping takes 3-5 business days
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Done.&lt;/strong&gt; You just built an AI agent. It understands questions, knows when to use its tool, and gives helpful answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding More Specialists
&lt;/h2&gt;

&lt;p&gt;Now let's add agents that handle different stuff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_sentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze the emotional tone of customer messages&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Simple keyword approach - you could use [Transformers](https://huggingface.co/docs/transformers/index) for a real sentiment model
&lt;/span&gt;    &lt;span class="n"&gt;negative_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;angry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frustrated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terrible&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;urgent_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;immediately&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emergency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;urgent_words&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;URGENT: Customer needs immediate attention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;negative_words&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEGATIVE: Customer is frustrated, handle with care&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEUTRAL: Standard response appropriate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_escalation_needed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Determine if human escalation is needed&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;escalation_triggers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speak to manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel account&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complaint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lawsuit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terrible service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;trigger&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;escalation_triggers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ESCALATE: Route to human agent immediately&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTINUE: AI agent can handle this query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Create specialized agents
&lt;/span&gt;&lt;span class="n"&gt;sentiment_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sentiment Analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You analyze customer emotions. Use analyze_sentiment to understand how the customer is feeling.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;analyze_sentiment&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;escalation_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Escalation Manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You decide when customers need human help. Use check_escalation_needed to evaluate queries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_escalation_needed&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Router: Deciding Who Handles What
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting - we need something to decide which agent handles each message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;escalation_agent&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Decide which agent should handle this query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Save the conversation
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Basic routing - you could make this way smarter
&lt;/span&gt;        &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Check for escalation triggers first
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complaint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lawsuit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="c1"&gt;# Check for emotional language
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;angry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frustrated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terrible&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="c1"&gt;# Default to FAQ for standard questions
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Get response from the right agent
&lt;/span&gt;        &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Save that too
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Routed to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_conversation_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get a summary of the conversation so far&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No conversation yet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; exchanges:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:]):&lt;/span&gt;  &lt;span class="c1"&gt;# Last 2 exchanges
&lt;/span&gt;            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;

&lt;span class="c1"&gt;# Test the complete system
&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Customer Support Agent System ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Test different types of queries
&lt;/span&gt;&lt;span class="n"&gt;test_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long does shipping take?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m really frustrated with this terrible service!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I want to speak to your manager right now!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s your return policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation Summary:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_conversation_summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making It Smarter: Let the AI Do the Routing
&lt;/h2&gt;

&lt;p&gt;Keyword matching works, but we can do better. Let's use the LLM itself to make routing decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SmartRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;escalation_agent&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Use AI to decide which agent should handle the query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;routing_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re routing customer queries to specialists.

        Options:
        - faq: Standard questions about policies, shipping, returns
        - sentiment: Upset or frustrated customers  
        - escalation: Complex complaints or requests for managers

        Customer: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

        Which specialist? Just answer: faq, sentiment, or escalation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;routing_prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Default to FAQ if something weird happens
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Get response from chosen agent
&lt;/span&gt;        &lt;span class="n"&gt;agent_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Smart routed to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Test smart routing
&lt;/span&gt;&lt;span class="n"&gt;smart_router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmartRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Smart Routing Test ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;smart_test_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My package is late and I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m getting married tomorrow!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do you accept international credit cards?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is absolutely ridiculous, I want my money back immediately!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can I return something I bought 3 weeks ago?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;smart_test_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smart_router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;smart_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding Memory: Making Conversations Actually Work
&lt;/h2&gt;

&lt;p&gt;Real support conversations build on what happened before. Here's how to add memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryAwareRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;escalation_agent&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resolved_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process query with full conversation context&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Save current message
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;now&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Build context summary
&lt;/span&gt;        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;routing_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Previous conversation context:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Current message: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

        Which specialist should handle this?
        - faq: Standard questions
        - sentiment: Emotional customers
        - escalation: Complex issues or if already escalated

        Consider the conversation history. Answer: faq, sentiment, or escalation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;routing_prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Update customer context based on routing
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# Get enhanced response with context
&lt;/span&gt;        &lt;span class="n"&gt;agent_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_contextual_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add to memory
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Contextual routing to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build conversation context summary&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New conversation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation history: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; messages&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer escalated: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Negative sentiment detected: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; times&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Include last few exchanges
&lt;/span&gt;        &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_contextual_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get response with conversation context&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Add context to the agent's response
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Customer previously escalated] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Customer has been frustrated multiple times] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="c1"&gt;# Test memory-aware system
&lt;/span&gt;&lt;span class="n"&gt;memory_router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryAwareRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Memory-Aware Conversation ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conversation_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s your return policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;That&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s not good enough, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m really frustrated!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I want to speak to someone who can actually help me!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fine, what information do you need for the return?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Actually Built
&lt;/h2&gt;

&lt;p&gt;You just created a complete customer support system using basic Python and OpenAI. Here's what you learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fundamentals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Agents = LLM + functions + routing logic&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Function calling&lt;/strong&gt; lets agents take actions&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Smart routing&lt;/strong&gt; decides who handles what&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;State management&lt;/strong&gt; keeps conversations coherent&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Memory&lt;/strong&gt; makes agents context-aware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You'll understand what frameworks actually do for you&lt;/li&gt;
&lt;li&gt;Easier to debug when things go wrong&lt;/li&gt;
&lt;li&gt;You can customize behavior exactly how you want&lt;/li&gt;
&lt;li&gt;Works with any LLM provider&lt;/li&gt;
&lt;li&gt;Good foundation before learning frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Making It Production Ready
&lt;/h2&gt;

&lt;p&gt;To actually deploy this, you'd need:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The basics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error handling (APIs fail)&lt;/li&gt;
&lt;li&gt;Database for conversation storage&lt;/li&gt;
&lt;li&gt;Rate limiting (prevent abuse)&lt;/li&gt;
&lt;li&gt;Proper logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The nice-to-haves:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real sentiment analysis model&lt;/li&gt;
&lt;li&gt;Integration with your FAQ database&lt;/li&gt;
&lt;li&gt;Actual escalation to humans (&lt;a href="https://api.slack.com/" rel="noopener noreferrer"&gt;Slack API&lt;/a&gt;, email, etc.)&lt;/li&gt;
&lt;li&gt;Analytics on what's working&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When frameworks make sense:&lt;/strong&gt;&lt;br&gt;
Now you understand what &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, and &lt;a href="https://microsoft.github.io/autogen/" rel="noopener noreferrer"&gt;AutoGen&lt;/a&gt; do - they handle the routing and orchestration you just built manually. They're great when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need complex multi-step workflows&lt;/li&gt;
&lt;li&gt;You want pre-built integrations and tools&lt;/li&gt;
&lt;li&gt;You're working on a team that benefits from standardized patterns&lt;/li&gt;
&lt;li&gt;You need features like human-in-the-loop or advanced state management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is knowing when the abstraction helps versus when you need more control.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;AI agents are organized LLMs with specific jobs and the ability to call functions. The "multi-agent" part is smart routing and state management.&lt;/p&gt;

&lt;p&gt;Understanding these fundamentals makes you better at using any framework because you know what's happening underneath. Start here, then use frameworks when their features solve real problems you're facing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built something cool with this? I'd love to see what you made - drop it in the comments!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Prevent AI Agents From Breaking in Production</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Fri, 06 Jun 2025 12:21:12 +0000</pubDate>
      <link>https://forem.com/gerimate/how-to-prevent-ai-agents-from-breaking-in-production-24c3</link>
      <guid>https://forem.com/gerimate/how-to-prevent-ai-agents-from-breaking-in-production-24c3</guid>
      <description>&lt;p&gt;Deploying AI agents in production is trickier than most teams expect. What works perfectly in development often becomes a reliability nightmare once real traffic hits.&lt;/p&gt;

&lt;p&gt;After looking at incident reports, some clear patterns emerge. The same few issues keep causing the majority of production failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://hugobowne.substack.com/p/why-ai-agents-fail-in-productionand" rel="noopener noreferrer"&gt;42% of AI agent failures come from hallucinated API calls&lt;/a&gt;&lt;/strong&gt;, and another &lt;strong&gt;&lt;a href="https://www.bankinfosecurity.com/popular-gpus-used-ai-systems-vulnerable-to-memory-leak-flaw-a-24135" rel="noopener noreferrer"&gt;23% are GPU memory leaks&lt;/a&gt;&lt;/strong&gt;. These aren't edge cases - they're systematic problems that need systematic solutions.&lt;/p&gt;

&lt;p&gt;Here's what's actually breaking and how to prevent it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common failure patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hallucinated API calls
&lt;/h3&gt;

&lt;p&gt;LLMs generate code that looks correct but calls non-existent methods or deprecated endpoints. Traditional validation tools miss this because the code is syntactically valid - it just references APIs that don't exist in your environment.&lt;/p&gt;

&lt;p&gt;Teams often spend significant time debugging what appears to be infrastructure issues when the root cause is the AI making incorrect assumptions about available APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU memory leaks
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://www.bankinfosecurity.com/popular-gpus-used-ai-systems-vulnerable-to-memory-leak-flaw-a-24135" rel="noopener noreferrer"&gt;known vulnerability in AMD, Apple, and Qualcomm GPUs&lt;/a&gt; can cause AI workloads to leak over 180MB per inference cycle. In Kubernetes environments, this can cascade across pods and eventually crash entire nodes.&lt;/p&gt;

&lt;p&gt;Standard monitoring often doesn't catch this until resource exhaustion is already occurring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cascading failures
&lt;/h3&gt;

&lt;p&gt;AI agents are more interconnected than typical microservices. A single malformed operation can stall agent threads for extended periods, and recovery processes often reset accumulated context, leading to broader system failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Insufficient observability
&lt;/h3&gt;

&lt;p&gt;Most teams monitor traditional infrastructure metrics but lack visibility into AI-specific behavior like GPU utilization patterns, token consumption, and model performance degradation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Constrain API generation
&lt;/h3&gt;

&lt;p&gt;Instead of relying on post-generation validation, limit what the LLM can suggest in the first place by providing explicit API context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract what's actually available
&lt;/span&gt;&lt;span class="n"&gt;global_deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_imports&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codebase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;local_deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_function_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Tell the LLM what it can actually use
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Available APIs: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;global_deps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Local functions: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;local_deps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Task: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Teams using dependency-constrained prompting report fewer API hallucinations. The approach is straightforward: if you don't tell the LLM about APIs that don't exist, it's less likely to invent them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implement GPU resource controls
&lt;/h3&gt;

&lt;p&gt;Set explicit resource limits in your container orchestration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4Gi"&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4Gi"&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitor GPU memory usage and restart containers before they crash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;vram_usage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;memory.used &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader,nounits&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$vram_usage&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; 7500 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;  &lt;span class="c"&gt;# 90% of 8GB&lt;/span&gt;
    kubectl rollout restart deployment/ai-agent
  &lt;span class="k"&gt;fi
  &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;30
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This type of proactive monitoring has reduced OOM crashes in production environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Version AI components as units
&lt;/h3&gt;

&lt;p&gt;AI agents consist of multiple interdependent components: models, vector databases, prompt templates, and configuration. These should be &lt;a href="https://www.dbos.dev/blog/durable-execution-crashproof-ai-agents" rel="noopener noreferrer"&gt;versioned and deployed together&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ai-agent-chart/Chart.yaml&lt;/span&gt;
&lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-model&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.2.3"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vector-db&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.9.1"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prompt-templates&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.1.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploying the entire bundle as a unit prevents version mismatches that can cause subtle but significant failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add AI-specific monitoring
&lt;/h3&gt;

&lt;p&gt;Traditional APM tools don't capture AI-specific metrics. You need to track GPU utilization, token consumption, and model performance alongside business outcomes. &lt;a href="https://latitude-blog.ghost.io/blog/best-practices-for-llm-observability-in-cicd/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; provides a good foundation for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ai_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai_inference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt.length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response.length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inference.duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens.consumed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correlating these metrics with infrastructure data helps identify when GPU pressure affects response quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build resilient fallback systems
&lt;/h3&gt;

&lt;p&gt;Implement &lt;a href="https://botpress.com/blog/ai-agent-routing" rel="noopener noreferrer"&gt;circuit breakers&lt;/a&gt; for external API calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tenacity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_after_attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_exponential&lt;/span&gt;

&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;stop_after_attempt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;wait_exponential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;multiplier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_external_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Have a clear escalation path when AI components fail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ai_with_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ai_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;AIAgentError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rule_based_handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;escalate_to_human&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request escalated to support team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making AI agents production-ready
&lt;/h2&gt;

&lt;p&gt;AI agents in production require the same operational discipline as any other critical system. The difference is that they have unique failure modes that traditional monitoring and deployment practices don't address.&lt;/p&gt;

&lt;p&gt;Teams that succeed treat AI agents as complex distributed systems with proper observability, resource management, and graceful degradation. The ones that struggle try to deploy them like traditional applications.&lt;/p&gt;

&lt;p&gt;The good news is that once you address these systematic issues, AI agents become much more predictable and reliable in production environments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Deploy AI Agents Without Infrastructure Headaches</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Fri, 30 May 2025 11:17:40 +0000</pubDate>
      <link>https://forem.com/gerimate/deploy-ai-agents-without-infrastructure-headaches-4230</link>
      <guid>https://forem.com/gerimate/deploy-ai-agents-without-infrastructure-headaches-4230</guid>
      <description>&lt;p&gt;Platform engineers have a new nightmare: explaining to their CTO why the AI agent deployment that worked perfectly in staging is now burning through $50,000/month in production. The Terraform config looks flawless. The security groups are properly configured. The ECS tasks are healthy. But somehow, the vector database is choking on embeddings, the LLM gateway is routing traffic to the wrong regions, and the workflow orchestration is stuck in an infinite retry loop.&lt;/p&gt;

&lt;p&gt;Traditional IaC tools weren't built for this complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional IaC Can't Handle AI Workloads
&lt;/h2&gt;

&lt;p&gt;When ChatGPT generates your Terraform config, it looks perfect. But deploy it and everything breaks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This looks right but will fail in production&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"ai_agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ai-agent-sg"&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# ❌ Too permissive&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_service"&lt;/span&gt; &lt;span class="s2"&gt;"ai_agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ai-agent"&lt;/span&gt;
  &lt;span class="nx"&gt;cluster&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;task_definition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_task_definition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;

  &lt;span class="c1"&gt;# ❌ Missing: vector DB networking, LLM provider configs, &lt;/span&gt;
  &lt;span class="c1"&gt;# retry policies, cost controls, monitoring...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LLMs generating IaC are trained on public examples, not production systems. They miss vector database networking, multi-provider LLM failover, and other complexities that break under real traffic.&lt;/p&gt;

&lt;p&gt;AI agents need &lt;a href="https://www.madrona.com/ai-agent-infrastructure-three-layers-tools-data-orchestration/" rel="noopener noreferrer"&gt;completely different infrastructure&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Traditional Layer:         AI-Specific Layer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Compute (ECS/Lambda)     - Vector Database (Pinecone/Weaviate)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Storage (S3/EBS)         - LLM Gateway (Multi-provider routing)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Database (RDS)           - Workflow Orchestration (Temporal/Prefect)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Networking (VPC/ALB)     - Model Serving &amp;amp; State Management&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each has its own failure modes and scaling patterns that traditional IaC treats as generic cloud resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pulumi for AI Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.pulumi.com/solutions/ai/" rel="noopener noreferrer"&gt;Pulumi has native AI providers&lt;/a&gt; that treat vector databases and LLM gateways as real infrastructure. The trade-off? Your team needs to learn TypeScript/Python instead of HCL, and you're betting on a smaller ecosystem than Terraform's.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alternative approaches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Terraform providers&lt;/strong&gt; - Build your own for AI services (more work, but stays in Terraform)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform + scripts&lt;/strong&gt; - Use Terraform for basic infra, scripts for AI-specific parts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; - Good if you're AWS-only
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pinecone&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@pulumi/pinecone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;temporal&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@pulumi/temporal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Native vector database support&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;knowledge-base&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer-support-kb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cosine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;serverless&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Workflow orchestration as code&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiWorkflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;temporal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-workflows&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer-support&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retention&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;7d&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Temporal Handles Complex AI Workflows
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://temporal.io/blog/nine-ways-to-use-temporal-in-your-ai-workflows" rel="noopener noreferrer"&gt;Temporal manages the orchestration&lt;/a&gt; that AI agents need. Downsides: another system to operate, and your team needs to learn workflow concepts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alternatives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefect&lt;/strong&gt; - Similar to Temporal but more Python-native&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step Functions&lt;/strong&gt; - AWS-native, simpler but less powerful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Jobs&lt;/strong&gt; - If you want to stay close to K8s
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@workflow.defn&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomerSupportAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@workflow.run&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Survives infrastructure failures
&lt;/span&gt;        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_activity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;start_to_close_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Automatic retries with backoff
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_activity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;call_llm_with_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;retry_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maximum_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Long-running workflows (hours/days/weeks)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;needs_human_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_condition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;search_attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostOptimizedAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComponentResource&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Spot instances for training
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;training_cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-training&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;capacity_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FARGATE_SPOT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Reserved capacity for production
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inference_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-inference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;desired_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_optimal_capacity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Security and Operational Considerations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;API Key Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use AWS Secrets Manager or Azure Key Vault for LLM API keys&lt;/li&gt;
&lt;li&gt;Rotate keys automatically (most AI providers support this)&lt;/li&gt;
&lt;li&gt;Never put API keys in your IaC code - use secret references&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rollback Strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI infrastructure changes can break in subtle ways&lt;/li&gt;
&lt;li&gt;Always test rollbacks in staging first&lt;/li&gt;
&lt;li&gt;Keep vector database backups before schema changes&lt;/li&gt;
&lt;li&gt;Use blue-green deployments for model updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Team Training:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Budget 2-4 weeks for engineers to learn Pulumi + Temporal&lt;/li&gt;
&lt;li&gt;Start with one person, then spread knowledge&lt;/li&gt;
&lt;li&gt;Document your AI infrastructure patterns for the team&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Monitoring That Actually Matters
&lt;/h2&gt;

&lt;p&gt;Regular monitoring misses what's important for AI systems. &lt;a href="https://my.idc.com/getdoc.jsp?containerId=prUS52758624" rel="noopener noreferrer"&gt;AI infrastructure spending hits $223 billion by 2028&lt;/a&gt;, so you need proper observability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiMetrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dashboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-observability&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;dashboardBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jsonStringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;widgets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;metric&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="c1"&gt;// Traditional metrics&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS/ECS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CPUUtilization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS/ECS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MemoryUtilization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

                    &lt;span class="c1"&gt;// AI-specific metrics that actually matter&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/VectorDB&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;QueryLatency&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/LLM&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TokensPerSecond&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/LLM&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ResponseQuality&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/Workflow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CompletionRate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/Cost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DollarPerInteraction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI System Health&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Alert on cost spikes&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;costSpike&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetricAlarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-cost-spike&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;comparisonOperator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GreaterThanThreshold&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metricName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DollarPerInteraction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Alert if cost per interaction &amp;gt; $0.50&lt;/span&gt;
    &lt;span class="na"&gt;alarmDescription&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI infrastructure costs spiking&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Teams Are Seeing
&lt;/h2&gt;

&lt;p&gt;People adopting AI-native infrastructure report significant improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10-100x lower costs&lt;/strong&gt; with &lt;a href="https://www.pulumi.com/blog/pinecone-serverless/" rel="noopener noreferrer"&gt;serverless vector databases&lt;/a&gt; vs. provisioned capacity&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.qwak.com/post/llm-cost" rel="noopener noreferrer"&gt;Self-hosted models can cost significantly less&lt;/a&gt; than API-based solutions for high-volume workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies using &lt;a href="https://temporal.io/blog/build-resilient-agentic-ai-with-temporal" rel="noopener noreferrer"&gt;Temporal for AI workflows&lt;/a&gt; report significantly reduced debugging time and improved reliability for long-running AI processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start here:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check your AI costs&lt;/strong&gt; - How much are you spending compared to self-hosted options?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick one AI workflow&lt;/strong&gt; to rebuild as a test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try &lt;a href="https://docs.pinecone.io/integrations/pulumi" rel="noopener noreferrer"&gt;Pulumi with Pinecone&lt;/a&gt;&lt;/strong&gt; - deploy a test vector database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Next month:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move critical AI workflows to Temporal&lt;/li&gt;
&lt;li&gt;Set up cost monitoring and alerts&lt;/li&gt;
&lt;li&gt;Add AI-specific observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies building reliable, cheap AI infrastructure stopped using traditional IaC tools. They switched to AI-native approaches that treat AI workloads properly.&lt;/p&gt;

&lt;p&gt;Your call: Keep fighting with Terraform and burning money, or use patterns that actually work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>infrastructureascode</category>
      <category>terraform</category>
    </item>
    <item>
      <title>AI Deployment: Why Serverless is Perfect (and Terrible)</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Wed, 28 May 2025 10:40:29 +0000</pubDate>
      <link>https://forem.com/gerimate/ai-deployment-why-serverless-is-perfect-and-terrible-4phl</link>
      <guid>https://forem.com/gerimate/ai-deployment-why-serverless-is-perfect-and-terrible-4phl</guid>
      <description>&lt;p&gt;Your AI agent works perfectly in development. You've tested the reasoning chains, the tool integrations are solid, and the responses are exactly what users need. Then you deploy to production and everything breaks.&lt;/p&gt;

&lt;p&gt;The timeout kills your multi-step workflows after 15 minutes. Your bundle exceeds the 250MB limit because you need scikit-learn, pandas, and a vector database client. Cold starts take 6+ seconds while your models load, making real-time interactions impossible.&lt;/p&gt;

&lt;p&gt;Sound familiar? You're not alone. One developer working on an e-commerce recommendation engine discovered that "scikit-learn and pandas libraries increased the size of my deployment package beyond the AWS Lambda package limits." Another found their TensorFlow model loading caused API calls to timeout after 29 seconds.&lt;/p&gt;

&lt;p&gt;Here's the thing: serverless isn't broken for AI. You're just hitting the boundaries of what it was designed for. Traditional serverless platforms were built for quick, stateless web requests—not long-running AI agent workflows that need to maintain context, load large models, and perform complex reasoning chains.&lt;/p&gt;

&lt;p&gt;But before you abandon serverless entirely, understand this: for certain AI workloads, serverless is absolutely perfect. The question isn't whether to use serverless for AI—it's knowing when it works brilliantly and when it fails catastrophically.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Serverless Shines for AI Deployments
&lt;/h2&gt;

&lt;p&gt;Serverless excels in three specific AI scenarios that traditional infrastructure can't match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unpredictable Traffic Patterns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI applications often experience extreme traffic variability. Your chatbot gets mentioned in a tweet and suddenly handles 1000x normal load. A content generation API processes 10 requests per hour during quiet periods, then 1000 requests during marketing campaigns.&lt;/p&gt;

&lt;p&gt;Serverless platforms automatically scale from zero to thousands of concurrent executions without configuration. AWS Lambda provides &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html" rel="noopener noreferrer"&gt;1,000 concurrent executions by default&lt;/a&gt;, scaling instantly based on demand. You pay only for actual compute time—not idle servers waiting for the next AI inference request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event-Driven AI Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many AI workflows fit perfectly into event-driven patterns. Document uploaded → extract text → summarize content. New customer signup → analyze preferences → generate personalized recommendations. Code commit → run AI code review → post feedback.&lt;/p&gt;

&lt;p&gt;These discrete, triggered operations align with serverless strengths. Each event spawns an independent function execution that processes the task and terminates. No need to manage background services or polling mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple Inference Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lightweight AI operations—sentiment analysis, text classification, simple embeddings generation—work excellently in serverless environments. These tasks typically complete within seconds, use manageable dependencies, and don't require complex state management.&lt;/p&gt;

&lt;p&gt;A sentiment analysis API using a pre-trained model can process requests in under 100ms with warm starts, providing excellent user experience while benefiting from serverless cost efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Serverless Reality Check
&lt;/h2&gt;

&lt;p&gt;The problems start when your AI workloads bump against fundamental serverless constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeout Limitations Kill Complex Workflows
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-timeout.html" rel="noopener noreferrer"&gt;AWS Lambda caps execution at &lt;strong&gt;15 minutes maximum&lt;/strong&gt;&lt;/a&gt;. &lt;a href="https://vercel.com/docs/functions/configuring-functions/duration" rel="noopener noreferrer"&gt;Vercel Functions limits vary by plan&lt;/a&gt;: &lt;strong&gt;60 seconds on Hobby, 300 seconds on Pro, 900 seconds on Enterprise&lt;/strong&gt;. &lt;a href="https://developers.cloudflare.com/workers/platform/limits/" rel="noopener noreferrer"&gt;Cloudflare Workers allows unlimited wall-clock time&lt;/a&gt; but restricts &lt;strong&gt;CPU time to 5 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Multi-step AI agent workflows routinely exceed these limits. Consider a research agent that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Searches multiple data sources (2-3 minutes)&lt;/li&gt;
&lt;li&gt;Processes and analyzes findings (3-5 minutes)
&lt;/li&gt;
&lt;li&gt;Generates comprehensive report (5-8 minutes)&lt;/li&gt;
&lt;li&gt;Formats and delivers output (1-2 minutes)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total runtime: 11-18 minutes. This workflow will fail on most serverless platforms or hit timeout limits that kill execution before completion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world example&lt;/strong&gt;: AI agents performing "extract, transform, and load (ETL) jobs and content generation workflows such as creating PDF files or media transcoding require fast, scalable local storage to process large amounts of data quickly"—operations that frequently exceed serverless timeout constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bundle Size Problems Block AI Dependencies
&lt;/h3&gt;

&lt;p&gt;Traditional serverless deployments face &lt;a href="https://stackoverflow.com/questions/54632009/how-to-increase-the-maximum-size-of-the-aws-lambda-deployment-package-requesten" rel="noopener noreferrer"&gt;severe size restrictions&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda ZIP packages&lt;/strong&gt;: 50MB compressed, 250MB uncompressed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel Functions&lt;/strong&gt;: 250MB uncompressed including layers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Workers&lt;/strong&gt;: 3MB free, 10MB paid plans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Popular AI libraries routinely exceed these limits. Scikit-learn, pandas, numpy, and scipy together often surpass 250MB. Add a vector database client like Pinecone or Weaviate, plus an LLM SDK, and you're well beyond platform constraints.&lt;/p&gt;

&lt;p&gt;The introduction of &lt;a href="https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Lambda container images&lt;/strong&gt;&lt;/a&gt; (up to 10GB) fundamentally changes this landscape, but requires more complex deployment processes and sacrifices some serverless simplicity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cold Start Performance Destroys User Experience
&lt;/h3&gt;

&lt;p&gt;AI workloads suffer dramatically from cold start penalties. &lt;a href="https://www.bounteous.com/insights/improving-your-lambda-coldstart-performance-with-aws-lambda-snapstart/" rel="noopener noreferrer"&gt;Research shows that &lt;strong&gt;99.9% of cold starts take up to 6.99 seconds&lt;/strong&gt;&lt;/a&gt; for Java-based AI applications, while warm starts complete in just 33 milliseconds.&lt;/p&gt;

&lt;p&gt;Loading TensorFlow models can cause initial API calls to timeout after 29 seconds during cold starts, though subsequent warm function calls process images in under one second. This unpredictable performance makes serverless unsuitable for real-time AI interactions where users expect immediate responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cold start penalty compounds with AI complexity&lt;/strong&gt;: larger models, more dependencies, and initialization-heavy frameworks all extend startup times beyond acceptable user experience thresholds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Serverless Work: Practical Patterns
&lt;/h2&gt;

&lt;p&gt;You can work around serverless limitations with architectural patterns designed for AI workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Workflow Suspension and Resume
&lt;/h3&gt;

&lt;p&gt;Break long-running AI processes into discrete steps with state persistence between invocations. Each step saves progress to external storage, enabling the next function to continue from checkpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 1: Initial Analysis&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analyzeInput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;performAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Save state to Redis/DynamoDB&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="na"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analysis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;nextStep&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;generate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Trigger next step&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;triggerNextStep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Step 2: Content Generation  &lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateFromAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;finalResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern enables unlimited workflow duration by staying within individual function timeout limits while maintaining progress state.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. External State Management
&lt;/h3&gt;

&lt;p&gt;AI agents require sophisticated state management beyond serverless stateless models. Externalize all persistent data to dedicated storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis/ElastiCache&lt;/strong&gt;: Conversation context, short-term agent memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL/MongoDB&lt;/strong&gt;: Long-term user preferences, interaction history
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Vector databases&lt;/a&gt;&lt;/strong&gt;: Embeddings storage for semantic search and RAG
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chatAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Load conversation context&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`chat:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Process with context&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Update conversation state&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`chat:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;lastActivity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Container-Based Deployment
&lt;/h3&gt;

&lt;p&gt;Use AWS Lambda container images to eliminate bundle size constraints. Include complete AI frameworks and pre-trained models within container deployments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; public.ecr.aws/lambda/python:3.9&lt;/span&gt;

&lt;span class="c"&gt;# Copy model files during build&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; models/ ${LAMBDA_TASK_ROOT}/models/&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app.py ${LAMBDA_TASK_ROOT}&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["app.lambda_handler"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Container deployment enables 10GB packages while maintaining serverless operational benefits, though with increased deployment complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Smart Cold Start Mitigation
&lt;/h3&gt;

&lt;p&gt;Implement strategies to minimize cold start impact:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Pre-warming&lt;/strong&gt;: Use scheduled functions to keep models loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Scheduled every 5 minutes&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keepWarm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modelExists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkModelAvailability&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;modelExists&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;downloadAndCacheModel&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;model ready&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Progressive Response&lt;/strong&gt;: Return immediate acknowledgment, then stream results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiInference&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Immediate response&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateId&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendInitialResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Background processing with streaming updates&lt;/span&gt;
  &lt;span class="nf"&gt;processInBackground&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;responseId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Platform-Specific Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Lambda: Enterprise-Grade with Complexity Trade-offs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Longest timeouts (15 minutes), container support up to 10GB, mature ecosystem, &lt;a href="https://aws.amazon.com/lambda/provisioned-concurrency/" rel="noopener noreferrer"&gt;Provisioned Concurrency for predictable performance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Complex AI workflows, enterprise deployments requiring compliance and integration with AWS services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;: Cold start performance, complex configuration for container deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel Functions: Developer Experience with Timeout Constraints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Excellent developer experience, edge distribution, &lt;a href="https://vercel.com/guides/what-can-i-do-about-vercel-serverless-functions-timing-out" rel="noopener noreferrer"&gt;Fluid Compute for extended durations&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Simple AI APIs, content generation workflows, applications prioritizing deployment simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;: Aggressive timeout limits (60 seconds on free tier), bundle size restrictions persist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Workers: Global Edge with Memory Constraints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Global edge distribution, unlimited wall-clock time, &lt;a href="https://developers.cloudflare.com/changelog/2025-03-25-higher-cpu-limits/" rel="noopener noreferrer"&gt;recent CPU limit increases to 5 minutes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Real-time AI inference requiring global distribution, lightweight AI operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;: 128MB memory limit, 10MB maximum bundle size, V8 runtime restrictions.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use Serverless for AI
&lt;/h2&gt;

&lt;p&gt;Certain AI workloads fundamentally conflict with serverless constraints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always-On AI Agents&lt;/strong&gt;: Customer service bots, monitoring systems, and agents requiring continuous availability benefit from dedicated infrastructure avoiding cold start penalties.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heavy Model Inference&lt;/strong&gt;: Large language models requiring substantial memory (8GB+ RAM) or &lt;a href="https://aws.amazon.com/ec2/instance-types/p4/" rel="noopener noreferrer"&gt;specialized hardware (GPUs)&lt;/a&gt; exceed serverless platform capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex Multi-Agent Systems&lt;/strong&gt;: Workflows requiring persistent communication between multiple AI agents, shared memory, or complex coordination patterns work better with traditional infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-Volume Production Workloads&lt;/strong&gt;: Applications processing thousands of AI requests per minute may find dedicated infrastructure more cost-effective than per-invocation serverless pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Architectures: Best of Both Worlds
&lt;/h2&gt;

&lt;p&gt;Most production AI systems benefit from hybrid approaches combining serverless and traditional infrastructure. &lt;a href="https://aws.amazon.com/step-functions/" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; provides excellent orchestration for these patterns:&lt;/p&gt;

&lt;h3&gt;
  
  
  Router Pattern
&lt;/h3&gt;

&lt;p&gt;Use serverless functions as intelligent routers directing requests to appropriate processing infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiRouter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyzeRequestComplexity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;simple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processServerless&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;queueForContainerProcessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hot/Cold Architecture
&lt;/h3&gt;

&lt;p&gt;Maintain always-on infrastructure for baseline load, serverless for traffic spikes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containers handle predictable, consistent traffic&lt;/li&gt;
&lt;li&gt;Serverless functions scale for demand peaks&lt;/li&gt;
&lt;li&gt;Cost optimization through usage pattern matching&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Making the Right Choice for Your AI Deployment
&lt;/h2&gt;

&lt;p&gt;Use this decision framework when evaluating serverless for AI workloads:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Serverless When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execution time consistently under 10 minutes&lt;/li&gt;
&lt;li&gt;Traffic patterns are unpredictable or bursty
&lt;/li&gt;
&lt;li&gt;Dependencies fit within platform bundle limits (or container deployment acceptable)&lt;/li&gt;
&lt;li&gt;Workflow can be broken into discrete steps&lt;/li&gt;
&lt;li&gt;Cold start latency is acceptable for use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Traditional Infrastructure When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workflows require 15+ minutes execution time&lt;/li&gt;
&lt;li&gt;Always-on availability is critical&lt;/li&gt;
&lt;li&gt;Memory requirements exceed 10GB&lt;/li&gt;
&lt;li&gt;Complex multi-agent coordination needed&lt;/li&gt;
&lt;li&gt;Consistent sub-second response times required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Hybrid When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic patterns combine baseline and spike loads&lt;/li&gt;
&lt;li&gt;Some workflows fit serverless constraints, others don't&lt;/li&gt;
&lt;li&gt;Cost optimization across variable usage patterns is priority&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Serverless isn't universally perfect or terrible for AI deployment—it's contextual. Simple, discrete AI operations work excellently in serverless environments, providing cost efficiency and automatic scaling. Complex, long-running AI agent workflows require architectural adaptations or alternative infrastructure.&lt;/p&gt;

&lt;p&gt;The key is matching your specific AI workload characteristics to platform capabilities rather than forcing incompatible patterns. As serverless platforms continue evolving—container support, extended timeouts, &lt;a href="https://aws.amazon.com/blogs/compute/optimizing-cold-start-performance-of-aws-lambda-using-advanced-priming-strategies-with-snapstart/" rel="noopener noreferrer"&gt;better cold start performance&lt;/a&gt;—the viable use cases for serverless AI will expand.&lt;/p&gt;

&lt;p&gt;Start by auditing your current AI deployment challenges against serverless constraints. If timeout limits, bundle sizes, or cold start performance block your use case, consider hybrid architectures or traditional infrastructure. If your workflows fit serverless patterns, you'll benefit from simplified operations and automatic scaling.&lt;/p&gt;

&lt;p&gt;The serverless AI landscape changes rapidly. What's impossible today may be trivial next year. But right now, success depends on honest assessment of your requirements against current platform realities—not wishful thinking about what serverless should support.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>serverless</category>
      <category>devops</category>
    </item>
    <item>
      <title>5 Developer Pain Points Solved by Internal Developer Platforms</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Fri, 16 May 2025 12:03:56 +0000</pubDate>
      <link>https://forem.com/gerimate/5-developer-pain-points-solved-by-internal-developer-platforms-1bd6</link>
      <guid>https://forem.com/gerimate/5-developer-pain-points-solved-by-internal-developer-platforms-1bd6</guid>
      <description>&lt;p&gt;Ever feel like you spend more time wrestling with tools than actually building stuff? You're not alone.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://about.gitlab.com/the-source/platform/devops-teams-want-to-shake-off-diy-toolchains-a-platform-is-the-answer/" rel="noopener noreferrer"&gt;GitLab's research&lt;/a&gt;, developers waste up to 75% of their time just maintaining toolchains rather than coding. Even worse, over 78% of DevOps professionals report wasting between 25-100% of their time keeping their toolchain running.&lt;/p&gt;

&lt;p&gt;Traditional development is like being handed a giant bin of unsorted LEGO bricks and told to build a castle. You spend most of your time digging through the pile looking for the right pieces, and everyone builds differently.&lt;/p&gt;

&lt;p&gt;Platform engineering is like getting those official LEGO kits with sorted pieces, clear instructions, and modular components. You still have creative freedom, but you're not wasting hours hunting for that one specific brick or reinventing foundations that have already been perfected.&lt;/p&gt;

&lt;p&gt;I've spent years documenting developer workflows and watching teams struggle with the same problems over and over. Let's look at five major pain points and how Internal Developer Platforms (IDPs) actually solve them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's an Internal Developer Platform anyway?
&lt;/h2&gt;

&lt;p&gt;Before diving in, a quick definition: an IDP is a &lt;a href="https://spacelift.io/blog/what-is-an-internal-developer-platform" rel="noopener noreferrer"&gt;self-service layer&lt;/a&gt; that sits on top of your infrastructure and tools, abstracting away complexity so developers can focus on building rather than configuring. Think of it as a unified interface for your entire development lifecycle.&lt;/p&gt;

&lt;p&gt;No more jumping between 10+ tools just to deploy a simple feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pain Point #1: Deployment Bottlenecks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;How long does it take your team to get code from commit to production? For most teams, it's days or weeks. &lt;a href="https://shipyard.build/blog/improve-dora-change-lead-time/" rel="noopener noreferrer"&gt;Elite teams deploy in under a day&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The bottleneck isn't usually the code—it's the deployment process itself. When deployments require specialized knowledge or manual steps, everything slows down. If the one person who knows how to deploy is on vacation, you're stuck.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;IDPs provide self-service templates for deployments. Instead of developers needing to understand the underlying infrastructure, they get standardized workflows with the right guardrails.&lt;/p&gt;

&lt;p&gt;With a platform approach, your team can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy without waiting for DevOps/platform teams&lt;/li&gt;
&lt;li&gt;Use templates that enforce best practices&lt;/li&gt;
&lt;li&gt;Automate the entire CI/CD pipeline&lt;/li&gt;
&lt;li&gt;Deploy with a single click or command&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;You don't need a huge budget to implement this. Start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt; or GitLab CI for automated pipelines&lt;/li&gt;
&lt;li&gt;Docker (used by &lt;a href="https://survey.stackoverflow.co/2024/technology" rel="noopener noreferrer"&gt;59% of professional developers&lt;/a&gt;) for consistent environments&lt;/li&gt;
&lt;li&gt;Standardized deployment scripts checked into your repo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set up templates for your most common deployment types and build from there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pain Point #2: Context Switching Costs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Each interruption costs developers &lt;a href="https://axolo.co/blog/p/cost-context-switching-developer-workflow" rel="noopener noreferrer"&gt;20+ minutes to regain focus&lt;/a&gt;. When developers have to switch between different tasks, tools, and contexts, productivity tanks.&lt;/p&gt;

&lt;p&gt;The math is brutal: for a team of 10 engineers losing 10 minutes per context switch at $72/hour, that's $120 lost per build. With 50 builds per day and 22 working days, you're burning $132,000 monthly in lost productivity.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.cortex.io/report/the-2024-state-of-developer-productivity" rel="noopener noreferrer"&gt;2024 State of Developer Productivity report&lt;/a&gt; found "time spent gathering project context" tied for the biggest productivity leak (26%).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Platform engineering attacks this by creating unified interfaces and standardized workflows. Instead of switching between CI/CD tools, cloud consoles, monitoring dashboards, and ticketing systems, developers get a single interface.&lt;/p&gt;

&lt;p&gt;Implementing an IDP gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One portal for accessing all development resources&lt;/li&gt;
&lt;li&gt;Integrated workflows that reduce tool-switching&lt;/li&gt;
&lt;li&gt;Standardized processes that become muscle memory&lt;/li&gt;
&lt;li&gt;Fewer interruptions due to missing context&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;For smaller teams, you can start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A centralized dashboard linking to your most-used tools&lt;/li&gt;
&lt;li&gt;Consistent CLI tools that work across projects&lt;/li&gt;
&lt;li&gt;Documentation that follows the same structure for all services&lt;/li&gt;
&lt;li&gt;Automating workflows that currently require multiple tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pain Point #3: Environment Inconsistency
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;"It works on my machine" might be the most frustrating phrase in software development. Environment inconsistencies waste countless hours on debugging issues that only appear in specific environments.&lt;/p&gt;

&lt;p&gt;When dev, test, and production environments don't match, you're essentially testing different systems. Problems appear out of nowhere during deployment, and fixing them becomes a painful guessing game.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;IDPs provide standardized environment templates and self-service provisioning. This ensures consistency across all stages of development.&lt;/p&gt;

&lt;p&gt;With a platform approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every environment uses identical configurations&lt;/li&gt;
&lt;li&gt;Developers can spin up environments on-demand&lt;/li&gt;
&lt;li&gt;Configuration changes propagate consistently&lt;/li&gt;
&lt;li&gt;Local development matches production&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Begin with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; for containerizing applications&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt; for local development environments&lt;/li&gt;
&lt;li&gt;Environment configuration stored as code&lt;/li&gt;
&lt;li&gt;Automated environment provisioning scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even small teams can implement these practices incrementally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pain Point #4: Cognitive Load from Multiple Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Most teams juggle 6+ different tools, with 13% managing up to 14 different tools in their development chain. Each tool has its own interface, quirks, and mental model.&lt;/p&gt;

&lt;p&gt;Learning and remembering how to use all these tools creates massive cognitive overhead, especially for new team members.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/resources/articles/software-development/what-is-platform-engineering" rel="noopener noreferrer"&gt;Platform engineering&lt;/a&gt; streamlines development by providing standardized tools and interfaces. IDPs create a single point of entry for developers to access everything they need.&lt;/p&gt;

&lt;p&gt;Implementing a platform approach gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uniform interfaces across different tools&lt;/li&gt;
&lt;li&gt;Standardized workflows that work the same way everywhere&lt;/li&gt;
&lt;li&gt;Simplified onboarding for new team members&lt;/li&gt;
&lt;li&gt;Lower learning curve for daily tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Start by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auditing your current toolchain to identify redundancies&lt;/li&gt;
&lt;li&gt;Creating consistent interfaces for your most-used tools&lt;/li&gt;
&lt;li&gt;Building wrapper scripts that standardize common commands&lt;/li&gt;
&lt;li&gt;Setting up a simple internal portal or wiki that provides single-point access&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pain Point #5: Security &amp;amp; Compliance Overhead
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Security is crucial but often becomes a productivity killer. Manual security reviews, compliance checks, and remediations consume valuable development time and delay deployments.&lt;/p&gt;

&lt;p&gt;When security is bolted on at the end rather than built in from the start, it creates friction and frustration.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Platform engineering embraces "self-service with guardrails." IDPs build security into workflows rather than tacking it on afterward.&lt;/p&gt;

&lt;p&gt;With a platform approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security scanning happens automatically in pipelines&lt;/li&gt;
&lt;li&gt;Compliance checks run continuously&lt;/li&gt;
&lt;li&gt;Policy enforcement happens transparently&lt;/li&gt;
&lt;li&gt;Developers get instant feedback on security issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Even small teams can implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-commit hooks for basic security checks&lt;/li&gt;
&lt;li&gt;Automated vulnerability scanning in CI pipelines&lt;/li&gt;
&lt;li&gt;Compliance-as-code using tools like &lt;a href="https://www.openpolicyagent.org/" rel="noopener noreferrer"&gt;OPA&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Security templates for new projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Leveraging What You Already Have
&lt;/h2&gt;

&lt;p&gt;The good news? You probably already have the foundation for platform engineering in place. The trick is connecting these pieces into a cohesive experience:&lt;/p&gt;

&lt;p&gt;Your Git workflow can expand beyond code versioning to include configuration and &lt;a href="https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac" rel="noopener noreferrer"&gt;Infrastructure as Code&lt;/a&gt; specs.&lt;/p&gt;

&lt;p&gt;Those Docker containers you use for local development? With some standardization, they become the basis for consistent environments across your pipeline.&lt;/p&gt;

&lt;p&gt;That CI/CD pipeline you built for testing? It can become the backbone of a self-service deployment platform.&lt;/p&gt;

&lt;p&gt;The key isn't getting new tools—it's connecting what you have in smarter ways. Focus on eliminating the manual steps between these systems first, then build interfaces that make the process seamless.&lt;/p&gt;

&lt;p&gt;What's your team's biggest development pain point? Let me know in the comments!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cicd</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Streamlining Multi-Tenant Kubernetes: A Practical Implementation Guide for 2025</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Wed, 14 May 2025 14:55:05 +0000</pubDate>
      <link>https://forem.com/gerimate/streamlining-multi-tenant-kubernetes-a-practical-implementation-guide-for-2025-1bin</link>
      <guid>https://forem.com/gerimate/streamlining-multi-tenant-kubernetes-a-practical-implementation-guide-for-2025-1bin</guid>
      <description>&lt;p&gt;Let's face it: running multiple applications on separate clusters is a resource nightmare. If you've got different teams or customers needing isolated environments, you're probably spending way more on infrastructure than you need to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/security/multi-tenancy/" rel="noopener noreferrer"&gt;Multi-tenancy in Kubernetes&lt;/a&gt; offers a solution, but it comes with its own set of challenges. How do you ensure proper isolation? What about resource allocation? And the big one – security?&lt;/p&gt;

&lt;p&gt;This guide provides practical steps for implementing multi-tenant Kubernetes that actually works in production environments. By the end, you'll have a roadmap for consolidating your infrastructure while maintaining isolation where it matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Multi-Tenancy Actually Means in 2025
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy has become a bit of a buzzword, but at its core, it still means the same thing: multiple users sharing the same infrastructure. In Kubernetes, we typically see two flavors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.loft.sh/blog/kubernetes-multi-tenancy-10-essential-considerations" rel="noopener noreferrer"&gt;Multiple teams within an organization&lt;/a&gt;&lt;/strong&gt;: Different departments or projects sharing a cluster, where team members have access through kubectl or GitOps controllers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multiple customer instances&lt;/strong&gt;: SaaS applications running customer workloads on shared infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key tradeoffs haven't changed much over the years, either. You're always balancing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: Keeping tenants from accessing or messing with each other's resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource efficiency&lt;/strong&gt;: Maximizing hardware utilization and reducing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational complexity&lt;/strong&gt;: Making sure your team can actually manage this setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What has changed are the tools and patterns. Pure namespace-based isolation is still common, but we've seen a shift toward more sophisticated approaches using hierarchical namespaces, virtual clusters, and service meshes. Let's start with the building blocks you'll need for a practical implementation.&lt;/p&gt;

&lt;p&gt;For more details about how the platform approaches multi-tenancy, check &lt;a href="https://kubernetes.io/docs/concepts/security/multi-tenancy/" rel="noopener noreferrer"&gt;Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Building Blocks: Practical Implementation Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Namespace Configuration That Actually Works
&lt;/h3&gt;

&lt;p&gt;Namespaces are your first line of defense in multi-tenancy. Here's a modern namespace configuration with isolation in mind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;baseline&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/audit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/warn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
    &lt;span class="na"&gt;networking.k8s.io/isolation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enabled&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does a few key things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a dedicated namespace for the tenant&lt;/li&gt;
&lt;li&gt;Labels it for easier filtering and policy targeting&lt;/li&gt;
&lt;li&gt;Applies Pod Security Standards (the modern replacement for Pod Security Policies)&lt;/li&gt;
&lt;li&gt;Marks it for network isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When organizing namespaces, many teams follow a pattern like &lt;code&gt;{tenant}-{environment}&lt;/code&gt; (e.g., &lt;code&gt;marketing-dev&lt;/code&gt;, &lt;code&gt;marketing-prod&lt;/code&gt;). For SaaS applications, you might use customer IDs or similar identifiers.&lt;/p&gt;

&lt;p&gt;The key thing to remember: namespaces alone aren't enough for true isolation. They're just containers for resources – you need additional controls to enforce boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  RBAC That Actually Isolates Tenants
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.illumio.com/cybersecurity-101/rbac" rel="noopener noreferrer"&gt;Role-Based Access Control (RBAC)&lt;/a&gt; is essential for preventing tenants from accessing each other's resources. Here's a pattern that works well in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Tenant admin role&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-admin&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pods"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployments"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jobs"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networking.k8s.io"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingresses"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configmaps"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secrets"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Binding for tenant admin&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-admin-binding&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-admin&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-admin&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice a few important things here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The role is scoped to a specific namespace (&lt;code&gt;tenant-a&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;It grants permissions for common resources but nothing cluster-wide&lt;/li&gt;
&lt;li&gt;The binding associates a user with this role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is simple but effective: create a set of standard roles for each tenant (admin, developer, viewer), each scoped to the tenant's namespace(s). &lt;/p&gt;

&lt;p&gt;One mistake I see teams make is being too generous with permissions. Start restrictive and loosen gradually as needed – it's much easier than trying to lock things down after a breach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Policies That Actually Isolate Traffic
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://workos.com/blog/tenant-isolation-in-multi-tenant-systems" rel="noopener noreferrer"&gt;Network isolation&lt;/a&gt; is critical for multi-tenancy. By default, all pods in a Kubernetes cluster can talk to each other – not what you want in a multi-tenant environment.&lt;/p&gt;

&lt;p&gt;Here's a practical network policy that isolates tenant traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-isolation&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# Applies to all pods in namespace&lt;/span&gt;
  &lt;span class="na"&gt;policyTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Egress&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;common-services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This policy does two important things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows ingress traffic only from the same tenant's namespace&lt;/li&gt;
&lt;li&gt;Allows egress traffic only to the same tenant's namespace or to namespaces labeled as common services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second part is particularly important – your tenants probably need access to shared services like monitoring, logging, or databases. By labeling those namespaces as &lt;code&gt;common-services: "true"&lt;/code&gt;, you create controlled exceptions to your isolation rules.&lt;/p&gt;

&lt;p&gt;A common mistake is forgetting about DNS and other cluster services. Make sure your network policies allow access to kube-system services that tenants need, or you'll have some very confusing debugging sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource Quotas to Prevent Noisy Neighbors
&lt;/h3&gt;

&lt;p&gt;One bad tenant can ruin the party for everyone by consuming all available resources. Resource quotas prevent this "noisy neighbor" problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ResourceQuota&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-quota&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
    &lt;span class="na"&gt;requests.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20Gi&lt;/span&gt;
    &lt;span class="na"&gt;limits.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20"&lt;/span&gt; 
    &lt;span class="na"&gt;limits.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;40Gi&lt;/span&gt;
    &lt;span class="na"&gt;persistentvolumeclaims&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20"&lt;/span&gt;
    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;
    &lt;span class="na"&gt;count/deployments.apps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;25"&lt;/span&gt;
    &lt;span class="na"&gt;count/statefulsets.apps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This quota sets limits on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU and memory consumption (both requests and limits)&lt;/li&gt;
&lt;li&gt;Number of persistent volume claims (storage)&lt;/li&gt;
&lt;li&gt;Number of services and workloads (deployments, statefulsets)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Setting appropriate quota sizes takes some experimentation. Monitor actual usage patterns and adjust accordingly – too restrictive and legitimate workloads fail, too loose and you're back to the noisy neighbor problem.&lt;/p&gt;

&lt;p&gt;Pro tip: In addition to ResourceQuotas (which operate at namespace level), use LimitRanges to set default and maximum limits for individual containers. This prevents tenants from creating resource-hungry pods that still fit within their overall quota.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Implementation Benefits
&lt;/h2&gt;

&lt;p&gt;Research and industry reports show clear benefits when organizations implement proper multi-tenancy in Kubernetes environments:&lt;/p&gt;

&lt;p&gt;According to documented implementations, organizations typically see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30-40% reduction in infrastructure costs by consolidating multiple single-tenant clusters&lt;/li&gt;
&lt;li&gt;Significant decrease in time spent on cluster maintenance and updates&lt;/li&gt;
&lt;li&gt;Improved resource utilization, often doubling from around 30-35% to 70% or more&lt;/li&gt;
&lt;li&gt;Better standardization across development teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, implementation isn't without challenges. Common issues include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Resistance from teams concerned about workload security and isolation&lt;/li&gt;
&lt;li&gt;Migration complexity for existing applications&lt;/li&gt;
&lt;li&gt;Learning curve for new multi-tenant tooling and workflows&lt;/li&gt;
&lt;li&gt;Special accommodations needed for resource-intensive or security-sensitive workloads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This highlights an important point: multi-tenancy isn't all-or-nothing. Many successful implementations use a hybrid approach, keeping some high-security or high-performance workloads on dedicated clusters while consolidating standard workloads in shared environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving the Big Three Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Security Vulnerabilities
&lt;/h3&gt;

&lt;p&gt;Cross-tenant data leakage and escalation attacks are the nightmare scenarios in multi-tenant environments. Here's a practical security checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Enforce Pod Security Standards&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
     &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
       &lt;span class="na"&gt;pod-security.kubernetes.io/enforce-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.29&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "restricted" profile prevents pods from running as privileged, accessing host namespaces, or using dangerous capabilities.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Isolate tenant storage&lt;/strong&gt;:&lt;br&gt;
Use StorageClasses with tenant-specific access controls, or better yet, separate storage backends for sensitive data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implement regular security scanning&lt;/strong&gt;:&lt;br&gt;
Tools like Trivy, Falco, and Kube-bench can identify vulnerabilities in your multi-tenant setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit, audit, audit&lt;/strong&gt;:&lt;br&gt;
Enable audit logging and regularly review access patterns – many breaches are detected through unusual access.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenge 2: Resource Contention
&lt;/h3&gt;

&lt;p&gt;Even with resource quotas, you can still run into contention issues. Here are some practical solutions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pod Priority and Preemption&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scheduling.k8s.io/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PriorityClass&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-high-priority&lt;/span&gt;
   &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assign different priority classes to tenant workloads based on their importance.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node Anti-Affinity&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;podAntiAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant&lt;/span&gt;
             &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
             &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
         &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kubernetes.io/hostname"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents multiple pods from the same tenant being scheduled on the same node, distributing the load.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Quality of Service Classes&lt;/strong&gt;:
Set appropriate QoS classes (Guaranteed, Burstable, BestEffort) for different tenant workloads to influence how they're treated under resource pressure.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenge 3: Operational Complexity
&lt;/h3&gt;

&lt;p&gt;Managing dozens or hundreds of tenants manually isn't feasible. Here's how to simplify operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automate tenant provisioning&lt;/strong&gt;:&lt;br&gt;
Create a standardized process for spinning up new tenant namespaces, applying policies, and setting quotas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use a tenant operator&lt;/strong&gt;:&lt;br&gt;
Tools like &lt;a href="https://projectcapsule.dev/" rel="noopener noreferrer"&gt;Capsule&lt;/a&gt; or the &lt;a href="https://developers.redhat.com/articles/2024/02/14/deep-dive-stakaters-multi-tenant-operator" rel="noopener noreferrer"&gt;Multi-Tenant Operator&lt;/a&gt; can handle tenant lifecycle management, from creation to termination:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenancy.stakater.com/v1alpha1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tenant&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;owners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-admin&lt;/span&gt;
       &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User&lt;/span&gt;
     &lt;span class="na"&gt;namespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tenant-a-dev&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tenant-a-prod&lt;/span&gt;
     &lt;span class="na"&gt;quota&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;requests.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;10'&lt;/span&gt;
         &lt;span class="na"&gt;requests.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20Gi&lt;/span&gt;
     &lt;span class="na"&gt;resourcePooling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
     &lt;span class="na"&gt;namespacePrefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implement tenant-aware monitoring&lt;/strong&gt;:&lt;br&gt;
Tag all metrics and logs with tenant identifiers to simplify debugging and enable tenant-specific dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create self-service capabilities&lt;/strong&gt;:&lt;br&gt;
Build internal tools that let tenants manage their own resources within the constraints you define.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Wrapping Up: Is Multi-Tenancy Right for You?
&lt;/h2&gt;

&lt;p&gt;Multi-tenant Kubernetes isn't a silver bullet, but it can significantly reduce costs and operational overhead when implemented correctly. Here's a quick checklist to decide if it's right for your organization:&lt;/p&gt;

&lt;p&gt;✅ You have multiple teams or customers using similar infrastructure&lt;br&gt;
✅ You're comfortable with the security implications of shared infrastructure&lt;br&gt;
✅ You have the operational maturity to implement and maintain isolation&lt;br&gt;
✅ The cost savings outweigh the increased complexity&lt;/p&gt;

&lt;p&gt;The implementation patterns we've covered – namespace isolation, RBAC, network policies, and resource quotas – provide a solid foundation for most multi-tenant environments. Start small, perhaps with just two teams or customers, and expand as you gain confidence in your isolation mechanisms.&lt;/p&gt;

&lt;p&gt;Remember, you don't have to go all-in on multi-tenancy. Many organizations use a hybrid approach, with shared clusters for most workloads and dedicated clusters for high-security or high-performance applications.&lt;/p&gt;

&lt;p&gt;Whatever approach you choose, make sure your teams understand the boundaries and limitations of your multi-tenant setup. Technical controls are important, but so is user education – a confused tenant can unintentionally cause problems for everyone.&lt;/p&gt;

&lt;p&gt;What's your experience with multi-tenant Kubernetes? Have you implemented any of these patterns, or do you have alternative approaches? Share your thoughts in the comments below.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>multitenancy</category>
    </item>
    <item>
      <title>Goodbye, 2023! dyrector.io’s Annual Recap</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Wed, 20 Dec 2023 11:04:33 +0000</pubDate>
      <link>https://forem.com/dyrectorio/goodbye-2023-dyrectorios-annual-recap-abl</link>
      <guid>https://forem.com/dyrectorio/goodbye-2023-dyrectorios-annual-recap-abl</guid>
      <description>&lt;p&gt;&lt;strong&gt;2023 is coming to an end, which means it's time to revisit what happened with the team and the project of dyrector.io in the past 12 months.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  January – Full Stack Highlighted dyrector.io
&lt;/h2&gt;

&lt;p&gt;After the lengthy Christmas break with a full stomach and a couple extra kilograms the real surprise caught us blind-sided. &lt;strong&gt;&lt;a href="https://thefullstack.network/" rel="noopener noreferrer"&gt;The Full Stack&lt;/a&gt;&lt;/strong&gt; platform featured dyrector.io in its highlights.&lt;/p&gt;

&lt;p&gt;Team-wise the most notable event was our Minus 30 hike in the pleasant January weather, which was a great occasion to have a chat about both technology related and unrelated things, and also to taste some pálinka.&lt;/p&gt;

&lt;h2&gt;
  
  
  February – dyrector.io Alpha Dropped
&lt;/h2&gt;

&lt;p&gt;The first weeks of February were all about attending &lt;strong&gt;&lt;a href="https://fosdem.org/2024/" rel="noopener noreferrer"&gt;FOSDEM&lt;/a&gt;&lt;/strong&gt; and the upcoming launch of dyrector.io on Product Hunt. On the day of the launch we made alpha access available.&lt;/p&gt;

&lt;p&gt;Our &lt;strong&gt;&lt;a href="https://www.producthunt.com/products/dyrector-io-platform#dyrector-io" rel="noopener noreferrer"&gt;Product Hunt launch&lt;/a&gt;&lt;/strong&gt; turned out to be a shot at the buzzer, but we still did nice. With a launch 6 hours into the voting, we reached the #11 spot. The same day we made a new release and a demo video. Busier than planned, but we did good.&lt;/p&gt;

&lt;p&gt;At the conference in Belgium, we were able to catch up with a lot of likeminded people eager to learn about open-source software.&lt;/p&gt;

&lt;p&gt;At the same time, our teammate, Levi showed up in the local cloud meetup scene as organizer and a presenter, too. Another teammate of ours, Nándi was interviewed in the &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=_qFJ5GEs2w4" rel="noopener noreferrer"&gt;podcast&lt;/a&gt;&lt;/strong&gt; series of Uptime Community about DevOps, ChatGPT, and open-source.&lt;/p&gt;

&lt;h2&gt;
  
  
  March – Three (Hundred) Is the Magic Number
&lt;/h2&gt;

&lt;p&gt;We doubled down on catering to a self-hosting audience in the first months of 2023, which helped us reach 300 stars on GitHub on the 3rd of March. We published a bunch of blog posts about self-hosting certain types of applications, which you can find here.&lt;/p&gt;

&lt;p&gt;In March, we published our &lt;strong&gt;&lt;a href="https://github.com/dyrector-io/awesome-infrastructure-questions" rel="noopener noreferrer"&gt;Awesome repository&lt;/a&gt;&lt;/strong&gt; containing infrastructure related questions. We consider it useful when someone is onboarded to a new project maintaining infrastructure.&lt;/p&gt;

&lt;p&gt;Another important event of the month was when &lt;strong&gt;&lt;a href="https://blog.dyrector.io/2023-03-21-docker-hub-registry-alternatives/" rel="noopener noreferrer"&gt;Docker announced the end of Free Teams on Docker Hub&lt;/a&gt;&lt;/strong&gt;. Backlash was inevitable and so was the organization backing out of their plans of monetizing Free Teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  April – Adventures in the UK and Hungary
&lt;/h2&gt;

&lt;p&gt;A portion of our team took a business trip in the UK to visit Hanover Displays at their HQ in Brighton. While Levi and Gopher was there, they paid a visit to the LEGO HQ for a meetup, as well.&lt;/p&gt;

&lt;p&gt;After the trip in the UK, we went out of the office for a few days of team building when we could unwind with the whole team.&lt;/p&gt;

&lt;p&gt;Levi attended KubeCon in Amsterdam, too, which turned out to be the funniest way to reach 420 stars on GitHub on April 20th. Trust me, we didn’t plan this whatsoever.&lt;/p&gt;

&lt;h2&gt;
  
  
  May – 0.4.0 &amp;amp; Roadmap Published
&lt;/h2&gt;

&lt;p&gt;After a Q1 busy with refactoring and making dyrector.io’s code more efficient, we started to make new releases faster. The first step was making 0.4.0, which didn’t deliver any significant changes to functionality, but it was important to accelerate our release cycle in the long run.&lt;/p&gt;

&lt;p&gt;At the same time, we published our &lt;strong&gt;&lt;a href="https://github.com/orgs/dyrector-io/projects/2" rel="noopener noreferrer"&gt;roadmap&lt;/a&gt;&lt;/strong&gt; on GitHub and added new issues to the repository.&lt;/p&gt;

&lt;p&gt;We also made some new friends: ConfigCat reviewed the platform on their &lt;strong&gt;&lt;a href="https://configcat.com/blog/2023/05/16/introducing-dyrectorio-to-configcat-users/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  June – Team Building in Croatia &amp;amp; 1000000000 Stars
&lt;/h2&gt;

&lt;p&gt;Release 0.5.0 was a special moment for our team. It was the first version in months that included new features. To celebrate this special moment, we went to Croatia to finish working on the new version and chill at the sunny beach.&lt;/p&gt;

&lt;p&gt;This was the perfect way to kick off our summer. After the trip to Croatia, we were able to consistently release on a bi-weekly basis, shipping new features again and again.&lt;/p&gt;

&lt;p&gt;After 0.5.0 dropped, we passed 512 stars on GitHub, or 1000000000 in binary.&lt;/p&gt;

&lt;h2&gt;
  
  
  July – Automated Deployments With dyrector.io
&lt;/h2&gt;

&lt;p&gt;One of the most significant features we added this year was the auto-deployment capability. The &lt;strong&gt;&lt;a href="https://blog.dyrector.io/2023-07-20-dyrector-io-github-actions-continuous-deployment/" rel="noopener noreferrer"&gt;GitHub Actions compatible feature&lt;/a&gt;&lt;/strong&gt; came out on July 14 in release 0.6.0.&lt;/p&gt;

&lt;p&gt;A very pleasant surprise was when Nevo David mentioned dyrector.io in his &lt;strong&gt;&lt;a href="https://dev.to/github20k/7-open-source-projects-you-should-contribute-to-in-2023-1nph"&gt;blog post&lt;/a&gt;&lt;/strong&gt;, which resulted in increased exposure and interest in the platform. In a few days we gained hundreds of stars on GitHub.&lt;/p&gt;

&lt;p&gt;Even though it was the middle of the summer, we took no breaks. Between publishing new releases full of new features, we went to Lake Balaton to sail and Nándi and Geri even completed the Lake Balaton Cross Swimming.&lt;/p&gt;

&lt;p&gt;At the end of July Levi attended WeAreDevelopers 2023 in Berlin.&lt;/p&gt;

&lt;h2&gt;
  
  
  August – dyrector.io Turns International
&lt;/h2&gt;

&lt;p&gt;The most significant change was an internal change: our teammate, Nándi moved to the Netherlands with his girlfriend. We officially became a remote-first company, while the rest of the team still showed up at the office every day. We had a goodbye party for him where we said farewell with a few cans of his favorite beverages for the road.&lt;/p&gt;

&lt;p&gt;We launched dyrector.io on a new platform called &lt;strong&gt;&lt;a href="https://devhunt.org/tool/dyrectorio" rel="noopener noreferrer"&gt;Dev Hunt&lt;/a&gt;&lt;/strong&gt;, which is an open-source Product Hunt alternative. With the help of our community, we were able to reach the #1 spot and the Developer Tool of the Week title that comes with it.&lt;/p&gt;

&lt;p&gt;In other cloud-related news HashiCorp &lt;strong&gt;&lt;a href="https://www.hashicorp.com/blog/hashicorp-adopts-business-source-license" rel="noopener noreferrer"&gt;announced&lt;/a&gt;&lt;/strong&gt; they're changing their products' license, including Terraform’s, to Business Source License, which sparked the foundation of OpenTF, which later was named OpenTofu.&lt;/p&gt;

&lt;h2&gt;
  
  
  September – Product Hunt Launch #2
&lt;/h2&gt;

&lt;p&gt;The majority of August was spent on preparations for our &lt;strong&gt;&lt;a href="https://www.producthunt.com/products/dyrector-io-platform#dyrector-io-3" rel="noopener noreferrer"&gt;Product Hunt launch&lt;/a&gt;&lt;/strong&gt; in September. The date was set – September 8th. We knew a product like ours only has a chance of a significant result on a Friday.&lt;/p&gt;

&lt;p&gt;The result: #6 in the daily rankings, top 50 in the weekly with around 260 votes. Definitely an impressive result with a heavily developer-focused tool.&lt;/p&gt;

&lt;p&gt;In the meantime, Levi took care of networking: he appeared in the &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=ZU6ql5Wlcs0" rel="noopener noreferrer"&gt;Follow The Pattern&lt;/a&gt;&lt;/strong&gt; podcast, attended InfoBip’s Shift conference in Croatia, and went to Kubernetes Community Days in Vienna.&lt;/p&gt;

&lt;h2&gt;
  
  
  October – darklens Enters the Scene
&lt;/h2&gt;

&lt;p&gt;The biggest achievement of October in our household was a one-week sprint when more than half of the team was on vacation. Three teammates of ours joined forces, two developers and one marketer, to develop a complimentary product to dyrector.io.&lt;/p&gt;

&lt;p&gt;We named this tool &lt;strong&gt;&lt;a href="https://github.com/dyrector-io/darklens" rel="noopener noreferrer"&gt;darklens&lt;/a&gt;&lt;/strong&gt;, which makes Docker logs and container settings available in your browser. A week after the sprint we launched darklens on Product Hunt for an impressive #14 spot with 140 upvotes.&lt;/p&gt;

&lt;h2&gt;
  
  
  November – Team Building in Portugal
&lt;/h2&gt;

&lt;p&gt;Over the summer, the whole team was able to snag developer tickets to Web Summit in Lisbon. Soon as we got the confirmation, we started planning our travel to Portugal. With a little sightseeing and networking at the conference, the week we spent in Lisbon turned out to be a blast. We made a lot of new connections.&lt;/p&gt;

&lt;p&gt;One of the coolest things of the year was when people found the invitation card for our CTF puzzle and came to our Discord channel or stopped by to say hi at Web Summit.&lt;/p&gt;

&lt;h2&gt;
  
  
  December – 0.10.0. Dropped
&lt;/h2&gt;

&lt;p&gt;The latest release of dyrector.io, 0.10.0 dropped in early December. You can find out more about it on &lt;strong&gt;&lt;a href="https://github.com/dyrector-io/dyrectorio/releases/tag/0.10.0" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s it for 2023. So long, and thanks for all the fish!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This blogpost was written by the team of &lt;a href="https://dyrectorio.com" rel="noopener noreferrer"&gt;dyrector.io&lt;/a&gt;. dyrector.io is an open-source continuous delivery &amp;amp; deployment platform with version management.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support us with a star on &lt;a href="https://github.com/dyrector-io/dyrectorio/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>startup</category>
    </item>
    <item>
      <title>5 Use Cases When Containerization Is Absolutely Useless for You</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Thu, 30 Nov 2023 14:19:59 +0000</pubDate>
      <link>https://forem.com/dyrectorio/5-use-cases-when-containerization-is-absolutely-useless-for-you-c3p</link>
      <guid>https://forem.com/dyrectorio/5-use-cases-when-containerization-is-absolutely-useless-for-you-c3p</guid>
      <description>&lt;h2&gt;
  
  
  #1 Static, Unchanging Environments
&lt;/h2&gt;

&lt;p&gt;If your application has minimal dependencies and operates consistently across different environments without the need for isolation, containerization may offer little benefit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your application will be the only process executed on the machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #2 Limited Scalability Needs
&lt;/h2&gt;

&lt;p&gt;For applications with predictable and steady workloads that do not require rapid scaling or dynamic resource allocation, the overhead of containerization might outweigh the advantages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small scale IoT apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #3 Simple, Standalone Applications
&lt;/h2&gt;

&lt;p&gt;In cases where your application is straightforward, lacks dependencies, and isn't part of a larger ecosystem with varied technologies, containerization may introduce unnecessary complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero dependency binaries, and also debugging a host process is more straightforward than doing the same with a container.&lt;/li&gt;
&lt;li&gt;Offline applications installed from external medium, running without internet connection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #4 Resource-Constrained Environments
&lt;/h2&gt;

&lt;p&gt;On systems with extremely limited resources, such as embedded devices or constrained hardware, the overhead of running containerization platforms might not be justified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microelectronics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #5 Desktop Applications
&lt;/h2&gt;

&lt;p&gt;Sounds exotic, huh? For a good reason. It would be very unusual to use containers for desktop applications. Though similar isolation techniques exist, it is not widespread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cs_16_nosteam_portable.exe😅&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  If You Really Need to Containerize...
&lt;/h2&gt;

&lt;p&gt;You can use dyrector.io to deploy and manage containerized services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⭐ Star dyrector.io on GitHub:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/dyrector-io/dyrectorio" rel="noopener noreferrer"&gt;https://github.com/dyrector-io/dyrectorio&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>kubernetes</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Dagger 101: How to Get Started with Containerized CI Workflows</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Thu, 23 Nov 2023 11:04:26 +0000</pubDate>
      <link>https://forem.com/dyrectorio/dagger-101-how-to-get-started-with-containerized-ci-workflows-105m</link>
      <guid>https://forem.com/dyrectorio/dagger-101-how-to-get-started-with-containerized-ci-workflows-105m</guid>
      <description>&lt;p&gt;&lt;strong&gt;Continuous Integration and Continuous Delivery are the secret sauces of shipping new features consistently and reliably to your software. However, the effectiveness of this process is closely tied to the tooling that orchestrates it. Some of the pain points of CI/CD systems are slow feedback loops, vendor lock-in, lack of abstraction, limited composability, or YAML itself. This is where Dagger comes into the spotlight, promising a more unified and accelerated path.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The development and deployment process at dyrector.io has already become much faster each year as we adopt and integrate better tools and methods. However, we aim to further unify and accelerate this. Dagger philosophy aligns with what we consider crucial for a truly rapid and seamless process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local testing: Enable developers to test their code instantly, locally&lt;/li&gt;
&lt;li&gt;Programmable CI: Replace messy YAML-based, complex CI with code&lt;/li&gt;
&lt;li&gt;Compatibility: If it runs in a container, you can add it to your pipeline&lt;/li&gt;
&lt;li&gt;Portability: The same pipeline can run on your local machine, a CI runner, a dedicated server, or any container hosting service&lt;/li&gt;
&lt;li&gt;Universal caching: Every operation is cached by default, and caching works the same everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Currently, we have the option to use our own dyrector.io (we’ll refer to it as dyo many times in this blog post) go CLI with our commands or Docker Compose with its YAML to spin up our stack for local testing, while we also maintain a GitHub Actions workflow for running end-to-end tests on GitHub. This setup lacks coherence, as we cannot employ the specialized GitHub Actions workflow YAML in a local setting or with a different CI/CD environment.&lt;/p&gt;

&lt;p&gt;We want to get closer to being able to ship every single day, or even multiple times a day, as quickly as we possibly can, using the same tool running locally and in CI. Dagger feels like an actual innovation in CI/CD, and it seems it will enable us to do that. There is also a strong focus on getting feedback from the community and utilizing it when we’re designing and building something that people really need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up Dagger CI/CD
&lt;/h2&gt;

&lt;p&gt;We would like to use Dagger locally with the dyo Go CLI, and for this we need the Dagger Go SDK for integration (there are many Dagger SDKs) and the Dagger Engine, which will run our pipelines. We developed a small proof of concept (POC) to evaluate if we could use our entire stack locally with Dagger. If this POC will be successful, we plan to use the same setup in our GitHub workflow, essentially using GitHub Actions just to trigger the Dagger pipeline.&lt;/p&gt;

&lt;p&gt;Steps to set up Dagger for our project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Dagger Go SDK
(again, you can use any other Dagger SDK for your project, but we use Go)
Go to your existing project – in our case it is dyrectorio.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go get dagger.io/dagger
$ go mod tidy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Add local Dagger test to our Makefile
It is for simple and fast “make test” (similarly to our other commands).
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Shortcut for local testing
.PHONY: test
test:
    go run golang/cmd/dagger/main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create Dagger main.go&lt;br&gt;
We already have dyo, dagent and crane in our golang/cmd, so put dagger here too.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Import Dagger SDK&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a Dagger client using the SDK&lt;br&gt;
This will allow you to interact with the Dagger Engine and create pipelines.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create Dagger pipelines&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Additional note:&lt;br&gt;
We can also install the Dagger CLI if we want to, but this is an optional tool to interact with the Dagger Engine from the command-line – it has a nice terminal UI though, with parallel progress bars that are visually impressive if you are into that sort of thing.&lt;/p&gt;

&lt;p&gt;Install the Dagger CLI&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cd /usr/local
$ curl -L https://dl.dagger.io/dagger/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Workflow Integration
&lt;/h2&gt;

&lt;p&gt;As you will see, the “Dagger way” is a very “Docker-ish” way - no surprise, one of the co-founders of Dagger is Solomon Hykes, earlier founder and technical director of Docker.&lt;/p&gt;

&lt;p&gt;To show you concrete code examples from our POC:&lt;/p&gt;

&lt;p&gt;Import Dagger SDK&lt;br&gt;
In our main.go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import (
    "context"
    "dagger.io/dagger"
    …)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a Dagger client using the SDK&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func initDaggerClient(ctx context.Context) *dagger.Client {
    client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stdout))
    if err != nil {
        panic(err)
    }
    return client
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we can call this initDaggerClient() function in our main() like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    ctx := context.Background()
    client := initDaggerClient(ctx)
    defer client.Close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run unit tests on our NestJS-based Crux backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func runCruxUnitTestPipeline(ctx context.Context, client *dagger.Client) {
    log.Info().Msg("Run crux unit test pipeline...")

    _, err := client.Container().From("node:20-alpine").
        WithDirectory("/src", client.Host().Directory("web/crux/"), dagger.ContainerWithDirectoryOpts{
            Exclude: []string{"node_modules"},
        }).
        WithWorkdir("/src").
        WithExec([]string{"npm", "ci"}).
        WithExec([]string{"npm", "run", "test"}).
        Stdout(ctx)
    if err != nil {
        panic(err)
    }

    log.Info().Msg("Crux unit test pipeline done.")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can call this runCruxUnitTestPipeline() function in our main():&lt;br&gt;
    runCruxUnitTestPipeline(ctx, client)&lt;/p&gt;

&lt;p&gt;Run unit tests on our Next.js-based Crux UI frontend is very similar to the above code, we only need to change the host directory to “web/crux-ui/” and an additional “.next” exclusion, everything else remains the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    WithDirectory("/src", client.Host().Directory("web/crux-ui/"), dagger.ContainerWithDirectoryOpts{
        Exclude: []string{"node_modules", ".next"},
    }).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A slightly more advanced example when we run our Crux backend in production mode (as we do for e2e test) with a connected PostgreSQL DB service container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func getEnv(envPath string) map[string]string {
    cruxEnv, err := godotenv.Read(envPath)
    if err != nil {
        panic(err)
    }
    return cruxEnv
}

func getCruxPostgres(client *dagger.Client, cruxEnv map[string]string) *dagger.Container {
    databaseURL := cruxEnv["DATABASE_URL"]
    parsedURL, err := url.Parse(databaseURL)
    if err != nil {
        panic(err)
    }
    postgresUsername := parsedURL.User.Username()
    postgresPassword, _ := parsedURL.User.Password()
    postgresDB := strings.TrimPrefix(parsedURL.Path, "/")

    dataCache := client.CacheVolume("data")

    cruxPostgres := client.Pipeline("crux-postgres").Container().From("postgres:14.2-alpine").
        WithMountedCache("/data", dataCache).
        WithEnvVariable("POSTGRES_USER", postgresUsername).
        WithEnvVariable("POSTGRES_PASSWORD", postgresPassword).
        WithEnvVariable("POSTGRES_DB", postgresDB).
        WithEnvVariable("PGDATA", "/data/postgres").
        WithExposedPort(5432)

    return cruxPostgres
}

func runCruxProd(ctx context.Context, client *dagger.Client, cruxPostgres *dagger.Container) *dagger.Container {
    crux := client.Pipeline("crux").Container().From("node:20-alpine")
    crux = crux.
        WithDirectory("/src", client.Host().Directory("web/crux/"), dagger.ContainerWithDirectoryOpts{
            Exclude: []string{"node_modules"},
        }).
        WithWorkdir("/src").
        WithServiceBinding("localhost", cruxPostgres).
        // WithEnvVariable("NOCACHE", time.Now().String()).
        WithExec([]string{"npm", "ci"}).
        WithExec([]string{"npm", "run", "build"}).
        WithExec([]string{"npm", "run", "prisma:migrate"}).
        WithExec([]string{"npm", "run", "start:prod"})

    _, err := crux.Stdout(ctx)
    if err != nil {
        panic(err)
    }

    return crux
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can run the above code in our main() like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    cruxEnv := getEnv("web/crux/.env") 
    cruxPostgres := getCruxPostgres(client, cruxEnv) 
    runCruxProd(ctx, client, cruxPostgres) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We would like to note that we made our POC with Dagger 0.8.x during September, so the code snippets above will show that. But even then the new API development of Dagger Services v2 (which we will need for our complex e2e pipeline) was in progress at Dagger in a separate feature branch and they promised on their Discord forum back then that this new API with some breaking changes will be included in Dagger 0.9. It wasn’t just us showing demand for parallel long running service containers - and they kept their word and it is indeed included in Dagger 0.9.0 released at the end of October. Shouts to Team Dagger!&lt;/p&gt;

&lt;p&gt;We put our POC on hold in October, but we have been keeping an eye on Service v2 developments and news. We will try out Service v2 in the near future and dedicate another blog post to whether we managed to solve our entire e2e pipeline with Dagger.&lt;/p&gt;

&lt;p&gt;Dagger efficiently caches each step of the pipelines, automatically handling the caching of source code copies, containers and builds, and when developers configure it programmatically, it also caches mounted volumes such as database data, node_modules, and Go build-cache. Our logs provide clear examples of this on reruns without code modifications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    copy web/crux/ CACHED
    &amp;gt; in host.directory web/crux/
    …
    pull docker.io/library/postgres:14.2-alpine CACHED
    &amp;gt; in crux-postgres &amp;gt; from postgres:14.2-alpine
    &amp;gt; in crux &amp;gt; service bvqf991cmob5i.97ul8ph8qf1qc.dagger.local
    …
    exec docker-entrypoint.sh postgres
    &amp;gt; in crux &amp;gt; service bvqf991cmob5i.97ul8ph8qf1qc.dagger.local
    [0.15s] PostgreSQL Database directory appears to contain a database; Skipping initialization
    …
    [0.30s] 2023-11-08 10::11.131 UTC [15] LOG:  database system is ready to accept connections
    ...
    exec docker-entrypoint.sh npm run build CACHED
    &amp;gt; in crux
    exec docker-entrypoint.sh npm run prisma:migrate CACHED
    &amp;gt; in crux
    exec docker-entrypoint.sh npm ci CACHED
    &amp;gt; in crux
    copy / /src CACHED
    &amp;gt; in crux
    exec docker-entrypoint.sh npm run start:prod
    &amp;gt; in crux
    [0.57s] &amp;gt; crux@0.7.0 start:prod
    [0.57s] &amp;gt; node dist/main
    [2.31s] [Nest] 33  - 11/07/2023, 14:24:13.142 AM     LOG [NestFactory] Starting Nest application...
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenges and Lessons Learned
&lt;/h2&gt;

&lt;p&gt;We were able to run most of our stack with Dagger 0.8.x, the Crux backend and the Crux-UI frontend separately, but our entire e2e test will require Dagger 0.9.x with the Services v2 API that we can run Crux, Crux-ui, Traefik and Kratos as long running service containers for the Playwright e2e container. &lt;/p&gt;

&lt;p&gt;If you want to know more about the Services v2, Dagger wrote a blog post about it here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dagger 0.9: Host-to-container, container-to-host, and other networking improvements:&lt;/strong&gt; &lt;a href="https://dagger.io/blog/dagger-0-9" rel="noopener noreferrer"&gt;https://dagger.io/blog/dagger-0-9&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices for Dagger CI/CD
&lt;/h2&gt;

&lt;p&gt;The fact that we can write the CI/CD code in Go and in a docker-like style had a refreshing effect on us. Here are some general tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Iterate small:&lt;/strong&gt; Start with a small POC to understand how Dagger fits into your workflow before scaling up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community engagement:&lt;/strong&gt; Stay active in Dagger's community forums or Discord channels for support and to keep up with the latest developments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; Keep your Dagger configurations well-documented to ease onboarding and maintenance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and optimize:&lt;/strong&gt; Regularly review the performance of your pipelines and optimize caching strategies for better efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We have seen firsthand the transformative nature of Dagger and the flexibility of its programmable pipelines. It stands out as a forward-thinking solution, addressing typical CI/CD bottlenecks with a developer-centric approach. Since Dagger is relatively new and evolving, keeping an eye on updates and community feedback can help in adopting best practices as they emerge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dagger Resources
&lt;/h2&gt;

&lt;p&gt;There's still lot to learn about Dagger, so it might be worth the time to check out the following resources to learn about this tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can explore further on Dagger's official website: &lt;strong&gt;&lt;a href="https://dagger.io" rel="noopener noreferrer"&gt;https://dagger.io&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For those eager to dive deeper into Dagger's capabilities, the Dagger documentation is an excellent resource: &lt;strong&gt;&lt;a href="https://docs.dagger.io" rel="noopener noreferrer"&gt;https://docs.dagger.io&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For absolute hackers: &lt;strong&gt;&lt;a href="https://github.com/dagger/dagger" rel="noopener noreferrer"&gt;https://github.com/dagger/dagger&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Dagger Discord community: &lt;strong&gt;&lt;a href="https://discord.gg/dagger-io" rel="noopener noreferrer"&gt;https://discord.gg/dagger-io&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This blogpost was written by the team of &lt;a href="https://dyrectorio.com" rel="noopener noreferrer"&gt;dyrector.io&lt;/a&gt;. dyrector.io is an open-source continuous delivery &amp;amp; deployment platform with version management.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support us with a star on &lt;a href="https://github.com/dyrector-io/dyrectorio/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>dagger</category>
      <category>cicd</category>
      <category>containers</category>
      <category>docker</category>
    </item>
    <item>
      <title>The One API - DevHunt Digest #6</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Mon, 13 Nov 2023 09:48:53 +0000</pubDate>
      <link>https://forem.com/gerimate/the-one-api-devhunt-digest-6-1m23</link>
      <guid>https://forem.com/gerimate/the-one-api-devhunt-digest-6-1m23</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;a href="https://devhunt.org/" rel="noopener noreferrer"&gt;DevHunt&lt;/a&gt; is the open-source platform where you can showcase your developer tool. Tools compete every week for the top spot. Here's a look at who's in the race this time.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Unified.to
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://unified.to/" rel="noopener noreferrer"&gt;Unified&lt;/a&gt; is an API platform that's supposed to substitute for API integrations, instead developers integrate Unified once and have access to 127 integrations available.&lt;/p&gt;

&lt;p&gt;After signing up, I was immediately directed to the dashboard where I'll see statistics about my integrations. Onboarding is easy, I like that they point to the resources you'd need in case you get stuck. Also I like that documentation isn't hidden somewhere, you navigate to Help menu and you can go check the docs.&lt;/p&gt;

&lt;p&gt;At first glance documentation might feel weird with all the section namings, but I liked that you can navigate to the integration's documentation that you'd like to use. Pretty good tool in general!&lt;/p&gt;

&lt;h2&gt;
  
  
  Papermark
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/mfts/papermark" rel="noopener noreferrer"&gt;Papermark&lt;/a&gt; is an open-source DocSend alternative. Checking out the landing page, I'm not sure whether I like the dude with sign image. First impression: I don't care if you're looking for an investor. BUT, when I think about it, it's a good signal for expectations.&lt;/p&gt;

&lt;p&gt;And let me tell you: Papermark is very good at what it's supposed to achieve. Send a pitch deck in PDF format and get analytics about it. I haven't tried setting it up for myself, but I might give it a try one day when I feel like it.&lt;/p&gt;

&lt;p&gt;Another thing I liked about the landing page is the alternatives section in the footer. As a user you're probably not familiar to what you can do with Papermark but especially if you're working in a startup, it's realistic that you're using some kind of tool to send decks and such. Pretty useful to have some comparison to other tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task Badger
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://taskbadger.net/" rel="noopener noreferrer"&gt;Task Badger&lt;/a&gt; is a monitoring solution for backend tasks and queues. It's a useful tool when you'd like to visualize your backend's performance.&lt;/p&gt;

&lt;p&gt;Task Badger is designed for engineers and they included lots of examples to provide starting points. Right next to the sign up button, they included a button that directs users to documentation - another brownie point for Task Badger.&lt;/p&gt;

&lt;p&gt;I like the docs, too, but I think it could be improved with some of the individual sections turned into separate, smaller sections. For example, the quick start guide can be broken down into a separate sections for API and CLI users. I've found a weird thing though: this &lt;a href="https://docs.taskbadger.net/web_example/" rel="noopener noreferrer"&gt;page&lt;/a&gt; of the documentation can't be accessed from the sections list or the table of content, just through a link in the getting started guide. I wouldn't hide it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pontus
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.pontus.so/" rel="noopener noreferrer"&gt;Pontus&lt;/a&gt; is a privacy-focussed AI tool. I can't try or look at it because you can only request a demo as of now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recombinant AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://recombinant.ai/" rel="noopener noreferrer"&gt;Recombinant AI&lt;/a&gt; is a conversational IDE tool which based on &lt;a href="https://www.youtube.com/watch?v=nZY0M8KOlk4" rel="noopener noreferrer"&gt;this demo video&lt;/a&gt; can only be used with paid access to ChatGPT because it's essentially a ChatGPT plugin. It's not easy to find out more about the project, as the landing page itself isn't really informative about what you can do with Recombinant AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  DailyDomains
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dailydomains.io/" rel="noopener noreferrer"&gt;DailyDomains&lt;/a&gt; is a simple tool for domain hoarding enthusiasts. I mean, 2023 was the first year I purchased a domain and I didn't stop there. I assume there are many people who just think about an idea and immediately buy the domain knowing well they'll never make the solution.&lt;/p&gt;

&lt;p&gt;Anyway, DailyDomains takes it a step further. It'll suggest you a few domains and generate a business idea for it. I kind of like this approach! For the small price of $12/month, you can use it to brainstorm domains for your business idea, which is probably a gamechanger to any indie hacker struggling to name their thing.&lt;/p&gt;

&lt;p&gt;My only question is: how come this has so few upvotes days into voting?&lt;/p&gt;

&lt;h2&gt;
  
  
  Squirrelsong
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/sapegin/squirrelsong" rel="noopener noreferrer"&gt;Squirrelsong&lt;/a&gt; is a low-contrast light and dark theme. You can find out more about the themes &lt;a href="https://sapegin.me/squirrelsong/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. I recommend at least a look at this, because I tried it with Google Chrome and it looks great.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maruti.io
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://maruti.io/" rel="noopener noreferrer"&gt;Maruti.io&lt;/a&gt; is an API for open-source language models. Based on the landing page it's difficult to figure out what's the purpose of this project, but after looking around and checking out the launch, it seems like an MLOps platform that can be utilized via an API.&lt;/p&gt;

&lt;p&gt;I think there's a lot to improve, because documentation is very rudimentary, you can see it for yourself &lt;a href="https://maruti.io/docs" rel="noopener noreferrer"&gt;here&lt;/a&gt;. And as someone who's not a native English speaker, I can understand how difficult it can be to write copy, but the lack of copy is a bigger problem than the quality of the copy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vite Plugin
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/react-remove-attr" rel="noopener noreferrer"&gt;Vite Plugin&lt;/a&gt; is an open-source plugin that removes React.js attributes. It's useful for excluding attributes like 'data-testid' used in testing. Options include specific file extensions, attributes, ignored folders, and files.&lt;/p&gt;

&lt;h2&gt;
  
  
  blogfactory.dev
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://blogfactory.dev/" rel="noopener noreferrer"&gt;Blog Factory&lt;/a&gt; is a blog post generator tool. Getting started needs a bit of fixing: when you click on the Get started button, you should be directed to the log in page. Without log in you're stuck in the Create your first article flow where you can't do anything.&lt;/p&gt;

&lt;p&gt;When you'd like to generate a blog post, you can specify a title and keywords, then set style-related options, including language, flavor (SEO friendly article, how-to guides, etc.) and writer. It has a persona option, too, but the only option for that is none as of now.&lt;/p&gt;

&lt;p&gt;I gave it a test run to write a similar how-to blog post to our latest one of &lt;a href="https://github.com/dyrector-io/dyrectorio" rel="noopener noreferrer"&gt;dyrector.io's&lt;/a&gt; blog discussing &lt;a href="https://dev.to/dyrectorio/why-you-should-self-host-github-runners-or-stay-away-from-it-3p84"&gt;self-hosted GitHub runners&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f6n0jzh7keas7zsq1rp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f6n0jzh7keas7zsq1rp.png" alt="Blog Factory generated a blog post about self-hosted GitHub Runners" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, it's not going to be as detailed as written by a human, and our case is very specific when it comes to GitHub runners, but I think there's potential in Blog Factory. It would be pretty dope to have a tool that can accelerate content writing for developer tools, because small teams and indie hackers usually can't find a way to consistently create new content.&lt;/p&gt;

&lt;h2&gt;
  
  
  NoCode Animations
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://bubble.https://bubble.io/plugin/animations-%7C-morph-%7C-animejs-1691691994408x352894891788861440" rel="noopener noreferrer"&gt;NoCode Animations&lt;/a&gt; is an animation Anime.js tool on Bubble.&lt;/p&gt;

&lt;h2&gt;
  
  
  SkillAI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://skillai.io/" rel="noopener noreferrer"&gt;SkillAI&lt;/a&gt; is an AI generator tool that helps you design learn paths for skills you'd like to develop. You can input any skill you'd like, so I went with this below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlmg4fwxefrq11wa7t0b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhlmg4fwxefrq11wa7t0b.png" alt="Image description" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Bricks AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.bricks.sh/" rel="noopener noreferrer"&gt;Bricks AI&lt;/a&gt; is a tool that helps teams use business applications in a conversational way. Right now it can't be used, only a waiting list is available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bird Eats Bug
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://birdeatsbug.com/feature/tech-mode" rel="noopener noreferrer"&gt;Bird Eats Bug&lt;/a&gt; is a tool that helps you manage bug reports and fixes more efficiently. One of the coolest things about this is the bug replay feature which allows you to recreate bugs that you missed tracking somehow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kropply
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.kropply.com/" rel="noopener noreferrer"&gt;Kropply&lt;/a&gt; is a coding assistant tool that helps you discover bugs within your code. It works as a VS Code extension. It's compatible with some of the most popular languages: C#, C++, C, Java, Go, Rust, JS, TS, Python.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;That's it for the weekly batch of developer tools that launched on DevHunt. What's your favorite project out of them? Leave it in the comments and show some love by casting a vote!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>devhunt</category>
      <category>api</category>
      <category>backend</category>
    </item>
    <item>
      <title>Why You Should Self-Host GitHub Runners – Or Stay Away from It</title>
      <dc:creator>Geri Máté</dc:creator>
      <pubDate>Wed, 08 Nov 2023 12:23:24 +0000</pubDate>
      <link>https://forem.com/dyrectorio/why-you-should-self-host-github-runners-or-stay-away-from-it-3p84</link>
      <guid>https://forem.com/dyrectorio/why-you-should-self-host-github-runners-or-stay-away-from-it-3p84</guid>
      <description>&lt;p&gt;&lt;strong&gt;GitHub Actions is the Alfred to your Batman. When you don’t feel like doing something or simply don’t have the capacity to handle various tasks, you can rely on GitHub Actions to automate workflows. You can take GitHub Actions to the next level by self-hosting runners, though. But should you? Let’s find out!&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Self-Hosted GitHub Runners Are Beneficial
&lt;/h2&gt;

&lt;p&gt;We’ve been managing dyrector.io’s code on GitHub for more than a year now. One thing we’ve always struggled with was slow GitHub Actions workflows. Here’s why we’ve been contemplating switching to self-hosting our GitHub Runners.&lt;/p&gt;

&lt;h3&gt;
  
  
  Speed
&lt;/h3&gt;

&lt;p&gt;The default GitHub runner takes longer to execute as it initializes an ephemeral runner for each job in a workflow from scratch. This method, chosen by GitHub for its simplicity and security, has its merits. Compared to this, self-hosted runners remain active, bypassing the initialization phase for every job, thus providing quicker execution. This continuous availability demands proper management to ensure subsequent runs are not interfered with by remnants from previous executions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Control
&lt;/h3&gt;

&lt;p&gt;GitHub-hosted runners run on Ubuntu with a 2-core CPU, limiting parallel job executions to four. In a self-hosted scenario, we have the liberty to choose other OSes. We opted for Rocky Linux over Ubuntu for its open-source, enterprise-grade, and 100% Red Hat compatibility. This choice also allowed us to define the VM's hardware parameters like CPU, memory and disk type/size. However, this freedom comes at the cost of increased maintenance overhead. &lt;/p&gt;

&lt;h3&gt;
  
  
  Debugging / Monitoring
&lt;/h3&gt;

&lt;p&gt;Debugging is more challenging on GitHub-hosted runners as only error messages and logs are retained. In the meantime, self-hosted runners keep everything in the “_work” and “_diag” directories, allowing real-time monitoring to understand precisely what is happening and the resources being consumed, as the running VM is under our control. &lt;/p&gt;

&lt;p&gt;As we look into the future and explore opportunities for further improvement: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing in YAML, especially for CI/CD purposes, often necessitates additional scripts to handle various build and runtime conditions in a workflow. This can result in a fragmented view of the process.&lt;/li&gt;
&lt;li&gt;Alternatively, or in addition, leveraging the power of &lt;strong&gt;&lt;a href="https://dagger.io/" rel="noopener noreferrer"&gt;Dagger&lt;/a&gt;&lt;/strong&gt; CI/CD could offer a more streamlined approach to creating workflows. Dagger CI/CD allows you to use real programming languages through the Dagger SDK.&lt;/li&gt;
&lt;li&gt;For example, we have chosen to use the Dagger Go SDK, which enables the creation of unified workflows. These workflows can run seamlessly, whether it's locally, on GitHub-hosted runners, self-hosted runners, or other CI/CD frameworks, with minimal or no need for significant modifications. This approach entirely avoids the need for extensive YAML configurations, providing a more efficient and flexible way to manage your CI/CD pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Few Reasons Why You Shouldn’t Self-Host Runners
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Convenience
&lt;/h3&gt;

&lt;p&gt;The default GitHub hosted runner functionality is free and comes with autoscaling if we look at the submitted parallel pull requests, so you don't have to do anything for them, they are simply there and doing their job. We obviously lose this default behavior if we go on the self-hosted route.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup/Maintenance
&lt;/h3&gt;

&lt;p&gt;The initial setup requires a learning curve, and maintaining the runners can demand a fair share of time. It is not so much the setting of the runners themselves, but rather the maintenance, updating, securing the VM(s) and the correct initial setting of the workflow to manage the clean up and teardown side steps for every job and job step, if necessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Concerns
&lt;/h3&gt;

&lt;p&gt;Self-hosted runners may expose your environment to potential security risks if not configured and managed properly. Something even GitHub recommends in its official &lt;strong&gt;&lt;a href="https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners" rel="noopener noreferrer"&gt;docs&lt;/a&gt;&lt;/strong&gt; is to use self-hosted runners with private repositories. Here's a more detailed &lt;strong&gt;&lt;a href="https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners" rel="noopener noreferrer"&gt;description&lt;/a&gt;&lt;/strong&gt; about security measures for GitHub runners.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Self-Hosted Runners
&lt;/h2&gt;

&lt;p&gt;Ensure your system meets GitHub's minimum requirements, which include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2-core CPU&lt;/li&gt;
&lt;li&gt;7 GB RAM&lt;/li&gt;
&lt;li&gt;14 GB SSD storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We used a larger machine with the following specifications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;16 vCPUs&lt;/li&gt;
&lt;li&gt;32 GiB memory&lt;/li&gt;
&lt;li&gt;Initially, 16 GB SSD (later upgraded to 64 GB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The upgrade was necessary due to the combined temporary space needs of our code, node.js, and about 10 docker containers, including a playwright container for testing. Our runners resided on an additional data disk, leaving about 8 GB free on the system disk.&lt;/p&gt;

&lt;p&gt;Instead of using multiple small VMs with one runner each, we chose to use one large VM hosting several parallel runners. This approach minimizes VM maintenance overhead and is designed to efficiently handle multiple parallel GitHub pull requests.&lt;/p&gt;

&lt;p&gt;Future scaling is straightforward as setting up additional runners and/or VMs is not complicated; runners distribute workflow jobs based on common labels regardless of their VM location.&lt;/p&gt;

&lt;p&gt;We set up our self-hosted runners with these steps, here we will show actions-runner-001, but it was done in a similar way for our runners 002, 003 and so on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create a new runner
&lt;/h3&gt;

&lt;p&gt;At your GitHub repository’s Settings, in the left sidebar click Actions, then click Runners and finally click New self-hosted runner. Select the OS image and architecture of your self-hosted runner machine. In our case it is Linux and x64.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Download the runner installer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a folder
$ mkdir actions-runner-001 &amp;amp;&amp;amp; cd actions-runner-001
# Download the latest runner package
$ curl -o actions-runner-linux-x64-2.311.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.311.0/actions-runner-linux-x64-2.311.0.tar.gz
# Optional: Validate the hash
# On Rocky Linux you may need to install shasum once for this validation
$ sudo dnf update
$ sudo dnf install -y perl-Digest-SHA
$ echo "29fc8cf2dab4c195bb147384e7e2c94cfd4d4022c793b346a6175435265aa278  actions-runner-linux-x64-2.311.0.tar.gz" | shasum -a 256 –c
# Extract the installer
$ tar xzf ./actions-runner-linux-x64-2.311.0.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Install runner dependencies &lt;em&gt;(if needed)&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;We only need to do this step once per VM, not per runner. You can skip this step if your OS already contains these dependencies, but for Rocky Linux 9.2 it was necessary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install dependencies (on Rocky Linux dotnet core 6 was missing by default) 
$ sudo ./bin/installdependencies.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also installed node.js, go and docker on our VM for our workflow, but these are not runner dependencies, so we will not go into detail about that here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Configure the runner
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create the runner and configure it
$ ./config.sh --url https://github.com/dyrector-io/dyrectorio --token &amp;lt;RUNNER_TOKEN&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During the configuration process, you can keep most settings at their default values, but we chose to make our runners easily identifiable by giving them unique names and adding extra labels. Initially, the configuration script provides a common name, but our objective was to test multiple runners on a single VM.&lt;/p&gt;

&lt;p&gt;By default, a runner is tagged with three labels for Linux x64: self-hosted, Linux, and X64. However, you have the flexibility to specify additional labels during the initial configuration or later on the GitHub repository website. Unlike the default labels, you can add or remove these custom labels at any time. These labels come in handy for targeting specific groups of runners or individual runners within your workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Set pre-job script
&lt;/h3&gt;

&lt;p&gt;Pre-job script is not mandatory if you do not want to use it, but we need it. &lt;br&gt;
In the runner directory just create a .env file with this content:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ACTIONS_RUNNER_HOOK_JOB_STARTED=pre-job-script.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And in the pre-job bash script file you can use your additional VM specific logic which will run before every job. Important to write “exit 0” at the end of the script file, because this means the script run without errors – otherwise or if you return any other value the runner will skip this job. You can also use this to your advantage for pre checks.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 6: Start the runner
&lt;/h3&gt;

&lt;p&gt;You can start the runner with its run script (&lt;code&gt;$ ./run.sh&lt;/code&gt;), but we want to run it as a service so first need to install the service and on Rocky Linux we also need to set the SELinux security context for the runsvc.sh file to ensure it operates correctly within the SELinux security policy (otherwise it will be blocked). We only need to set SELinux context and service install once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Set SELinux context for the runsvc script to s0 (standard security level)
$ sudo chcon system_u:object_r:usr_t:s0 runsvc.sh
# Install the runner as a service
$ sudo ./svc.sh install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can use the service with its start, stop, status commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Start the runner service
$ sudo ./svc.sh start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After completing these steps, the runner and its status are now listed under "Runners" of the GitHub repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Execute a workflow on self-hosted runners
&lt;/h3&gt;

&lt;p&gt;In your workflow file, use the following YAML for each job, adjusting the label(s) as per your runner configuration:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;runs-on: self-hosted&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Tips
&lt;/h2&gt;

&lt;p&gt;Additional security measures for our public open-source repository: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use CODEOWNERS file for our repository.&lt;/li&gt;
&lt;li&gt;In the repository settings, we have the "Require approval for all outside collaborators" option enabled instead of the default "Require approval for first-time contributors".&lt;/li&gt;
&lt;li&gt;Before allowing any external pull requests to run, we check if any workflow files have been modified! (It is easy to spot if anything appears in .github/workflows, without much approval overhead)&lt;/li&gt;
&lt;li&gt;We use our self-hosted GitHub runner with an isolated Azure VM in its own resource group.&lt;/li&gt;
&lt;li&gt;We take care of updating the runner VM's OS to ensure it is always up to date from a security perspective.&lt;/li&gt;
&lt;li&gt;We run external pull requests on a GitHub runner, while we run our own pull requests on our self-hosted runner. This is determined by a necessary pre-job in our workflows, based on the submitter's identity, assigning the appropriate "runs-on" label to the subsequent jobs.&lt;/li&gt;
&lt;li&gt;In the runner's “_diag“ and “_work“ directories, we can review diagnostic logs for both the workflow runs and the runner itself, as well as the checked-out code in the "workflows private" directory."&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Self-hosted GitHub runners offer more freedom and level of control that can significantly boost the efficiency of your development workflow. However, they come with the overhead of setup, maintenance, and potential security concerns. Assessing your project’s needs and your team’s capacity to manage self-hosted runners is crucial before diving in. With proper setup and management, self-hosted runners can indeed be a valuable asset to your development process.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This blogpost was written by the team of &lt;a href="https://dyrectorio.com" rel="noopener noreferrer"&gt;dyrector.io&lt;/a&gt;. dyrector.io is an open-source continuous delivery &amp;amp; deployment platform with version management.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support us with a star on &lt;a href="https://github.com/dyrector-io/dyrectorio/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>cicd</category>
      <category>githubactions</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
