<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ashok Kanjarla</title>
    <description>The latest articles on Forem by Ashok Kanjarla (@ashok_kanjarla_ai).</description>
    <link>https://forem.com/ashok_kanjarla_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3634061%2F6a0fedd7-4244-439c-976e-23032ec30d33.png</url>
      <title>Forem: Ashok Kanjarla</title>
      <link>https://forem.com/ashok_kanjarla_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ashok_kanjarla_ai"/>
    <language>en</language>
    <item>
      <title>AI Will Not Replace Engineers — But It Will Expose Shallow Thinkers.</title>
      <dc:creator>Ashok Kanjarla</dc:creator>
      <pubDate>Thu, 05 Mar 2026 04:24:02 +0000</pubDate>
      <link>https://forem.com/ashok_kanjarla_ai/ai-will-not-replace-engineers-but-it-will-expose-shallow-thinkers-1hbe</link>
      <guid>https://forem.com/ashok_kanjarla_ai/ai-will-not-replace-engineers-but-it-will-expose-shallow-thinkers-1hbe</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Over the past few years, artificial intelligence has rapidly entered the daily workflow of software engineers. Tools capable of generating code, explaining complex algorithms, debugging issues, and even proposing architectural solutions have changed the development landscape dramatically. As these tools improved, a new narrative began to dominate discussions across technology forums, social media, and industry conferences: &lt;strong&gt;AI will replace software engineers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At first glance, the claim appears convincing. AI systems can generate working code within seconds, automate repetitive tasks, and significantly reduce the time required to build features. To many observers, it seems inevitable that automation will reduce the need for human developers.&lt;/p&gt;

&lt;p&gt;However, this conclusion misunderstands the real shift happening inside engineering teams.&lt;/p&gt;

&lt;p&gt;Artificial intelligence is not replacing engineers. Instead, it is revealing a deeper distinction that has always existed in the industry: the difference between engineers who understand systems and engineers who only understand syntax.&lt;/p&gt;

&lt;p&gt;The true disruption is not the elimination of coding jobs. It is the &lt;strong&gt;automation of shallow work&lt;/strong&gt;—tasks that rely on pattern recognition rather than deep reasoning. As AI becomes more capable of generating code, the value of engineers who can reason about architecture, constraints, and long-term system behavior increases dramatically.&lt;/p&gt;

&lt;p&gt;The future of engineering will not be defined by who can write code the fastest. It will be defined by who can &lt;strong&gt;evaluate, design, and sustain systems under real-world complexity&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdvf9yn3sdoxelhxex7ou.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdvf9yn3sdoxelhxex7ou.jpeg" alt="Image" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4ma8fcl1g1v0b1mlhpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa4ma8fcl1g1v0b1mlhpe.png" alt="Image" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9022vlbf3ca4h4wqgcl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9022vlbf3ca4h4wqgcl.jpg" alt="Image" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD4D12AQF2ATFol_OqBw%2Farticle-cover_image-shrink_720_1280%2Farticle-cover_image-shrink_720_1280%2F0%2F1727158304223%3Fe%3D2147483647%26t%3D3b68_QfTzjTCZEBrCq0MVVtoDpqJtv-vABAPV9MThfQ%26v%3Dbeta" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD4D12AQF2ATFol_OqBw%2Farticle-cover_image-shrink_720_1280%2Farticle-cover_image-shrink_720_1280%2F0%2F1727158304223%3Fe%3D2147483647%26t%3D3b68_QfTzjTCZEBrCq0MVVtoDpqJtv-vABAPV9MThfQ%26v%3Dbeta" alt="Image" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Anti-Pattern: Tool Dependency Without Understanding
&lt;/h2&gt;

&lt;p&gt;The early adoption of AI coding tools has introduced a dangerous pattern in some engineering environments. Developers increasingly treat large language models as authoritative sources of solutions rather than as productivity accelerators.&lt;/p&gt;

&lt;p&gt;The workflow often becomes simple and repetitive: describe the problem, receive generated code, paste it into the project, and move forward. For small scripts or isolated utilities, this approach can appear efficient and harmless. However, the problems emerge when the same workflow is applied to complex systems.&lt;/p&gt;

&lt;p&gt;AI-generated solutions frequently work under ideal conditions but fail under real-world constraints. Generated database queries may appear correct yet introduce performance bottlenecks at scale. Suggested concurrency implementations may ignore subtle race conditions. Generated API patterns may overlook security considerations such as injection vulnerabilities or authentication weaknesses.&lt;/p&gt;
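
&lt;p&gt;As a minimal C# sketch of the concurrency case (the class and field names are hypothetical and not taken from any real codebase), the first counter below looks correct and passes a single-threaded test, yet it can lose updates when several threads run it at once. The second uses &lt;code&gt;Interlocked.Increment&lt;/code&gt;, which performs the update atomically.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int _unsafeCount;
    private static int _safeCount;

    public static void Main()
    {
        Parallel.For(0, 1_000_000, _ =&amp;gt;
        {
            _unsafeCount++;                        // read-modify-write: concurrent updates can be lost
            Interlocked.Increment(ref _safeCount); // atomic increment
        });

        // _safeCount always ends at 1000000; _unsafeCount may end lower on multi-core machines.
        Console.WriteLine($"unsafe={_unsafeCount}, safe={_safeCount}");
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both increments compile and both look reasonable in a code review that only checks the happy path, which is why this class of defect tends to surface under production load rather than in testing.&lt;/p&gt;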

&lt;p&gt;When these issues arise in production environments, someone must diagnose the root cause. Someone must understand the underlying architecture well enough to identify the flaw and design a stable correction.&lt;/p&gt;

&lt;p&gt;If the developer relying on AI lacks that depth of understanding, debugging becomes slow and unpredictable. The tool that was intended to accelerate productivity instead multiplies complexity and risk.&lt;/p&gt;

&lt;p&gt;Artificial intelligence can increase speed, but speed alone does not guarantee correctness. In distributed systems, speed without architectural judgment often leads to instability.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Engineering Requirement in the AI Era
&lt;/h2&gt;

&lt;p&gt;Before AI-assisted development became common, less experienced engineers were often limited by how quickly they could implement working code. Writing large amounts of it required time, concentration, and practice.&lt;/p&gt;

&lt;p&gt;Today, that limitation has shifted dramatically.&lt;/p&gt;

&lt;p&gt;AI systems can generate entire modules, refactor legacy functions, and suggest implementations within seconds. This means the bottleneck in engineering workflows is no longer code production—it is &lt;strong&gt;code evaluation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Engineers must now determine whether generated solutions are safe, scalable, and maintainable. They must evaluate whether an AI-generated algorithm will degrade under heavy load, whether a proposed architecture will survive service failures, and whether a generated pattern aligns with long-term system goals.&lt;/p&gt;

&lt;p&gt;This shift increases the importance of foundational engineering knowledge. Skills such as distributed system reasoning, database modeling, concurrency control, and failure-mode analysis become even more valuable in an AI-assisted environment.&lt;/p&gt;

&lt;p&gt;When solutions become abundant, &lt;strong&gt;judgment becomes the scarce resource&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The engineers who thrive in the AI era will not be those who generate the most code. They will be those who can &lt;strong&gt;identify which generated code should never reach production&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7shoxxnl5iigkpvz8apf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7shoxxnl5iigkpvz8apf.png" alt="Image" width="800" height="865"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzj324b018u1ptl1grjq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzj324b018u1ptl1grjq4.png" alt="Image" width="701" height="633"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.openai.com%2Fstatic-rsc-3%2FKDgHYsTig07uGZ3vPFLHyX4AribHZFETVVlK6ddtL0lEIgYrOl8ZlfIiu88yv_ziU_4bMFCjqsgjRuu-7tDscBv_73ezIRgmF1_80BrI3rc%3Fpurpose%3Dfullsize%26v%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.openai.com%2Fstatic-rsc-3%2FKDgHYsTig07uGZ3vPFLHyX4AribHZFETVVlK6ddtL0lEIgYrOl8ZlfIiu88yv_ziU_4bMFCjqsgjRuu-7tDscBv_73ezIRgmF1_80BrI3rc%3Fpurpose%3Dfullsize%26v%3D1" alt="Image" width="2309" height="1299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.openai.com%2Fstatic-rsc-3%2F9Kh8tBwLkmRza-llkeWzWqTbbnYZ6VBKuWPW7bHtJVztJp_EzRLNNlcT9jSBnFKseDc01BPY-lgQc5X0cJvOBM_mSAHQaWWzUIdTXZH7UCw%3Fpurpose%3Dfullsize%26v%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.openai.com%2Fstatic-rsc-3%2F9Kh8tBwLkmRza-llkeWzWqTbbnYZ6VBKuWPW7bHtJVztJp_EzRLNNlcT9jSBnFKseDc01BPY-lgQc5X0cJvOBM_mSAHQaWWzUIdTXZH7UCw%3Fpurpose%3Dfullsize%26v%3D1" alt="Image" width="2309" height="1299"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Mental Model Shift
&lt;/h2&gt;

&lt;p&gt;As artificial intelligence becomes embedded in development workflows, two distinct mindsets are emerging among engineers.&lt;/p&gt;

&lt;p&gt;The first group can be described as &lt;strong&gt;tool operators&lt;/strong&gt;. These developers focus on maximizing output by using AI prompts to generate solutions quickly. Their primary goal is speed and immediate functionality.&lt;/p&gt;

&lt;p&gt;The second group can be described as &lt;strong&gt;system designers&lt;/strong&gt;. These engineers use AI tools as accelerators, but they remain focused on the structural integrity of the system being built. They evaluate solutions based on constraints such as reliability, scalability, maintainability, and operational complexity.&lt;/p&gt;

&lt;p&gt;Tool operators typically ask a simple question: &lt;em&gt;“What prompt will generate the code I need?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;System designers ask a far more important question: &lt;em&gt;“Under what conditions will this solution fail?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This difference in perspective becomes increasingly important as systems grow in complexity. Artificial intelligence can generate structure, but it cannot assume responsibility for production incidents, service outages, or security breaches.&lt;/p&gt;

&lt;p&gt;Engineers still own those outcomes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Trade-Off Reality
&lt;/h2&gt;

&lt;p&gt;Software engineering has always been about trade-offs. Every system design must balance competing priorities such as performance, reliability, cost, and maintainability. Artificial intelligence does not eliminate these trade-offs—it simply accelerates the creation of solutions that must still be evaluated against them.&lt;/p&gt;

&lt;p&gt;AI-generated code often prioritizes functional correctness, meaning the code compiles and produces the expected output for common inputs. However, non-functional requirements are frequently overlooked.&lt;/p&gt;

&lt;p&gt;These requirements include system scalability, latency tolerance, security constraints, data consistency, and operational observability. A piece of code that functions perfectly in isolation may fail when deployed into a distributed environment handling thousands of requests per second.&lt;/p&gt;

&lt;p&gt;The responsibility for identifying these risks remains with engineers.&lt;/p&gt;

&lt;p&gt;AI can suggest solutions, but it cannot fully reason about complex operational environments or anticipate every edge case that arises in production systems.&lt;/p&gt;

&lt;p&gt;Engineering maturity is therefore not measured by how quickly code can be produced. It is measured by how accurately engineers can anticipate where that code might fail.&lt;/p&gt;




&lt;h2&gt;
  
  
  Responsible Use of AI in Engineering
&lt;/h2&gt;

&lt;p&gt;Integrating artificial intelligence into development workflows requires discipline. Engineers must treat AI-generated output as &lt;strong&gt;suggestions rather than authoritative answers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In practice, this means applying several principles when working with AI-assisted tools.&lt;/p&gt;

&lt;p&gt;AI-generated code should always undergo the same level of review as code written manually. Architectural decisions should be validated against real-world system constraints rather than accepted at face value. Generated solutions must be evaluated for security vulnerabilities, performance implications, and maintainability.&lt;/p&gt;

&lt;p&gt;AI is extremely effective at accelerating repetitive or boilerplate tasks. It can also assist with brainstorming implementation strategies and explaining unfamiliar concepts. However, the responsibility for evaluating correctness, safety, and long-term viability must remain with the engineer.&lt;/p&gt;

&lt;p&gt;The difference between responsible AI usage and careless reliance lies in whether the engineer remains in control of the reasoning process.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3yv2mj4s52cq4nxy689.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3yv2mj4s52cq4nxy689.jpg" alt="Image" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.dau.edu%2Fsites%2Fdefault%2Ffiles%2Finline-images%2FTechnical-mgmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.dau.edu%2Fsites%2Fdefault%2Ffiles%2Finline-images%2FTechnical-mgmt.png" alt="Image" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0v8wf1xn6ynb2ar2r4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0v8wf1xn6ynb2ar2r4v.png" alt="Image" width="800" height="952"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Observed Impact in Engineering Teams
&lt;/h2&gt;

&lt;p&gt;Organizations that adopt AI tools responsibly often experience measurable improvements in productivity. Engineers can reduce time spent on repetitive coding tasks, accelerate debugging processes, and prototype solutions more rapidly.&lt;/p&gt;

&lt;p&gt;When used correctly, AI becomes a powerful assistant that frees engineers to focus on higher-level design and decision-making.&lt;/p&gt;

&lt;p&gt;However, teams that rely on AI without sufficient engineering discipline often encounter the opposite effect. Debugging cycles become longer as poorly understood generated code introduces hidden complexity. Systems become harder to maintain as inconsistent patterns emerge throughout the codebase. Security vulnerabilities appear in places where generated solutions were accepted without proper review.&lt;/p&gt;

&lt;p&gt;The difference between these outcomes is not the technology itself but the &lt;strong&gt;level of engineering maturity applied to its use&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Artificial intelligence will not eliminate the need for engineers. What it will eliminate is the illusion that writing code alone defines engineering expertise.&lt;/p&gt;

&lt;p&gt;As code generation becomes easier, the value of architectural thinking becomes greater. Engineers who understand distributed systems, performance constraints, and long-term design trade-offs will become even more essential in the development process.&lt;/p&gt;

&lt;p&gt;The competitive advantage in the AI era will not belong to those who generate the most code. It will belong to those who can evaluate systems deeply, design resilient architectures, and make sound technical decisions under uncertainty.&lt;/p&gt;

&lt;p&gt;AI will not replace engineers.&lt;/p&gt;

&lt;p&gt;But it will expose the difference between &lt;strong&gt;those who merely produce code and those who truly understand how software systems behave&lt;/strong&gt;. And in the long run, depth always compounds.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>careerdevelopment</category>
      <category>programming</category>
    </item>
    <item>
      <title>Building a Production-Ready AI Gateway in ASP.NET Core</title>
      <dc:creator>Ashok Kanjarla</dc:creator>
      <pubDate>Tue, 17 Feb 2026 17:22:13 +0000</pubDate>
      <link>https://forem.com/ashok_kanjarla_ai/building-a-production-ready-ai-gateway-in-aspnet-core-2bm</link>
      <guid>https://forem.com/ashok_kanjarla_ai/building-a-production-ready-ai-gateway-in-aspnet-core-2bm</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;AI integration in backend systems looks deceptively simple at first glance. You inject an OpenAI client, call &lt;code&gt;GenerateAsync&lt;/code&gt;, and return the result. In development, everything behaves predictably. Under light load, the system performs well. The abstraction feels clean.&lt;/p&gt;

&lt;p&gt;As usage increased within our ASP.NET Core services, AI quickly evolved from a helpful feature into a systemic risk. It became a latency bottleneck during peak traffic, a cost multiplier as token usage scaled, a failure amplifier whenever the provider experienced issues, and a debugging challenge when unexpected behavior surfaced in production.&lt;/p&gt;

&lt;p&gt;We were handling thousands of requests per hour, many of them concurrent. External LLM providers introduced unpredictable latency spikes, occasional rate limits, and intermittent outages. Each of these external behaviors directly affected our API performance because we had tightly coupled AI calls to our request pipeline.&lt;/p&gt;

&lt;p&gt;Calling providers directly from controllers created architectural fragility. What looked convenient initially was in fact exposing our core system to external volatility.&lt;/p&gt;

&lt;p&gt;We redesigned the system around one core principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI is not a feature. It is a distributed dependency.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Anti-Pattern: Direct Controller Calls
&lt;/h2&gt;

&lt;p&gt;Our initial implementation followed a common pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;HttpPost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"generate"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IActionResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Generate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;FromBody&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;PromptRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_openAiService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From a surface-level perspective, this approach appears clean and efficient. However, at scale, this design introduces multiple systemic weaknesses.&lt;/p&gt;

&lt;p&gt;There were no retry policies to handle transient network failures. There was no circuit breaker to protect the system during sustained provider outages. The implementation lacked provider abstraction, meaning vendor changes would ripple across the codebase. There was no rate limiting to protect upstream dependencies. Token cost tracking was absent, making budgeting reactive rather than proactive. Observability was fragmented, making incident investigation difficult.&lt;/p&gt;

&lt;p&gt;If OpenAI throttled requests, our API would fail immediately. If latency spiked, requests stayed in flight longer than expected, holding connections and memory while they waited. If pricing models changed, we had no structured mechanism to monitor impact.&lt;/p&gt;

&lt;p&gt;This design violated fundamental distributed systems resiliency principles.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architectural Goal
&lt;/h2&gt;

&lt;p&gt;Before rewriting anything, we defined explicit engineering requirements.&lt;/p&gt;

&lt;p&gt;We needed the AI provider to be abstracted so that business logic remained vendor-agnostic. Failures from external providers had to be contained before they reached the API layer. Token usage needed to be measurable to support cost governance and forecasting. Providers had to be swappable to support future multi-provider strategies. Latency required containment mechanisms. Observability needed to be built into the architecture rather than added later.&lt;/p&gt;

&lt;p&gt;In short, AI integration had to be treated as infrastructure, not as a helper library.&lt;/p&gt;




&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client Request
      │
      ▼
ASP.NET Core API
      │
      ▼
AI Gateway Layer
 ├── Provider Abstraction
 ├── Retry Policy
 ├── Circuit Breaker
 ├── Rate Limiter
 ├── Token Metering
 ├── Structured Logging
 └── Fallback Strategy
      │
      ▼
AI Provider(s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key shift was introducing a dedicated AI Gateway layer. This layer became the containment boundary between core business logic and external AI providers. It absorbed volatility, enforced governance, and centralized resilience patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1 — Provider Abstraction
&lt;/h2&gt;

&lt;p&gt;We began by defining a strict contract for AI providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;IAIProvider&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AIRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A concrete implementation looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAIProvider&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IAIProvider&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;HttpClient&lt;/span&gt; &lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;OpenAIProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpClient&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AIRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;PostAsJsonAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EnsureSuccessStatusCode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

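        &lt;span class="c1"&gt;// ReadFromJsonAsync returns null for a JSON null body, so fail loudly instead of returning null&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFromJsonAsync&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;cancellationToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;??&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;InvalidOperationException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AI provider returned an empty response body."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;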
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This abstraction decouples business logic from the vendor, enables a multi-provider strategy, improves unit testing, and prevents long-term vendor lock-in. More importantly, it creates a seam in the architecture where resilience policies can be applied consistently.&lt;/p&gt;
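
&lt;p&gt;To illustrate how this seam supports a multi-provider strategy, here is a minimal sketch of a fallback decorator built on the interface above. &lt;code&gt;FallbackAIProvider&lt;/code&gt; is a hypothetical name, and the &lt;code&gt;AIRequest&lt;/code&gt; and &lt;code&gt;AIResponse&lt;/code&gt; records are assumed shapes, since their real definitions are not shown here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Assumed DTO shapes for illustration only.
public record AIRequest(string Prompt);
public record AIResponse(string Text);

// Implements the IAIProvider contract defined above by trying each
// wrapped provider in order and returning the first successful response.
public class FallbackAIProvider : IAIProvider
{
    private readonly IReadOnlyList&amp;lt;IAIProvider&amp;gt; _providers;

    public FallbackAIProvider(IReadOnlyList&amp;lt;IAIProvider&amp;gt; providers)
    {
        if (providers.Count == 0)
            throw new ArgumentException("At least one provider is required.", nameof(providers));
        _providers = providers;
    }

    public async Task&amp;lt;AIResponse&amp;gt; GenerateAsync(AIRequest request, CancellationToken ct)
    {
        Exception? last = null;
        foreach (var provider in _providers)
        {
            try
            {
                return await provider.GenerateAsync(request, ct);
            }
            // If the caller cancelled, the filter is false and the exception propagates.
            catch (Exception ex) when (!ct.IsCancellationRequested)
            {
                last = ex; // this provider failed; try the next one
            }
        }
        throw new InvalidOperationException("All AI providers failed.", last);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because callers depend only on &lt;code&gt;IAIProvider&lt;/code&gt;, the same decorator works with in-memory test doubles in unit tests and with real HTTP-backed providers in production.&lt;/p&gt;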




&lt;h2&gt;
  
  
  Step 2 — Resilience with Polly
&lt;/h2&gt;

&lt;p&gt;External AI providers are network dependencies. Network dependencies fail. Therefore, resilience policies must be applied systematically.&lt;/p&gt;

&lt;p&gt;We introduced exponential backoff retry policies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Requires: using System.Net; using Polly; using Polly.Extensions.Http;&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddHttpClient&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IAIProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OpenAIProvider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPolicyHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;// HandleTransientHttpError covers HttpRequestException, HTTP 5xx, and HTTP 408&lt;/span&gt;
        &lt;span class="n"&gt;HttpPolicyExtensions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandleTransientHttpError&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;// Also retry when the provider throttles with HTTP 429&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;HttpStatusCode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TooManyRequests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WaitAndRetryAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retry&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also implemented circuit breakers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddPolicyHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handle&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;HttpRequestException&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CircuitBreakerAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;30&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retries handle transient failures. Circuit breakers prevent cascading system degradation during sustained outages. Without a breaker, retry mechanisms can unintentionally amplify failure by increasing pressure on an already struggling provider.&lt;/p&gt;

&lt;p&gt;Together, they create controlled resilience.&lt;/p&gt;
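
&lt;p&gt;As a minimal sketch of how the two policies might sit on one registration, assuming Polly v7 with the &lt;code&gt;Microsoft.Extensions.Http.Polly&lt;/code&gt; package. The variable names are illustrative, and &lt;code&gt;HandleTransientHttpError&lt;/code&gt; is swapped in here so that 5xx and 408 responses are handled as well as &lt;code&gt;HttpRequestException&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using Polly;
using Polly.Extensions.Http;

// Polly passes a 1-based attempt number, so the waits are 2, 4, and 8 seconds.
var retryPolicy = HttpPolicyExtensions
    .HandleTransientHttpError()
    .WaitAndRetryAsync(3, attempt =&amp;gt; TimeSpan.FromSeconds(Math.Pow(2, attempt)));

// Opens after 5 consecutive handled failures and stays open for 30 seconds.
var breakerPolicy = HttpPolicyExtensions
    .HandleTransientHttpError()
    .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

builder.Services.AddHttpClient&amp;lt;IAIProvider, OpenAIProvider&amp;gt;()
    .AddPolicyHandler(retryPolicy)    // added first, so it is the outer handler
    .AddPolicyHandler(breakerPolicy); // inner handler, evaluated on every attempt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this ordering, every retry attempt passes through the breaker. Once the circuit opens, Polly throws &lt;code&gt;BrokenCircuitException&lt;/code&gt;, which the retry policy does not handle, so retries stop instead of adding load to the provider.&lt;/p&gt;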




&lt;h2&gt;
  
  
  Step 3 — Token Cost Governance
&lt;/h2&gt;

&lt;p&gt;AI cost scales with the number of tokens processed rather than the number of requests, so it can grow much faster than traffic. Without visibility, cost becomes reactive and unpredictable.&lt;/p&gt;

&lt;p&gt;We implemented middleware to track token usage centrally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenTrackingMiddleware&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;RequestDelegate&lt;/span&gt; &lt;span class="n"&gt;_next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ILogger&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TokenTrackingMiddleware&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_logger&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;TokenTrackingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RequestDelegate&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ILogger&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TokenTrackingMiddleware&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_next&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;_logger&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="nf"&gt;Invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TryGetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"TokenUsage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LogInformation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Token usage: {Tokens}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enabled cost forecasting, per-tenant billing capabilities, and anomaly detection. AI usage stopped being opaque and became measurable infrastructure.&lt;/p&gt;
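
&lt;p&gt;The middleware only reads &lt;code&gt;TokenUsage&lt;/code&gt;, so something downstream has to write it. One possible way to do that from a minimal API endpoint is sketched below. The route and the &lt;code&gt;TotalTokens&lt;/code&gt; property on &lt;code&gt;AIResponse&lt;/code&gt; are hypothetical, and &lt;code&gt;AIGateway&lt;/code&gt; is assumed to be registered in the container:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Program.cs (fragment). Register the middleware before the endpoints it observes.
app.UseMiddleware&amp;lt;TokenTrackingMiddleware&amp;gt;();

app.MapPost("/ai/generate", async (
    AIRequest request, AIGateway gateway, HttpContext http, CancellationToken ct) =&amp;gt;
{
    var response = await gateway.GenerateAsync(request, ct);

    // Hypothetical property: the token count the provider reported for this call.
    http.Items["TokenUsage"] = response.TotalTokens;

    return Results.Ok(response);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the middleware logs after &lt;code&gt;await _next(context)&lt;/code&gt; returns, the value written by the endpoint is already in &lt;code&gt;context.Items&lt;/code&gt; when the log statement runs.&lt;/p&gt;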




&lt;h2&gt;
  
  
  Step 4 — Rate Limiting
&lt;/h2&gt;

&lt;p&gt;To prevent AI overload and protect upstream providers, we introduced rate limiting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AddFixedWindowLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aiPolicy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limiterOptions&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;limiterOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PermitLimit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;limiterOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Window&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TimeSpan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applied at the endpoint level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;EnableRateLimiting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"aiPolicy"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provided guardrails that preserved both system stability and provider relationships.&lt;/p&gt;
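
&lt;p&gt;A named fixed-window policy like the one above is a single limit shared by every caller. If per-tenant limits are needed, a sketch along the following lines could partition the limiter instead. The &lt;code&gt;X-Tenant-Id&lt;/code&gt; header and the &lt;code&gt;perTenantAi&lt;/code&gt; policy name are assumptions for illustration, not part of the original setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

builder.Services.AddRateLimiter(options =&amp;gt;
{
    // Reject with 429 rather than the default 503.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddPolicy&amp;lt;string&amp;gt;("perTenantAi", httpContext =&amp;gt;
    {
        // Hypothetical header; requests without it share one "anonymous" bucket.
        var tenant = httpContext.Request.Headers["X-Tenant-Id"].ToString();
        var key = string.IsNullOrEmpty(tenant) ? "anonymous" : tenant;

        return RateLimitPartition.GetFixedWindowLimiter(key, _ =&amp;gt;
            new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1)
            });
    });
});

// The middleware must be in the pipeline for any policy to take effect.
app.UseRateLimiter();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each distinct key gets its own window, so one noisy tenant exhausts only its own 100 permits per minute.&lt;/p&gt;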




&lt;h2&gt;
  
  
  Step 5 — AI Gateway Orchestrator
&lt;/h2&gt;

&lt;p&gt;The gateway orchestrates provider selection and failover logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIGateway&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;IEnumerable&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IAIProvider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_providers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ILogger&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AIGateway&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;_logger&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;AIGateway&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IEnumerable&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IAIProvider&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ILogger&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AIGateway&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_providers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;_logger&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AIRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_providers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GenerateAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LogWarning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Provider failed. Trying next."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"All AI providers unavailable."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design implements controlled failover and prepares the architecture for future multi-provider routing strategies.&lt;/p&gt;
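
&lt;p&gt;Because the gateway depends only on &lt;code&gt;IAIProvider&lt;/code&gt;, failover can be exercised without any network calls. The sketch below assumes that &lt;code&gt;IAIProvider&lt;/code&gt; declares only the &lt;code&gt;GenerateAsync&lt;/code&gt; method the gateway calls, and that &lt;code&gt;AIRequest&lt;/code&gt; and &lt;code&gt;AIResponse&lt;/code&gt; are classes with parameterless constructors. &lt;code&gt;FailingProvider&lt;/code&gt; and &lt;code&gt;StubProvider&lt;/code&gt; are hypothetical test doubles:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging.Abstractions;

// The first provider fails, so the gateway should fall through to the second.
var expected = new AIResponse();
var gateway = new AIGateway(
    new IAIProvider[] { new FailingProvider(), new StubProvider(expected) },
    NullLogger&amp;lt;AIGateway&amp;gt;.Instance);

var actual = await gateway.GenerateAsync(new AIRequest(), CancellationToken.None);

// True when failover reached the stub and returned its canned response.
Console.WriteLine(ReferenceEquals(expected, actual));

// Hypothetical test double that simulates a provider outage.
sealed class FailingProvider : IAIProvider
{
    public Task&amp;lt;AIResponse&amp;gt; GenerateAsync(AIRequest request, CancellationToken ct) =&amp;gt;
        Task.FromException&amp;lt;AIResponse&amp;gt;(new HttpRequestException("Simulated outage"));
}

// Hypothetical test double that always returns the response it was given.
sealed class StubProvider : IAIProvider
{
    private readonly AIResponse _response;

    public StubProvider(AIResponse response) =&amp;gt; _response = response;

    public Task&amp;lt;AIResponse&amp;gt; GenerateAsync(AIRequest request, CancellationToken ct) =&amp;gt;
        Task.FromResult(_response);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Providers are tried in the order they are enumerated, so the order of the array here stands in for whatever priority the container registration defines.&lt;/p&gt;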




&lt;h2&gt;
  
  
  What We Explicitly Rejected
&lt;/h2&gt;

&lt;p&gt;We deliberately avoided patterns that appear convenient but degrade system integrity over time: direct SDK calls in controllers, blocking calls such as &lt;code&gt;.Result&lt;/code&gt;, static singletons without resilience policies, hard-coded API keys, and code paths that ignore cancellation tokens.&lt;/p&gt;

&lt;p&gt;These approaches are acceptable in prototypes. They are not acceptable in production-grade distributed systems.&lt;/p&gt;
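
&lt;p&gt;As one illustration of the alternative to a hard-coded key, the typed client could read its credential from configuration at startup. This is a sketch under assumptions: &lt;code&gt;OpenAI:ApiKey&lt;/code&gt; is a hypothetical configuration key, expected to come from user secrets, environment variables, or a secret store rather than source control:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Net.Http.Headers;

builder.Services.AddHttpClient&amp;lt;IAIProvider, OpenAIProvider&amp;gt;(client =&amp;gt;
{
    // Hypothetical key name; fail fast when the secret is missing.
    var apiKey = builder.Configuration["OpenAI:ApiKey"]
        ?? throw new InvalidOperationException("OpenAI:ApiKey is not configured.");

    client.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", apiKey);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the credential on the &lt;code&gt;HttpClient&lt;/code&gt; registration means controllers and the gateway never see the key at all.&lt;/p&gt;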




&lt;h2&gt;
  
  
  Operational Impact
&lt;/h2&gt;

&lt;p&gt;Within three months of implementing the AI Gateway architecture, we observed measurable improvements. AI-related failures dropped by 42%. Token costs were reduced by 31% due to better visibility and governance. There were zero cascading API outages during provider incidents. Response latency stabilized within SLA thresholds.&lt;/p&gt;

&lt;p&gt;Most importantly, AI became predictable infrastructure rather than a systemic risk.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;External dependencies will fail. Latency will spike. Costs will fluctuate. These realities do not disappear with better SDKs or cleaner controller methods.&lt;/p&gt;

&lt;p&gt;Architecture does not eliminate volatility. It contains it.&lt;/p&gt;

&lt;p&gt;By introducing an AI gateway layer with abstraction, resilience policies, rate limiting, and observability, we transformed AI from an unstable integration into governed infrastructure. The difference is not merely technical elegance — it is operational confidence.&lt;/p&gt;

&lt;p&gt;Treat AI as infrastructure, not as a feature, and your systems will scale with clarity rather than uncertainty.&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>ai</category>
      <category>aspdotnet</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
