<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shivam Saluja</title>
    <description>The latest articles on Forem by Shivam Saluja (@shivamsaluja).</description>
    <link>https://forem.com/shivamsaluja</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F827758%2F25a6411c-fb31-488b-a61d-d01b9d621e95.jpeg</url>
      <title>Forem: Shivam Saluja</title>
      <link>https://forem.com/shivamsaluja</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shivamsaluja"/>
    <language>en</language>
    <item>
      <title>Sync-over-Async: Bypassing Azure Service Bus Session Limits for AI Workloads</title>
      <dc:creator>Shivam Saluja</dc:creator>
      <pubDate>Wed, 08 Apr 2026 09:58:29 +0000</pubDate>
      <link>https://forem.com/shivamsaluja/sync-over-async-bypassing-azure-service-bus-session-limits-for-ai-workloads-269d</link>
      <guid>https://forem.com/shivamsaluja/sync-over-async-bypassing-azure-service-bus-session-limits-for-ai-workloads-269d</guid>
      <description>&lt;p&gt;How to bridge legacy HTTP clients to long-running AI tasks without 504 Timeouts or Stateful Bottlenecks.&lt;/p&gt;

&lt;p&gt;The business wants you to integrate a new LLM feature. You wire up a standard REST endpoint, deploy it, and it works flawlessly in testing. Then it hits production. The AI takes 45 seconds to generate a response during peak load. Your API Gateway drops the connection at 30 seconds. The client gets a &lt;code&gt;504 Gateway Timeout&lt;/code&gt;, the user furiously clicks retry, and suddenly you have a thundering herd that takes down your entire connection pool.&lt;/p&gt;

&lt;p&gt;Welcome to the era of AI workloads on legacy HTTP infrastructure.&lt;/p&gt;

&lt;p&gt;Standard REST APIs are built for speed. AI workloads are fundamentally slow. If you do not decouple them, your architecture will eventually shatter under the weight of holding thousands of long-running HTTP threads open.&lt;/p&gt;
&lt;h4&gt;
  
  
  The "Anti-Pattern" Lifeline: Sync-over-Async
&lt;/h4&gt;

&lt;p&gt;In a perfect world, your clients would be fully event-driven, communicating over WebSockets or Server-Sent Events. In the real world, you have legacy mobile apps, older frontends, and strict partner webhooks that only speak one language: they send an HTTP POST and expect a &lt;code&gt;200 OK&lt;/code&gt; with a JSON payload on that same connection. You cannot force them to implement an Azure Service Bus listener.&lt;/p&gt;

&lt;p&gt;This is where the &lt;strong&gt;Sync-over-Async Gateway&lt;/strong&gt; comes in. &lt;/p&gt;

&lt;p&gt;It is an edge integration pattern where a Gateway receives a synchronous HTTP request, converts it into an asynchronous message on a broker (like Azure Service Bus), waits for the backend worker to process it, and then maps the reply back to the original HTTP connection.&lt;/p&gt;
&lt;h4&gt;
  
  
  Azure Service Bus Sessions
&lt;/h4&gt;

&lt;p&gt;When engineers build this on Azure, the immediate instinct is to use &lt;strong&gt;Service Bus Sessions&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Gateway sends a message with &lt;code&gt;SessionId = 123&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The Gateway blocks and listens to a reply queue exclusively for &lt;code&gt;SessionId = 123&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The Worker processes the task and sends the reply with &lt;code&gt;SessionId = 123&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works beautifully on a single machine. At scale, it could be a disaster. &lt;/p&gt;

&lt;p&gt;If you have 50 Gateway instances behind a load balancer, how does the reply get back to the &lt;em&gt;exact&lt;/em&gt; instance holding the open HTTP connection? If you use Sessions, your system becomes deeply &lt;strong&gt;stateful&lt;/strong&gt;. Instance #1 has to explicitly acquire the lock for Session 123. If Instance #1 crashes, that session stays locked until the lock expires. Furthermore, the Azure Service Bus Standard tier enforces hard limits on concurrent sessions, so a traffic spike can quickly run into them.&lt;/p&gt;

&lt;p&gt;Sessions force you to manage stateful routing across a distributed cluster, which breaks horizontal elasticity.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Fix: Stateless Filtered Topics
&lt;/h4&gt;

&lt;p&gt;To achieve true horizontal scale, the Gateway layer must be 100% stateless. Instead of using locked sessions, we can push the routing logic down to the broker using a &lt;strong&gt;Filtered Topic Pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explicit Addressing:&lt;/strong&gt; The Gateway injects a unique &lt;code&gt;ReplyToInstance&lt;/code&gt; property into the request (e.g., &lt;code&gt;Instance-A&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Subscriptions:&lt;/strong&gt; On startup, each Gateway creates a lightweight, temporary subscription on a global reply topic with a SQL rule: &lt;code&gt;ReplyToInstance = 'Instance-A'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broker-Side Routing:&lt;/strong&gt; When the backend worker finishes, it attaches the same property to the reply. The Azure broker evaluates the SQL filter and pushes the message &lt;em&gt;only&lt;/em&gt; to the specific Gateway pod waiting for it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No session locks. No implicit instance affinity. Complete horizontal scalability.&lt;/p&gt;
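
&lt;p&gt;To make the routing rule concrete, here is a minimal, broker-free sketch in plain Java. It is not Service Bus code: the &lt;code&gt;Subscription&lt;/code&gt; record and the &lt;code&gt;route&lt;/code&gt; method are hypothetical stand-ins that only mimic how an equality filter on &lt;code&gt;ReplyToInstance&lt;/code&gt; selects a single subscriber.&lt;/p&gt;

```java
public class FilteredTopicSketch {

    // Hypothetical stand-in for a reply-topic subscription whose rule is
    // ReplyToInstance = 'ownerInstance'.
    record Subscription(String name, String ownerInstance) {
        boolean matches(String replyToInstance) {
            return ownerInstance.equals(replyToInstance);
        }
    }

    // Mimics broker-side evaluation: return the name of the subscription whose
    // filter matches the reply's ReplyToInstance property, or "unrouted".
    static String route(Subscription[] subscriptions, String replyToInstance) {
        for (Subscription subscription : subscriptions) {
            if (subscription.matches(replyToInstance)) {
                return subscription.name();
            }
        }
        return "unrouted";
    }

    public static void main(String[] args) {
        Subscription[] subscriptions = {
            new Subscription("sub-instance-a", "Instance-A"),
            new Subscription("sub-instance-b", "Instance-B")
        };
        System.out.println(route(subscriptions, "Instance-B"));
        System.out.println(route(subscriptions, "Instance-Z"));
    }
}
```

&lt;p&gt;In this toy model the first reply lands on the second subscription, while the reply tagged for an unknown instance matches nothing. The Gateway pods never inspect each other's replies.&lt;/p&gt;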

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnr6ccwxgzvh45p6pxnj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnr6ccwxgzvh45p6pxnj.png" alt="Sync-over-Async: Bypassing Azure Service Bus Session Limits for AI Workloads" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuoiaj86batdqu8z6qet.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuoiaj86batdqu8z6qet.jpeg" alt="Sync-over-Async: Bypassing Azure Service Bus Session Limits for AI Workloads" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking Down the Stateless Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you look at the architecture diagram above, here is how we eliminate the Session bottleneck and let the Gateway layer scale horizontally:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Synchronous Edge (Left Side)&lt;/strong&gt;&lt;br&gt;
The client sends a standard, blocking HTTP REST request. Our Load Balancer distributes this to any available &lt;strong&gt;Gateway Replica&lt;/strong&gt; (e.g., Replica 1). Because our Gateway is completely stateless, the load balancer doesn't need to worry about sticky sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Asynchronous Handoff (Middle)&lt;/strong&gt;&lt;br&gt;
Replica 1 takes the HTTP payload and publishes it to the Azure Service Bus &lt;strong&gt;Request Topic&lt;/strong&gt;. &lt;br&gt;
&lt;em&gt;Crucially, it does NOT open a Service Bus Session.&lt;/em&gt; Instead, it generates a unique &lt;code&gt;CorrelationId&lt;/code&gt; (e.g., &lt;code&gt;replica1_reqA&lt;/code&gt;) for the request and stamps the message with its own &lt;code&gt;ReplyToInstance&lt;/code&gt; value (e.g., &lt;code&gt;Instance-A&lt;/code&gt;). The reply will be caught by the lightweight, dynamic subscription that Replica 1 created on the &lt;strong&gt;Reply Topic&lt;/strong&gt; at startup, which carries a strict SQL Filter: &lt;code&gt;ReplyToInstance = 'Instance-A'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The AI Worker Layer (Right Side)&lt;/strong&gt;&lt;br&gt;
Your long-running AI workers operate as standard, competing consumers. A worker pulls the request from the topic, processes the heavy LLM prompt for 45 seconds, and generates the result. To send the result back, the worker simply copies the &lt;code&gt;CorrelationId&lt;/code&gt; and the &lt;code&gt;ReplyToInstance&lt;/code&gt; property onto the response message and drops it onto the global &lt;strong&gt;Reply Topic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Broker-Side Routing (The Magic)&lt;/strong&gt;&lt;br&gt;
This is where the architecture shines. The Gateway instances are not fighting over locked sessions. The Azure Service Bus broker evaluates the incoming reply message, reads &lt;code&gt;ReplyToInstance = 'Instance-A'&lt;/code&gt;, matches it to Replica 1's SQL filter, and delivers the message only to that subscription.&lt;/p&gt;

&lt;p&gt;Replica 1 receives the answer, uses the &lt;code&gt;CorrelationId&lt;/code&gt; to map it back to the open HTTP request, and returns the &lt;code&gt;200 OK&lt;/code&gt; to the client. If Replica 1 had crashed during those 45 seconds, there would be no locked sessions, no frozen resources, and no blocked queues to recover; its temporary subscription just needs to be cleaned up, for example by giving it an auto-delete-on-idle period.&lt;/p&gt;
&lt;h4&gt;
  
  
  Introducing Sentinel: The Open-Source Starter
&lt;/h4&gt;

&lt;p&gt;Implementing dynamic Service Bus Administration clients, processor lifecycles, and thread management is complex. To solve this, I built &lt;strong&gt;Sentinel&lt;/strong&gt;—an open-source Spring Boot starter that abstracts this entire pattern into a single library dependency.&lt;/p&gt;

&lt;p&gt;Here is how you can completely decouple your HTTP APIs from your slow AI workers in just a few lines of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add the Dependency&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.github.shivamsaluja&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;sentinel-servicebus-starter&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. The Zero-Boilerplate Configuration&lt;/strong&gt;&lt;br&gt;
Sentinel handles all the Azure SDK heavy lifting. Just point it to your request queue and reply topic in &lt;code&gt;application.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;sentinel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;servicebus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;connection-string&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Endpoint=sb://your-namespace.servicebus.windows.net/;SharedAccessKeyName=...;"&lt;/span&gt;
    &lt;span class="na"&gt;request-queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai-task-requests"&lt;/span&gt;
    &lt;span class="na"&gt;reply-topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai-task-replies"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. The Gateway Controller (The Magic)&lt;/strong&gt;&lt;br&gt;
By returning a &lt;code&gt;CompletableFuture&lt;/code&gt;, we free up the Tomcat HTTP thread as soon as the handler method returns. The client's connection remains open, but no request thread is held while the reply is pending, which allows far higher concurrency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="nd"&gt;@RequestMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/v1/ai"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GatewayController&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;SentinelTemplate&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;GatewayController&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SentinelTemplate&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sentinelTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/generate"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CompletableFuture&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;generateReport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

        &lt;span class="c1"&gt;// 1. Send the prompt to Service Bus; the future completes when the reply arrives.&lt;/span&gt;
        &lt;span class="c1"&gt;// Under the hood, Sentinel manages the dynamic SQL subscription.&lt;/span&gt;
        &lt;span class="nc"&gt;CompletableFuture&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;asyncReply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sendAndReceive&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// 2. Map the asynchronous reply back to a standard HTTP 200 OK.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;asyncReply&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;thenApply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}).&lt;/span&gt;&lt;span class="na"&gt;exceptionally&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ex&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;internalServerError&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Task failed: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="o"&gt;});&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
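
&lt;p&gt;One thing the controller above leaves open is what happens when no reply ever arrives. The following is a minimal sketch, using only the JDK, of bounding the wait with &lt;code&gt;CompletableFuture.orTimeout&lt;/code&gt; and mapping a timeout to &lt;code&gt;504&lt;/code&gt; rather than a generic &lt;code&gt;500&lt;/code&gt;. The &lt;code&gt;Response&lt;/code&gt; record, the &lt;code&gt;handle&lt;/code&gt; method, and the simulated worker delay are assumptions for illustration; whether Sentinel applies its own timeout is not covered here.&lt;/p&gt;

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplyTimeoutSketch {

    // Hypothetical stand-in for an HTTP response (status code plus body).
    record Response(int status, String body) {}

    // Simulates a worker that replies after workerDelayMillis, while the
    // gateway is willing to wait at most timeoutMillis for that reply.
    static Response handle(long workerDelayMillis, long timeoutMillis) {
        var delayed = CompletableFuture.delayedExecutor(workerDelayMillis, TimeUnit.MILLISECONDS);
        var asyncReply = CompletableFuture.supplyAsync(() -> "Generated Report Data...", delayed);

        return asyncReply
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            .thenApply(reply -> new Response(200, reply))
            .exceptionally(ex -> {
                // Dependent stages wrap the original failure in CompletionException.
                Throwable root = (ex instanceof CompletionException) ? ex.getCause() : ex;
                if (root instanceof TimeoutException) {
                    return new Response(504, "No reply within " + timeoutMillis + " ms");
                }
                return new Response(500, "Task failed");
            })
            // join() keeps this demo synchronous; a controller would return the future itself.
            .join();
    }

    public static void main(String[] args) {
        System.out.println(handle(10, 2_000));   // reply arrives in time
        System.out.println(handle(2_000, 50));   // reply arrives too late
    }
}
```

&lt;p&gt;Keeping the timeout slightly below the limit of whatever sits in front of the Gateway lets the Gateway, rather than the proxy, decide what the client sees.&lt;/p&gt;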



&lt;p&gt;&lt;strong&gt;4. The Backend Worker Contract&lt;/strong&gt;&lt;br&gt;
Your backend workers remain standard, dumb, asynchronous consumers. They just need to respect the routing contract by passing the properties back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processAIRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ServiceBusReceivedMessageContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ServiceBusReceivedMessage&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Extract the routing property injected by the Sentinel Gateway&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;replyToInstance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getApplicationProperties&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ReplyToInstance"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// ... (Simulate slow AI processing taking 45 seconds) ...&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;aiResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Generated Report Data..."&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nc"&gt;ServiceBusMessage&lt;/span&gt; &lt;span class="n"&gt;replyMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ServiceBusMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aiResponse&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;replyMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCorrelationId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCorrelationId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="c1"&gt;// CRITICAL: Attach the routing property so Azure knows which pod gets the reply&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;replyToInstance&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;replyMessage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getApplicationProperties&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ReplyToInstance"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replyToInstance&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;senderClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sendMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;replyMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Result
&lt;/h4&gt;

&lt;p&gt;By dropping the Session requirement, your API Gateway layer can scale out horizontally without any session affinity. You can deploy 10 pods or 1,000 pods. Azure Service Bus handles the routing logic on the broker side, and your legacy clients get their synchronous &lt;code&gt;200 OK&lt;/code&gt;, no matter how long the AI takes to think.&lt;/p&gt;

&lt;p&gt;If you are dealing with timeout issues or brittle edge-integration architectures, check out the project on GitHub. &lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/ShivamSaluja/sentinel-servicebus-starter" rel="noopener noreferrer"&gt;Sentinel Service Bus Starter on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would love to hear your thoughts, feedback, or horror stories about managing Service Bus sessions in the comments below!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>azure</category>
      <category>azureservicebus</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Bypassing Azure Service Bus Session Limits: A Sync-over-Async Pattern for Spring Boot</title>
      <dc:creator>Shivam Saluja</dc:creator>
      <pubDate>Thu, 12 Mar 2026 10:14:39 +0000</pubDate>
      <link>https://forem.com/shivamsaluja/bypassing-azure-service-bus-session-limits-a-sync-over-async-pattern-for-spring-boot-293k</link>
      <guid>https://forem.com/shivamsaluja/bypassing-azure-service-bus-session-limits-a-sync-over-async-pattern-for-spring-boot-293k</guid>
      <description>&lt;p&gt;If you have spent a decade building large-scale backend systems, you know that integrating modern, slow-running workloads—like LLM prompts or complex AI tasks—into legacy synchronous architectures is a massive headache.&lt;/p&gt;

&lt;p&gt;Standard HTTP REST calls are inherently brittle for this. If an AI model takes 45 seconds to generate a response, your traditional API gateway or HTTP client will likely time out at the 30-second mark. The connection drops, the user gets a &lt;code&gt;504 Gateway Timeout&lt;/code&gt;, and the backend CPU cycles are completely wasted.&lt;/p&gt;

&lt;p&gt;The textbook architectural answer is to introduce a message broker to act as a shock absorber. But what if your client-facing frontend &lt;em&gt;requires&lt;/em&gt; a synchronous, Request-Reply experience?&lt;/p&gt;

&lt;p&gt;You have to build a "Sync-over-Async" bridge. And if you are using Azure Service Bus, doing this at massive scale exposes a critical bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Service Bus Sessions
&lt;/h3&gt;

&lt;p&gt;When implementing a Request-Reply pattern on Azure Service Bus, the default recommendation is to use &lt;strong&gt;Sessions&lt;/strong&gt;. You send a message with a specific &lt;code&gt;SessionId&lt;/code&gt;, and your consumer locks onto that session to receive the reply.&lt;/p&gt;

&lt;p&gt;This approach works beautifully in small systems, but it fails spectacularly at scale for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The "Sticky" Bottleneck:&lt;/strong&gt; Sessions create exclusive locks. If one session has 1,000 messages and another has 10, a consumer gets stuck on the heavy session while other pods sit idle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard Limits:&lt;/strong&gt; On the Standard tier, you are limited to 1,500 concurrent sessions. If you are scaling to hundreds or thousands of Spring Boot replicas during a massive traffic spike, you will hit a wall.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you try to bypass sessions by having thousands of replicas listen to a single shared reply queue, you create a "competing consumer" disaster: any replica can pull a reply meant for another one and has to abandon it, wasting CPU cycles and thrashing the broker.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Enterprise Solution: The Filtered Topic Pattern
&lt;/h3&gt;

&lt;p&gt;To build a highly scalable, session-less Request-Reply architecture, we need to shift from Queues to &lt;strong&gt;Topics with SQL Filters&lt;/strong&gt;. This is the core engine of an AI-Native Gateway concept designed to modernize legacy software systems without rewriting the clients.&lt;/p&gt;

&lt;p&gt;Here is how the architecture flows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt78u6iboewvqnqvnbfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt78u6iboewvqnqvnbfk.png" alt="Architecture Diagram - Bypassing Azure Service Bus Session Limits: A Sync-over-Async Pattern for Spring Boot " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Request:&lt;/strong&gt; The Spring Boot application generates a unique &lt;code&gt;InstanceId&lt;/code&gt; on startup. It sends the request to a standard queue, attaching a custom property: &lt;code&gt;ReplyToInstance = 'Instance-123'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Dynamic Subscription:&lt;/strong&gt; When the pod boots up, it dynamically provisions a lightweight Subscription to a global &lt;code&gt;reply-topic&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Magic (SQL Filter):&lt;/strong&gt; We apply a &lt;code&gt;SqlRuleFilter&lt;/code&gt; to that subscription: &lt;code&gt;ReplyToInstance = 'Instance-123'&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
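
&lt;p&gt;As a small illustration of steps 1 and 3, the sketch below builds a per-instance ID and the matching filter expression as a plain string. The &lt;code&gt;gateway-&lt;/code&gt; prefix and the helper names are hypothetical, and the code does not call the Azure SDK. It also doubles any single quote in the value, which is the usual way to escape SQL-style string literals.&lt;/p&gt;

```java
import java.util.UUID;

public class ReplyFilterSketch {

    // Hypothetical helper: one unique ID per gateway process, generated at startup.
    static String newInstanceId() {
        return "gateway-" + UUID.randomUUID();
    }

    // Builds the filter expression for this instance's reply subscription.
    // Single quotes are doubled so the value cannot break out of the literal.
    static String replyFilter(String instanceId) {
        return "ReplyToInstance = '" + instanceId.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String instanceId = newInstanceId();
        System.out.println(replyFilter(instanceId));
        System.out.println(replyFilter("Instance-123"));
    }
}
```

&lt;p&gt;The resulting string is what would be handed to the subscription's SQL rule. The same &lt;code&gt;instanceId&lt;/code&gt; is then attached to every outgoing request as the &lt;code&gt;ReplyToInstance&lt;/code&gt; property.&lt;/p&gt;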

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtlifvk2x2fi10jcqqa4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtlifvk2x2fi10jcqqa4.png" alt="Mermaid diagram, which includes the HTTP Load Balancer and illustrates the flow with multiple Gateway instances to show how the Filtered Topic routing works perfectly under horizontal scale" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By leveraging the broker's data plane to evaluate the SQL filter, Azure Service Bus does the heavy lifting. Pod #123 &lt;em&gt;only&lt;/em&gt; receives messages destined for Pod #123. There is zero thrashing, no session limits, and you get pure horizontal elasticity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Achieving True Horizontal Scaling with an HTTP Load Balancer
&lt;/h3&gt;

&lt;p&gt;This architecture is not just powerful for one gateway instance; it is designed for &lt;strong&gt;massive scale&lt;/strong&gt;. You can have 50 or 100 Gateway pods sitting behind a load balancer to handle peak traffic.&lt;/p&gt;

&lt;p&gt;To do this, you place a standard HTTP Load Balancer (like Azure Application Gateway or Nginx) in front of your Sentinel Gateway instances.&lt;/p&gt;

&lt;p&gt;The Load Balancer's role is crucial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Even Traffic Distribution:&lt;/strong&gt; Configure the load balancer with a "Round-Robin" or "Least Connections" algorithm. This ensures incoming HTTP requests are sprayed evenly across all available Gateway pods (e.g., &lt;code&gt;Gateway-A&lt;/code&gt;, &lt;code&gt;Gateway-B&lt;/code&gt;, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preventing "Sticky" Bottlenecks:&lt;/strong&gt; &lt;strong&gt;This is critical.&lt;/strong&gt; You must &lt;strong&gt;disable HTTP Session Affinity (Sticky Sessions)&lt;/strong&gt; on the Load Balancer. Every single request should be routed independently. Because each request is handled on a strict 1:1 basis (one unique &lt;code&gt;CorrelationId&lt;/code&gt;, exactly one reply), Gateway instances don't need to share state. An even distribution of HTTP traffic naturally leads to an even distribution of Service Bus messages and replies.&lt;/li&gt;
&lt;/ul&gt;
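
&lt;p&gt;For intuition, round-robin without affinity amounts to the few lines below. This is a toy model rather than how Azure Application Gateway or Nginx is implemented, and the class and pod names are made up for the example.&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {

    private final String[] pods;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(String... pods) {
        this.pods = pods.clone();
    }

    // Picks the next pod in rotation. Nothing about the caller is consulted,
    // so there is no session affinity: every request is routed independently.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), pods.length);
        return pods[index];
    }

    public static void main(String[] args) {
        RoundRobinSketch balancer = new RoundRobinSketch("Gateway-A", "Gateway-B", "Gateway-C");
        for (String request : new String[] {"req-1", "req-2", "req-3", "req-4"}) {
            System.out.println(request + " goes to " + balancer.pick());
        }
    }
}
```

&lt;p&gt;Two consecutive requests from the same client can land on different pods, which is fine here because the reply is routed by the message properties and not by which pod the client hit last time.&lt;/p&gt;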

&lt;p&gt;This creates a stateless, highly resilient design. If one Gateway instance crashes, the load balancer simply sends the next request to another instance, and the overall system keeps humming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing the Sentinel Service Bus Starter
&lt;/h3&gt;

&lt;p&gt;Wiring up the Azure Administration Client to dynamically provision and clean up these filtered subscriptions—while managing reactive &lt;code&gt;CompletableFuture&lt;/code&gt; mappings—is a lot of boilerplate.&lt;/p&gt;

&lt;p&gt;To solve this, I built the &lt;strong&gt;Sentinel Service Bus Starter&lt;/strong&gt;, a plug-and-play Spring Boot library that abstracts this entire pattern into a single dependency.&lt;/p&gt;

&lt;h4&gt;
  
  
  How it works:
&lt;/h4&gt;

&lt;p&gt;Just drop the dependency into your &lt;code&gt;build.gradle&lt;/code&gt;, provide your connection string in &lt;code&gt;application.yml&lt;/code&gt;, and inject the &lt;code&gt;SentinelTemplate&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="nd"&gt;@RequestMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/v1/gateway"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GatewayController&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;SentinelTemplate&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;GatewayController&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SentinelTemplate&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sentinelTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/process"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CompletableFuture&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;processRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Sends to the ASB Queue, waits on the dynamic Topic Subscription&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sentinelTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sendAndReceive&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;thenApply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;ResponseEntity:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;exceptionally&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ex&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;internalServerError&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because it leverages Java 21's Virtual Threads (Project Loom) under the hood, Tomcat HTTP threads are never blocked while waiting for the Service Bus round-trip, allowing incredible throughput even when waiting 60 seconds for an AI workload to finish.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bridging the Legacy Gap
&lt;/h3&gt;

&lt;p&gt;We don't always have the luxury of migrating our entire ecosystem to Event-Driven Architecture overnight. Sometimes, you just need a bulletproof, highly scalable Gateway to protect your modern backends from synchronous legacy clients.&lt;/p&gt;

&lt;p&gt;I’d love to hear how other teams are tackling the Sync-over-Async problem in the comments!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>azure</category>
      <category>springboot</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
