<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: HorusGod</title>
    <description>The latest articles on Forem by HorusGod (@horusgod007).</description>
    <link>https://forem.com/horusgod007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3809576%2F28291694-f4ba-49fe-8e33-4b398946aeec.png</url>
      <title>Forem: HorusGod</title>
      <link>https://forem.com/horusgod007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/horusgod007"/>
    <language>en</language>
    <item>
      <title>How I bridged codex-tui's WebSocket /v1/responses through Cloudflare, nginx, and FastAPI</title>
      <dc:creator>HorusGod</dc:creator>
      <pubDate>Sat, 02 May 2026 20:59:19 +0000</pubDate>
      <link>https://forem.com/horusgod007/how-i-bridged-codex-tuis-websocket-v1responses-through-cloudflare-nginx-and-fastapi-49a3</link>
      <guid>https://forem.com/horusgod007/how-i-bridged-codex-tuis-websocket-v1responses-through-cloudflare-nginx-and-fastapi-49a3</guid>
      <description>&lt;p&gt;OpenAI's Codex CLI (&lt;code&gt;codex-tui&lt;/code&gt;) shipped a new feature in version 0.128 that broke every third-party AI gateway I tested with it: streaming responses now go over &lt;strong&gt;WebSocket on &lt;code&gt;/v1/responses&lt;/code&gt;&lt;/strong&gt; instead of HTTP+SSE. If your gateway only registers a &lt;code&gt;POST /v1/responses&lt;/code&gt; handler, every Codex session fails with a confusing storm of &lt;code&gt;405 Method Not Allowed&lt;/code&gt; errors interleaved with the occasional successful POST.&lt;/p&gt;

&lt;p&gt;I run a hosted AI gateway at &lt;a href="https://g0i.ai" rel="noopener noreferrer"&gt;g0i.ai&lt;/a&gt; — the same problem hit me. This post is the writeup of the four-layer diagnosis I did to make it work, with the exact config and code at each layer. If you're running your own gateway (LiteLLM, Helicone, Portkey, your own Go/Python proxy, whatever) and Codex CLI users are hitting your endpoint, this is the path I'd hand you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;

&lt;p&gt;Every codex-tui session emitted a flood of failed requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /v1/responses HTTP/1.1   200    254552 bytes  opencode/1.14.32 ✓
GET  /v1/responses HTTP/1.1   405    31 bytes      codex-tui/0.128.0 ✗
GET  /v1/responses HTTP/1.1   405    31 bytes      codex-tui/0.128.0 ✗
GET  /v1/responses HTTP/1.1   405    31 bytes      codex-tui/0.128.0 ✗
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A 405 with &lt;code&gt;Allow: POST&lt;/code&gt; is the smoking gun: the client is sending GET, the server only has POST registered, so it rejects the request. But why GET? Codex 0.128 is sending a &lt;strong&gt;WebSocket upgrade handshake&lt;/strong&gt;, which on the wire is &lt;code&gt;GET /v1/responses HTTP/1.1&lt;/code&gt; with &lt;code&gt;Upgrade: websocket&lt;/code&gt; and &lt;code&gt;Connection: Upgrade&lt;/code&gt;.&lt;/p&gt;
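
&lt;p&gt;To make that handshake easy to spot in access logs or middleware, here is a minimal sketch of the check in Python. The function name &lt;code&gt;is_websocket_upgrade&lt;/code&gt; and the plain-dict header shape are assumptions for illustration, not part of any gateway's API.&lt;/p&gt;

```python
def is_websocket_upgrade(method, headers):
    """Return True if the request looks like a WebSocket opening handshake.

    `headers` is a plain dict of header name to value (an assumed shape);
    names and values are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
    connection_tokens = [
        token.strip().lower() for token in lowered.get("connection", "").split(",")
    ]
    return (
        method.upper() == "GET"
        and lowered.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
    )


handshake = {"Upgrade": "websocket", "Connection": "keep-alive, Upgrade"}
print(is_websocket_upgrade("GET", handshake))
print(is_websocket_upgrade("POST", {"Content-Type": "application/json"}))
```

&lt;p&gt;Splitting &lt;code&gt;Connection&lt;/code&gt; into tokens matters because some clients and intermediaries send it as a list rather than the single value &lt;code&gt;Upgrade&lt;/code&gt;.&lt;/p&gt;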

&lt;p&gt;The fix sounds easy: register a WebSocket route at the same path. But the actual request has to traverse four layers, and three of them break the upgrade by default: the Worker and nginx drop the handshake headers, and the backend has no WebSocket route to answer it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The path the upgrade has to survive
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client (codex-tui)
  ↓ wss://api.your-gateway.com/v1/responses
[1] Cloudflare Worker (or whatever edge you're using)
  ↓
[2] Cloudflare Tunnel / your VPN to origin
  ↓
[3] nginx reverse proxy (terminating TLS, multiplexing services)
  ↓
[4] FastAPI / Express / your application backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me walk through each layer in order.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1 — Cloudflare Worker
&lt;/h2&gt;

&lt;p&gt;If you're using a CF Worker for HTTP filtering, smart routing, prompt rewriting, or whatever, you're almost certainly maintaining a header sanitizer. Mine looks like this — copied from a hundred Worker examples online:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;HOP_BY_HOP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;connection&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;keep-alive&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;proxy-authenticate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;proxy-authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;te&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;trailer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;transfer-encoding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;upgrade&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// ... CF-specific headers&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;cleanHeaders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Headers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;HOP_BY_HOP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is technically correct per RFC 7230 — &lt;code&gt;Upgrade&lt;/code&gt; and &lt;code&gt;Connection&lt;/code&gt; are hop-by-hop headers and shouldn't be forwarded by a true proxy. But CF Workers implementing fetch passthrough need to &lt;em&gt;preserve&lt;/em&gt; them when the goal is letting the WebSocket upgrade reach origin.&lt;/p&gt;

&lt;p&gt;The fix is a short early-return branch at the top of the fetch handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// WS passthrough — skip the regular HTTP pipeline. Cloudflare Workers&lt;/span&gt;
    &lt;span class="c1"&gt;// forward WebSocket upgrades natively when we don't strip the headers.&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Upgrade&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;websocket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;originUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildOriginRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;originUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// pass original — don't sanitize&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// ... existing HTTP flow with cleanHeaders, JSON parsing, etc.&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. CF Workers' &lt;code&gt;fetch()&lt;/code&gt; knows how to forward WebSocket upgrades to origin as long as the headers are intact. The &lt;code&gt;Upgrade: websocket&lt;/code&gt; and &lt;code&gt;Connection: Upgrade&lt;/code&gt; headers reach origin verbatim.&lt;/p&gt;
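
&lt;p&gt;If your proxy is written in Python rather than as a Worker, a different option from the early return above is to make the sanitizer itself upgrade-aware. The following is only a sketch of that alternative: &lt;code&gt;clean_headers&lt;/code&gt; and &lt;code&gt;HANDSHAKE_HEADERS&lt;/code&gt; are hypothetical names, and headers are assumed to arrive as a plain dict.&lt;/p&gt;

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}
# Hop-by-hop headers that must survive for a WebSocket handshake to work.
HANDSHAKE_HEADERS = {"connection", "upgrade"}


def clean_headers(headers):
    """Drop hop-by-hop headers, but keep Connection/Upgrade on a WS handshake."""
    is_upgrade = any(
        name.lower() == "upgrade" and value.lower() == "websocket"
        for name, value in headers.items()
    )
    kept = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower in HOP_BY_HOP and not (is_upgrade and lower in HANDSHAKE_HEADERS):
            continue
        kept[name] = value
    return kept


print(clean_headers({
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Keep-Alive": "timeout=5",
    "Authorization": "Bearer sk-example",
}))
```

&lt;p&gt;The trade-off is that WebSocket handshakes still pass through the rest of your HTTP pipeline, so this only fits if nothing downstream of the sanitizer tries to read or rewrite a request body.&lt;/p&gt;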

&lt;h2&gt;
  
  
  Layer 2 — Cloudflare Tunnel (or your VPN)
&lt;/h2&gt;

&lt;p&gt;If you're using &lt;code&gt;cloudflared&lt;/code&gt;, the tunnel itself supports WebSocket out of the box for HTTP services. The config in &lt;code&gt;~/.cloudflared/config.yml&lt;/code&gt; doesn't need any special directive — &lt;code&gt;service: http://localhost:8085&lt;/code&gt; forwards both HTTP/1.1 traffic and WS upgrades correctly. Same for &lt;code&gt;wireguard&lt;/code&gt;, &lt;code&gt;tailscale&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;If you're using something more aggressive (a custom nginx-stream forwarder, a Lambda@Edge function), check that your transport handles upgrades.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3 — nginx
&lt;/h2&gt;

&lt;p&gt;The standard reverse-proxy block in every nginx-on-rails tutorial is &lt;strong&gt;broken for WebSockets&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/v1/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://my_backend&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_http_version&lt;/span&gt; &lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Connection&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;# ← strips Upgrade&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



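&lt;p&gt;&lt;code&gt;proxy_set_header Connection ""&lt;/code&gt; is the conventional way to enable upstream keepalive pooling: nginx sends &lt;code&gt;Connection: close&lt;/code&gt; to the upstream by default, and clearing the header lets pooled connections stay open. But it also guarantees that &lt;code&gt;Connection: Upgrade&lt;/code&gt; never reaches the backend during a WebSocket handshake, and nginx does not pass the client's &lt;code&gt;Upgrade&lt;/code&gt; header to the upstream unless you set it explicitly.&lt;/p&gt;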

&lt;p&gt;The fix uses the standard nginx &lt;code&gt;map&lt;/code&gt; directive to set &lt;code&gt;Connection&lt;/code&gt; based on whether the client requested an upgrade. Add this once at the &lt;code&gt;http {}&lt;/code&gt; level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;map&lt;/span&gt; &lt;span class="nv"&gt;$http_upgrade&lt;/span&gt; &lt;span class="nv"&gt;$connection_upgrade&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;default&lt;/span&gt; &lt;span class="s"&gt;upgrade&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;''&lt;/span&gt;      &lt;span class="s"&gt;close&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



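&lt;p&gt;Then for the specific endpoint that handles WebSockets, add a dedicated &lt;code&gt;location&lt;/code&gt; block alongside your generic &lt;code&gt;/v1/&lt;/code&gt; block. nginx picks exact-match locations before prefix ones, so the order in the file does not affect matching, but keeping it above the generic block makes the intent obvious:&lt;br&gt;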
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;/v1/responses&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://my_backend&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_http_version&lt;/span&gt; &lt;span class="mf"&gt;1.1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Upgrade&lt;/span&gt; &lt;span class="nv"&gt;$http_upgrade&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Connection&lt;/span&gt; &lt;span class="nv"&gt;$connection_upgrade&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Real-IP&lt;/span&gt; &lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Forwarded-For&lt;/span&gt; &lt;span class="nv"&gt;$proxy_add_x_forwarded_for&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Forwarded-Proto&lt;/span&gt; &lt;span class="nv"&gt;$scheme&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="s"&gt;3600s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_send_timeout&lt;/span&gt; &lt;span class="s"&gt;3600s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_buffering&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;=&lt;/code&gt; makes it an exact-match block — POST and GET to &lt;code&gt;/v1/responses&lt;/code&gt; both go here, and the upgrade headers get forwarded properly only when the client sends them. Other &lt;code&gt;/v1/*&lt;/code&gt; paths fall through to the keepalive-friendly block.&lt;/p&gt;
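
&lt;p&gt;The &lt;code&gt;map&lt;/code&gt; block is easy to misread, so here is the same decision modelled as a tiny Python function. This is purely an illustration of the mapping; &lt;code&gt;connection_upgrade&lt;/code&gt; is a hypothetical helper and not something nginx exposes.&lt;/p&gt;

```python
def connection_upgrade(http_upgrade):
    """Mirror the nginx map: an empty Upgrade header maps to "close",
    any other value maps to "upgrade"."""
    return "close" if http_upgrade == "" else "upgrade"


for label, upgrade_header in [("plain POST", ""), ("WS handshake", "websocket")]:
    print(label, "sends Connection:", connection_upgrade(upgrade_header))
```

&lt;p&gt;One consequence worth noticing: plain POSTs that land in this location are proxied with &lt;code&gt;Connection: close&lt;/code&gt;, so they do not benefit from upstream keepalive the way requests in the generic block do.&lt;/p&gt;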

&lt;p&gt;&lt;code&gt;proxy_buffering off&lt;/code&gt; is critical for any streaming endpoint: with buffering on, nginx collects response data in its buffers and may deliver it to the client in delayed chunks rather than event by event, which defeats the point of streaming.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 4 — FastAPI (or your backend)
&lt;/h2&gt;

&lt;p&gt;The actual WebSocket handler. FastAPI makes this nicely concise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APIRouter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;

&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APIRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@router.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/responses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;responses_ws&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Auth via Bearer header on the WS upgrade request
&lt;/span&gt;    &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bearer &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unauthorized&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;validate_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# First frame: the request body as a single JSON message
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive_text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4408&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No request received&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# always stream over the bridge
&lt;/span&gt;
    &lt;span class="n"&gt;upstream_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolve_upstream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/responses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;600.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;upstream_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
                &lt;span class="p"&gt;}))&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4502&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;

            &lt;span class="c1"&gt;# Forward each SSE event as a WS text frame
&lt;/span&gt;            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aiter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DONE]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;
                    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;# client gone
&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few non-obvious things in here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Auth is on the upgrade request itself.&lt;/strong&gt; The client sends &lt;code&gt;Authorization: Bearer sk-...&lt;/code&gt; as a regular HTTP header during the &lt;code&gt;GET /v1/responses&lt;/code&gt; upgrade, and FastAPI's &lt;code&gt;WebSocket.headers&lt;/code&gt; exposes it. Authenticate BEFORE calling &lt;code&gt;await ws.accept()&lt;/code&gt;. If auth fails, &lt;code&gt;ws.close(code=4401)&lt;/code&gt; rejects the upgrade cleanly. One caveat: per the ASGI spec, a close sent before accept makes the server refuse the handshake with an HTTP 403, so the client may never see the custom close code.&lt;/p&gt;
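
&lt;p&gt;To make point 1 concrete, here is a minimal sketch of pulling the key out of the upgrade request's headers before deciding whether to accept. The helper name &lt;code&gt;extract_bearer_token&lt;/code&gt; is hypothetical and not part of the handler above; it assumes you pass it &lt;code&gt;ws.headers&lt;/code&gt; or any other mapping of header names to values.&lt;/p&gt;

```python
def extract_bearer_token(headers):
    """Return the token from an Authorization header, or None if absent or malformed."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {str(k).lower(): str(v) for k, v in dict(headers).items()}
    value = normalized.get("authorization", "").strip()
    # Split "Bearer sk-..." into the scheme and the credential.
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer":
        return None
    token = token.strip()
    return token or None
```

&lt;p&gt;In the handler you would call this first and close the socket without accepting when it returns &lt;code&gt;None&lt;/code&gt;; looking the key up in your own store is a separate step this sketch leaves out.&lt;/p&gt;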

&lt;p&gt;&lt;strong&gt;2. Don't reuse a request-scoped DB session inside the WebSocket handler.&lt;/strong&gt; I learned this the hard way — passing the route's &lt;code&gt;db: AsyncSession&lt;/code&gt; into an asyncio task that outlives the route return causes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The framework cleans up the request-scoped session as soon as the route returns, but the WS handler keeps running. Use a fresh session via &lt;code&gt;async_session_factory()&lt;/code&gt; inside the WS handler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The &lt;code&gt;stream=True&lt;/code&gt; upstream call uses &lt;code&gt;httpx.AsyncClient.stream()&lt;/code&gt;, not &lt;code&gt;httpx.AsyncClient.post()&lt;/code&gt;.&lt;/strong&gt; A plain &lt;code&gt;post()&lt;/code&gt; waits for the entire response body and only then lets you forward it, which defeats streaming. &lt;code&gt;stream()&lt;/code&gt; hands back the response as soon as the headers arrive, so you can iterate over the SSE lines with &lt;code&gt;aiter_lines()&lt;/code&gt; as they come in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Upstream's SSE has &lt;code&gt;data:&lt;/code&gt; prefix lines and possibly &lt;code&gt;event:&lt;/code&gt; prefix lines.&lt;/strong&gt; The OpenAI Responses API embeds the event type INSIDE the JSON payload (the &lt;code&gt;"type": "response.output_text.delta"&lt;/code&gt; field), so we forward only the &lt;code&gt;data:&lt;/code&gt; content. If your upstream uses &lt;code&gt;event:&lt;/code&gt;-typed SSE, you may need to multiplex differently.&lt;/p&gt;
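
&lt;p&gt;The filtering in point 4 can be pulled out into a small generator, sketched below. The name &lt;code&gt;iter_sse_payloads&lt;/code&gt; is hypothetical. Unlike the handler above, it also accepts &lt;code&gt;data:&lt;/code&gt; with no space after the colon, which the SSE format allows. It assumes every JSON payload fits on one &lt;code&gt;data:&lt;/code&gt; line and does not join multi-line events.&lt;/p&gt;

```python
def iter_sse_payloads(lines):
    """Yield the payload of each SSE data line, stopping at the [DONE] sentinel."""
    for line in lines:
        # Comment lines, event: lines and blank separators are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):]
        # The SSE format allows one optional space after the colon.
        if payload.startswith(" "):
            payload = payload[1:]
        if payload.strip() == "[DONE]":
            return
        yield payload
```

&lt;p&gt;Each yielded string is what you would hand to &lt;code&gt;ws.send_text()&lt;/code&gt;. The handler reads lines asynchronously, so there the same logic would sit inside the &lt;code&gt;async for&lt;/code&gt; loop rather than in a plain generator.&lt;/p&gt;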

&lt;h2&gt;
  
  
  Verification
&lt;/h2&gt;

&lt;p&gt;Once all four layers are wired, test with &lt;code&gt;wscat&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;wscat &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"wss://api.your-gateway.com/v1/responses"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer sk-..."&lt;/span&gt;

Connected &lt;span class="o"&gt;(&lt;/span&gt;press CTRL+C to quit&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;:&lt;span class="s2"&gt;"gpt-5.5"&lt;/span&gt;,&lt;span class="s2"&gt;"input"&lt;/span&gt;:&lt;span class="s2"&gt;"reply with the word BANANA"&lt;/span&gt;,&lt;span class="s2"&gt;"stream"&lt;/span&gt;:true&lt;span class="o"&gt;}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.created"&lt;/span&gt;,&lt;span class="s2"&gt;"response"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;:&lt;span class="s2"&gt;"resp_..."&lt;/span&gt;...&lt;span class="o"&gt;}}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.in_progress"&lt;/span&gt;,&lt;span class="s2"&gt;"response"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;...&lt;span class="o"&gt;}}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.output_item.added"&lt;/span&gt;,&lt;span class="s2"&gt;"item"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;...&lt;span class="o"&gt;}}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.output_text.delta"&lt;/span&gt;,&lt;span class="s2"&gt;"delta"&lt;/span&gt;:&lt;span class="s2"&gt;"BAN"&lt;/span&gt;,&lt;span class="s2"&gt;"item_id"&lt;/span&gt;:&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.output_text.delta"&lt;/span&gt;,&lt;span class="s2"&gt;"delta"&lt;/span&gt;:&lt;span class="s2"&gt;"ANA"&lt;/span&gt;,&lt;span class="s2"&gt;"item_id"&lt;/span&gt;:&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.output_text.done"&lt;/span&gt;,&lt;span class="s2"&gt;"text"&lt;/span&gt;:&lt;span class="s2"&gt;"BANANA"&lt;/span&gt;,...&lt;span class="o"&gt;}&lt;/span&gt;
&amp;lt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;:&lt;span class="s2"&gt;"response.completed"&lt;/span&gt;,&lt;span class="s2"&gt;"response"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;:&lt;span class="s2"&gt;"completed"&lt;/span&gt;,...&lt;span class="o"&gt;}}&lt;/span&gt;
Disconnected &lt;span class="o"&gt;(&lt;/span&gt;code: 1000, reason: &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
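
&lt;p&gt;If you capture those frames in a script instead of reading them by eye, a check along these lines can confirm the stream is intact. This is only a sketch: &lt;code&gt;collect_output_text&lt;/code&gt; is a hypothetical helper, and it relies on nothing beyond the &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;delta&lt;/code&gt; fields visible in the transcript above.&lt;/p&gt;

```python
import json


def collect_output_text(frames):
    """Join the text deltas from a list of raw WS frames and report completion."""
    parts = []
    completed = False
    for raw in frames:
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "response.output_text.delta":
            # Each delta frame carries one fragment of the output text.
            parts.append(event.get("delta", ""))
        elif kind == "response.completed":
            completed = True
    return "".join(parts), completed
```

&lt;p&gt;For the session above it should return the text &lt;code&gt;BANANA&lt;/code&gt; together with &lt;code&gt;True&lt;/code&gt;. A &lt;code&gt;False&lt;/code&gt; flag means the socket closed before &lt;code&gt;response.completed&lt;/code&gt; arrived.&lt;/p&gt;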



&lt;p&gt;That's the protocol codex-tui expects. With this in place, point your codex-tui config at your gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.your-gateway.com/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the WebSocket handshake succeeds, the stream flows, and Codex's UI updates token-by-token like it should.&lt;/p&gt;

&lt;h2&gt;
  
  
  CDN-specific notes
&lt;/h2&gt;

&lt;p&gt;A few CDNs I tested:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare&lt;/strong&gt; — supports WS through Workers (with the passthrough above) and through plain proxied zones. No extra config needed if you're not using a Worker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bunny.net&lt;/strong&gt; — supports WS upgrade verbatim. The &lt;code&gt;CDN-RequestPullCode: 101&lt;/code&gt; response header confirms the edge pulled "101 Switching Protocols" from origin and forwarded it. No special configuration needed beyond pointing the pull zone at your origin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fastly&lt;/strong&gt; — needs explicit WS service config; their default HTTP service doesn't pass upgrades.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CloudFront&lt;/strong&gt; — supports WebSockets but only on certain origin types; check their docs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're on a CDN that doesn't pass WS upgrades AT ALL, the fallback is to bypass the CDN entirely for the &lt;code&gt;/v1/responses&lt;/code&gt; path: create a DNS-only record such as &lt;code&gt;wsapi.your-domain.com&lt;/code&gt; that points directly at origin, and have clients use that hostname for WebSocket sessions. Less elegant, but it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd watch for next
&lt;/h2&gt;

&lt;p&gt;OpenAI's Responses API is still in flux: they've been adding fields and changing how reasoning blocks are encoded, and the WebSocket variant of &lt;code&gt;/v1/responses&lt;/code&gt; is undocumented as of this writing (Jan 2026). If you're shipping a gateway that supports Codex, expect to chase a moving target.&lt;/p&gt;

&lt;p&gt;The other two integration points worth watching are MCP servers (codex 0.128 added native MCP support, also over WebSocket in some configurations) and the realtime audio API (&lt;code&gt;/v1/realtime&lt;/code&gt;, fully WebSocket, has been stable longer).&lt;/p&gt;

&lt;p&gt;If this saved you some time, drop a comment — happy to compare notes on what other agents (Cline, Cursor, Aider, Continue) actually need on the wire. I run &lt;a href="https://g0i.ai" rel="noopener noreferrer"&gt;g0i&lt;/a&gt; which has integration guides for each of those, and the lessons from this codex-tui chase generalized to all of them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're a gateway operator and want to compare notes on edge-case clients, my DMs are open.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Built CloudDesktop — Turn Any Linux VPS Into a Browser-Based Desktop (Free &amp; Open Source)</title>
      <dc:creator>HorusGod</dc:creator>
      <pubDate>Fri, 06 Mar 2026 09:46:58 +0000</pubDate>
      <link>https://forem.com/horusgod007/i-built-clouddesktop-turn-any-linux-vps-into-a-browser-based-desktop-free-open-source-2o3n</link>
      <guid>https://forem.com/horusgod007/i-built-clouddesktop-turn-any-linux-vps-into-a-browser-based-desktop-free-open-source-2o3n</guid>
      <description>&lt;p&gt;Ever wanted a full Linux desktop in your browser?&lt;/p&gt;

&lt;p&gt;No SSH. No PuTTY. No setup headaches.&lt;br&gt;
Just open a tab — and you're in. 👇&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 The Problem
&lt;/h2&gt;

&lt;p&gt;I wanted a persistent Linux environment I could reach from anywhere — my phone on the go, a tablet, even a friend's laptop.&lt;/p&gt;

&lt;p&gt;Every solution I found was either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💸 Expensive (cloud desktops cost $$$)&lt;/li&gt;
&lt;li&gt;🔧 Painful to set up (VNC configs, firewalls, SSL hell)&lt;/li&gt;
&lt;li&gt;📵 Desktop-only (forget using it on mobile)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I spent weeks and built CloudDesktop from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  🖥️ What it looks like
&lt;/h2&gt;

&lt;p&gt;Here's the login screen:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyspo78c0bdhy1pt6s94h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyspo78c0bdhy1pt6s94h.png" alt="Login Screen" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And once you're in — a full XFCE desktop, in your browser:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1spbctrcrm7mj95nywbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1spbctrcrm7mj95nywbb.png" alt="Desktop" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ One command to rule them all
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;sudo bash install.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That's it. The installer auto-configures:&lt;/p&gt;

&lt;p&gt;✅ XFCE desktop + TigerVNC&lt;br&gt;
✅ WebSocket bridge (noVNC)&lt;br&gt;
✅ Node.js + Express backend&lt;br&gt;
✅ Nginx reverse proxy + SSL&lt;br&gt;
✅ Firewall (UFW) + Fail2ban&lt;br&gt;
✅ Systemd services (auto-start on boot)&lt;/p&gt;




&lt;h2&gt;
  
  
  📱 Works on EVERY device
&lt;/h2&gt;

&lt;p&gt;This was the hardest part to get right.&lt;/p&gt;

&lt;p&gt;CloudDesktop is mobile-first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Virtual trackpad cursor (like Microsoft RD Client)&lt;/li&gt;
&lt;li&gt;Pinch to zoom + scroll&lt;/li&gt;
&lt;li&gt;On-screen keyboard&lt;/li&gt;
&lt;li&gt;Auto-resolution on orientation change&lt;/li&gt;
&lt;li&gt;Fullscreen PWA mode — no browser chrome&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install it as a native app on iOS, Android, Windows, macOS — all from your browser.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔄 One live session, all devices
&lt;/h2&gt;

&lt;p&gt;All your devices connect to the same live desktop.&lt;/p&gt;

&lt;p&gt;Start coding on your PC → pick up exactly where you left off on your phone.&lt;br&gt;
No sync. No cloud storage. Just your desktop, everywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Settings &amp;amp; customization
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejusrc5raqh4wr0igy73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fejusrc5raqh4wr0igy73.png" alt="Settings Panel" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adjust resolution, manage sessions, toggle features — all from a clean settings panel inside the browser.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Claude Code built right in
&lt;/h2&gt;

&lt;p&gt;This is my favorite part.&lt;/p&gt;

&lt;p&gt;CloudDesktop has first-class Claude Code support with dedicated dock icons:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvajrvjek9ao053j19bbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvajrvjek9ao053j19bbp.png" alt="Claude Code in Dock" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code — launch CLI in a terminal from the dock&lt;/li&gt;
&lt;li&gt;Claude Fast — one-click sandbox mode for quick tasks&lt;/li&gt;
&lt;li&gt;Directory Picker — choose your working folder before launching&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔒 Security? Covered.
&lt;/h2&gt;

&lt;p&gt;This runs on the open internet, so security was non-negotiable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔐 Bcrypt password hashing&lt;/li&gt;
&lt;li&gt;🎟️ JWT session tokens (httpOnly cookies)&lt;/li&gt;
&lt;li&gt;📲 TOTP two-factor authentication&lt;/li&gt;
&lt;li&gt;🚫 Rate limiting on auth endpoints&lt;/li&gt;
&lt;li&gt;🛡️ Fail2ban + UFW firewall&lt;/li&gt;
&lt;li&gt;🔒 HTTPS enforced (Let's Encrypt or self-signed)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏗️ How it works under the hood
&lt;/h2&gt;

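&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser ──HTTPS──▸ Nginx ──▸ Express API (auth, files, resolution)
                       └──▸ WebSocket ──▸ websockify ──▸ VNC (TigerVNC/XFCE)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;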

&lt;p&gt;Simple, battle-tested stack. No magic, no vendor lock-in.&lt;/p&gt;




&lt;h2&gt;
  
  
  💚 100% Free &amp;amp; Open Source
&lt;/h2&gt;

&lt;p&gt;No hidden fees.&lt;br&gt;
No premium tiers.&lt;br&gt;
No telemetry.&lt;br&gt;
No nonsense.&lt;/p&gt;

&lt;p&gt;Fork it. Break it. Make it yours.&lt;/p&gt;

&lt;p&gt;👉 GitHub — &lt;a href="https://github.com/HorusGod007/CloudDesktop" rel="noopener noreferrer"&gt;https://github.com/HorusGod007/CloudDesktop&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If this helped you or looks useful — a ⭐ on GitHub means the world and helps others discover it. Drop your questions below, I read every comment! 🙏&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>linux</category>
      <category>selfhosted</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
