<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: augustine Egbuna</title>
    <description>The latest articles on Forem by augustine Egbuna (@fivenineslab_30).</description>
    <link>https://forem.com/fivenineslab_30</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3864596%2Ff0ca0044-b937-44da-acfe-2e62f44c281a.png</url>
      <title>Forem: augustine Egbuna</title>
      <link>https://forem.com/fivenineslab_30</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/fivenineslab_30"/>
    <language>en</language>
    <item>
      <title>Streaming Rugby Through a Self-Hosted RTMP Proxy with Docker and OBS</title>
      <dc:creator>augustine Egbuna</dc:creator>
      <pubDate>Tue, 07 Apr 2026 18:13:18 +0000</pubDate>
      <link>https://forem.com/fivenineslab_30/streaming-rugby-through-a-self-hosted-rtmp-proxy-with-docker-and-obs-2bjd</link>
      <guid>https://forem.com/fivenineslab_30/streaming-rugby-through-a-self-hosted-rtmp-proxy-with-docker-and-obs-2bjd</guid>
      <description>&lt;p&gt;Last March, our office wanted to stream a rugby match — Highlanders vs Brumbies — to multiple monitors without juggling browser tabs or relying on flaky third-party streams. The problem: we needed one reliable ingestion point, the ability to record the stream, and the flexibility to push it to multiple destinations (local screens, recording storage, backup relay). No commercial streaming service gave us that level of control.&lt;/p&gt;

&lt;p&gt;We solved this by running our own RTMP proxy using &lt;code&gt;nginx-rtmp-module&lt;/code&gt; in Docker, pulling the source stream with &lt;code&gt;ffmpeg&lt;/code&gt;, and distributing it across our internal network. This isn't about piracy — it's about understanding media streaming infrastructure at the protocol level. You can use the same pattern for security camera feeds, internal presentations, or any scenario where you need to ingest, transcode, and redistribute live video.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why RTMP Still Matters
&lt;/h2&gt;

&lt;p&gt;RTMP (Real-Time Messaging Protocol) remains the workhorse protocol for live video ingestion. While HLS and DASH dominate delivery to browsers, RTMP handles low-latency, persistent connections between encoders and servers. OBS, ffmpeg, and most professional broadcast tools speak RTMP natively.&lt;/p&gt;

&lt;p&gt;The stack we built:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;nginx with rtmp module&lt;/strong&gt;: accepts incoming RTMP streams, handles restreaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ffmpeg&lt;/strong&gt;: pulls external streams (HLS, RTSP, etc.), transcodes, pushes to nginx&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker Compose&lt;/strong&gt;: orchestrates everything, handles restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus node-exporter&lt;/strong&gt; (optional): monitors bitrate, dropped frames&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Containerized RTMP Server
&lt;/h2&gt;

&lt;p&gt;First, we built a Docker image for nginx with the RTMP module. The official nginx image doesn't include it, so we compile it in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;alpine:3.18&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    build-base &lt;span class="se"&gt;\
&lt;/span&gt;    git &lt;span class="se"&gt;\
&lt;/span&gt;    pcre-dev &lt;span class="se"&gt;\
&lt;/span&gt;    openssl-dev &lt;span class="se"&gt;\
&lt;/span&gt;    zlib-dev

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /tmp&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;git clone https://github.com/arut/nginx-rtmp-module.git &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    wget http://nginx.org/download/nginx-1.24.0.tar.gz &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xzf&lt;/span&gt; nginx-1.24.0.tar.gz

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /tmp/nginx-1.24.0&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;./configure &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;--with-http_ssl_module&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;--add-module&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;../nginx-rtmp-module &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;--prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/local/nginx &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    make &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:3.18&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; pcre openssl
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /usr/local/nginx /usr/local/nginx&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; nginx.conf /usr/local/nginx/conf/nginx.conf&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 1935 8080&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["/usr/local/nginx/sbin/nginx", "-g", "daemon off;"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The nginx configuration handles stream ingestion on port 1935 and serves an HLS endpoint on 8080:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;rtmp&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;1935&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;chunk_size&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kn"&gt;application&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;live&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;record&lt;/span&gt; &lt;span class="no"&gt;off&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="c1"&gt;# Enable HLS&lt;/span&gt;
            &lt;span class="kn"&gt;hls&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;hls_path&lt;/span&gt; &lt;span class="n"&gt;/tmp/hls&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;hls_fragment&lt;/span&gt; &lt;span class="s"&gt;2s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;hls_playlist_length&lt;/span&gt; &lt;span class="s"&gt;6s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="c1"&gt;# Allow publishing from local network only&lt;/span&gt;
            &lt;span class="kn"&gt;allow&lt;/span&gt; &lt;span class="s"&gt;publish&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="s"&gt;.0.0/8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;allow&lt;/span&gt; &lt;span class="s"&gt;publish&lt;/span&gt; &lt;span class="mf"&gt;172.16&lt;/span&gt;&lt;span class="s"&gt;.0.0/12&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;allow&lt;/span&gt; &lt;span class="s"&gt;publish&lt;/span&gt; &lt;span class="mf"&gt;192.168&lt;/span&gt;&lt;span class="s"&gt;.0.0/16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;deny&lt;/span&gt; &lt;span class="s"&gt;publish&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;http&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/hls&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;types&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kn"&gt;application/vnd.apple.mpegurl&lt;/span&gt; &lt;span class="s"&gt;m3u8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="kn"&gt;video/mp2t&lt;/span&gt; &lt;span class="s"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="kn"&gt;root&lt;/span&gt; &lt;span class="n"&gt;/tmp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;add_header&lt;/span&gt; &lt;span class="s"&gt;Cache-Control&lt;/span&gt; &lt;span class="s"&gt;no-cache&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;add_header&lt;/span&gt; &lt;span class="s"&gt;Access-Control-Allow-Origin&lt;/span&gt; &lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/stat&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kn"&gt;rtmp_stat&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="kn"&gt;rtmp_stat_stylesheet&lt;/span&gt; &lt;span class="s"&gt;stat.xsl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Ingesting the External Stream
&lt;/h2&gt;

&lt;p&gt;Most live sports streams are delivered via HLS (&lt;code&gt;.m3u8&lt;/code&gt; playlists). We use ffmpeg to pull that HLS stream and push it to our RTMP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;SOURCE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/stream/playlist.m3u8"&lt;/span&gt;
&lt;span class="nv"&gt;RTMP_DEST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rtmp://localhost:1935/live/rugby"&lt;/span&gt;

ffmpeg &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SOURCE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt;:v copy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt;:a aac &lt;span class="nt"&gt;-b&lt;/span&gt;:a 128k &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; flv &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RTMP_DEST&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script runs in a separate container (or systemd service). The &lt;code&gt;-c:v copy&lt;/code&gt; flag avoids re-encoding video — we're just remuxing from HLS to RTMP. If the source codec isn't compatible, replace &lt;code&gt;copy&lt;/code&gt; with &lt;code&gt;libx264 -preset veryfast&lt;/code&gt;.&lt;/p&gt;
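&lt;p&gt;On flaky sources it also helps to let ffmpeg re-establish dropped HTTP connections instead of exiting. These input options go before &lt;code&gt;-i&lt;/code&gt; in the script above; the values here are starting points to tune, not something we benchmarked:&lt;/p&gt;

```shell
# Retry the HLS source on a dropped connection rather than exiting
-reconnect 1 -reconnect_streamed 1 -reconnect_delay_max 5
```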

&lt;h2&gt;
  
  
  Docker Compose Stack
&lt;/h2&gt;

&lt;p&gt;Here's the complete &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rtmp-server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./nginx-rtmp&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1935:1935"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./recordings:/tmp/hls&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

  &lt;span class="na"&gt;stream-ingester&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jrottenberg/ffmpeg:4.4-alpine&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;rtmp-server&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;SOURCE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${SOURCE_URL}&lt;/span&gt;
      &lt;span class="na"&gt;RTMP_DEST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rtmp://rtmp-server:1935/live/rugby&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;-i ${SOURCE_URL}&lt;/span&gt;
      &lt;span class="s"&gt;-c:v copy&lt;/span&gt;
      &lt;span class="s"&gt;-c:a aac -b:a 128k&lt;/span&gt;
      &lt;span class="s"&gt;-f flv rtmp://rtmp-server:1935/live/rugby&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Launch with &lt;code&gt;docker-compose up -d&lt;/code&gt;. The ingester container pulls the external stream and feeds it into the nginx RTMP server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Clients
&lt;/h2&gt;

&lt;p&gt;Now you have three access methods:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RTMP direct&lt;/strong&gt; (VLC, ffplay, OBS): &lt;code&gt;rtmp://your-server:1935/live/rugby&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HLS browser playback&lt;/strong&gt;: &lt;code&gt;http://your-server:8080/hls/rugby.m3u8&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statistics dashboard&lt;/strong&gt;: &lt;code&gt;http://your-server:8080/stat&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For office monitors, we used VLC with this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vlc rtmp://10.0.1.50:1935/live/rugby &lt;span class="nt"&gt;--fullscreen&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RTMP latency is typically 2-4 seconds. HLS adds another 6-10 seconds due to segment buffering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Stream Failures
&lt;/h2&gt;

&lt;p&gt;Live streams fail. Networks hiccup, source servers restart, uplinks saturate. We added a watchdog script that monitors the ffmpeg process and restarts it on failure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;RTMP_STAT_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/stat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;RTMP_STREAM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rugby&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;RESTART_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;  &lt;span class="c1"&gt;# seconds without data
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_stream_alive&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RTMP_STAT_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Parse XML, check if stream is active
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;RTMP_STREAM&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;check_stream_alive&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stream dead, restarting ingester...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docker-compose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream-ingester&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs as a sidecar container or systemd service. In production, you'd use proper XML parsing and integrate with your monitoring stack (Prometheus, Grafana).&lt;/p&gt;
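&lt;p&gt;The substring check in &lt;code&gt;check_stream_alive&lt;/code&gt; will false-positive whenever the stream name appears anywhere in the stat page. A minimal sketch of the proper parsing step with the standard library, assuming the &lt;code&gt;rtmp_stat&lt;/code&gt; XML nests each live publisher under a &lt;code&gt;stream&lt;/code&gt; element with a &lt;code&gt;name&lt;/code&gt; child, as nginx-rtmp structures its stat output:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

def stream_active(stat_xml: str, stream_name: str) -> bool:
    """Return True if the named stream appears in rtmp_stat output."""
    root = ET.fromstring(stat_xml)
    # nginx-rtmp lists each live publisher as a stream element with a name child
    for stream in root.iter("stream"):
        if stream.findtext("name") == stream_name:
            return True
    return False
```

&lt;p&gt;Drop this in as the body of &lt;code&gt;check_stream_alive&lt;/code&gt; in place of the &lt;code&gt;RTMP_STREAM in resp.text&lt;/code&gt; test.&lt;/p&gt;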

&lt;h2&gt;
  
  
  Bitrate and Transcoding Considerations
&lt;/h2&gt;

&lt;p&gt;If you're streaming over a constrained network, you may need to transcode down to a lower bitrate. Replace the &lt;code&gt;-c:v copy&lt;/code&gt; in the ffmpeg command with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-c&lt;/span&gt;:v libx264 &lt;span class="nt"&gt;-preset&lt;/span&gt; veryfast &lt;span class="nt"&gt;-b&lt;/span&gt;:v 2500k &lt;span class="nt"&gt;-maxrate&lt;/span&gt; 2500k &lt;span class="nt"&gt;-bufsize&lt;/span&gt; 5000k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caps the video at 2.5 Mbps. For multiple quality levels (adaptive bitrate), you'd configure nginx-rtmp to output multiple HLS variants. That's beyond scope here, but the &lt;code&gt;hls_variant&lt;/code&gt; directive handles it.&lt;/p&gt;
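&lt;p&gt;For the curious, here is roughly what that looks like, adapted from the nginx-rtmp wiki's adaptive-bitrate example. The application names, rendition suffixes, and bitrates below are illustrative, not our production values:&lt;/p&gt;

```nginx
application src {
    live on;
    # Fan each incoming stream out into two renditions
    exec ffmpeg -i rtmp://localhost/src/$name
        -c:v libx264 -preset veryfast -b:v 2500k -c:a aac -f flv rtmp://localhost/live/$name_hi
        -c:v libx264 -preset veryfast -b:v 800k -s 854x480 -c:a aac -f flv rtmp://localhost/live/$name_low;
}

application live {
    live on;
    hls on;
    hls_path /tmp/hls;
    # hls_variant maps each suffix to an advertised bandwidth in the master playlist
    hls_variant _hi BANDWIDTH=2500000;
    hls_variant _low BANDWIDTH=800000;
}
```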

&lt;h2&gt;
  
  
  Recording for Later Playback
&lt;/h2&gt;

&lt;p&gt;To record the stream as it arrives, enable recording in the nginx config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;application&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;live&lt;/span&gt; &lt;span class="no"&gt;on&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;record&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;record_path&lt;/span&gt; &lt;span class="n"&gt;/tmp/recordings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;record_suffix&lt;/span&gt; &lt;span class="s"&gt;-%Y%m%d-%H%M%S.flv&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount &lt;code&gt;/tmp/recordings&lt;/code&gt; to a Docker volume. Each stream session gets saved as an FLV file. Convert to MP4 later with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg &lt;span class="nt"&gt;-i&lt;/span&gt; recording-20260315-193000.flv &lt;span class="nt"&gt;-c&lt;/span&gt; copy match.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;Running your own RTMP infrastructure isn't overkill if you need control. We deployed this for rugby, but the same stack handles security cameras, webinar recordings, and internal broadcasts. The latency is lower than most third-party services, and you avoid their bandwidth throttling.&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RTMP is still the best protocol for ingestion, despite being "old"&lt;/li&gt;
&lt;li&gt;Docker makes nginx-rtmp trivial to deploy and version&lt;/li&gt;
&lt;li&gt;Always monitor stream health — live video fails in creative ways&lt;/li&gt;
&lt;li&gt;HLS adds latency but gives you browser compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire stack runs on a $20/month VPS with 2 vCPUs and 4GB RAM. For a single 1080p stream, that's more than enough.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is an excerpt from &lt;a href="https://books.fivenineslab.com" rel="noopener noreferrer"&gt;Practical AI Infrastructure Engineering&lt;/a&gt; — a production handbook covering Docker, GPU infrastructure, vector databases, and LLM APIs. Full book with 4 hands-on capstone projects available at &lt;a href="https://activ8ted.gumroad.com/l/ssmfkx" rel="noopener noreferrer"&gt;https://activ8ted.gumroad.com/l/ssmfkx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://fivenineslab.com/blog/streaming-rugby-rtmp-proxy-docker-obs" rel="noopener noreferrer"&gt;fivenineslab.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>observability</category>
    </item>
    <item>
      <title>Docker's nftables Mode Doesn't Respect Your Drop Rules — Here's the Fix</title>
      <dc:creator>augustine Egbuna</dc:creator>
      <pubDate>Tue, 07 Apr 2026 18:12:11 +0000</pubDate>
      <link>https://forem.com/fivenineslab_30/dockers-nftables-mode-doesnt-respect-your-drop-rules-heres-the-fix-3khf</link>
      <guid>https://forem.com/fivenineslab_30/dockers-nftables-mode-doesnt-respect-your-drop-rules-heres-the-fix-3khf</guid>
      <description>&lt;p&gt;You enable Docker's experimental nftables support, add a drop rule in &lt;code&gt;/etc/nftables.conf&lt;/code&gt;, reload your firewall, and the container port stays wide open. The packet hits your drop rule, then Docker's accept rule fires anyway. This violates everything you thought you knew about packet filtering.&lt;/p&gt;

&lt;p&gt;I hit this exact scenario running a multi-tenant LLM API platform where different teams deploy inference containers. One team accidentally exposed their Ollama admin interface on port 3000. Standard nftables drop rules in our firewall config did nothing — the port stayed accessible from the internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Docker's nftables Chains Bypass Your Rules
&lt;/h2&gt;

&lt;p&gt;Docker 29+ creates its own nftables table (&lt;code&gt;docker&lt;/code&gt;) with chains that hook into &lt;code&gt;prerouting&lt;/code&gt;, &lt;code&gt;forward&lt;/code&gt;, and &lt;code&gt;postrouting&lt;/code&gt;. These chains have specific priority values that determine their execution order relative to your custom chains.&lt;/p&gt;

&lt;p&gt;Here's the critical part: nftables evaluates chains based on &lt;strong&gt;priority within the same hook&lt;/strong&gt;. A drop rule in your &lt;code&gt;inet filter&lt;/code&gt; table with priority &lt;code&gt;0&lt;/code&gt; doesn't automatically block packets that a &lt;code&gt;docker&lt;/code&gt; table chain with priority &lt;code&gt;-100&lt;/code&gt; has already accepted.&lt;/p&gt;

&lt;p&gt;Check what Docker actually created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nft list ruleset | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 20 &lt;span class="s2"&gt;"table inet docker"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="n"&gt;inet&lt;/span&gt; &lt;span class="n"&gt;docker&lt;/span&gt; {
    &lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="n"&gt;forward&lt;/span&gt; {
        &lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="n"&gt;forward&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt; -&lt;span class="m"&gt;100&lt;/span&gt;; &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="n"&gt;accept&lt;/span&gt;;
        &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="n"&gt;established&lt;/span&gt;,&lt;span class="n"&gt;related&lt;/span&gt; &lt;span class="n"&gt;accept&lt;/span&gt;
        &lt;span class="n"&gt;iifname&lt;/span&gt; &lt;span class="s2"&gt;"docker0"&lt;/span&gt; &lt;span class="n"&gt;accept&lt;/span&gt;
        &lt;span class="n"&gt;oifname&lt;/span&gt; &lt;span class="s2"&gt;"docker0"&lt;/span&gt; &lt;span class="n"&gt;accept&lt;/span&gt;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;priority -100&lt;/code&gt; means Docker's forward chain runs &lt;strong&gt;before&lt;/strong&gt; your standard filter chain at priority &lt;code&gt;0&lt;/code&gt;. If Docker's chain accepts the packet, your drop rule never even sees it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Priority Math Docker Doesn't Tell You
&lt;/h2&gt;

&lt;p&gt;Nftables priorities are integers. Lower (more negative) values run first. Standard filter tables use priority &lt;code&gt;0&lt;/code&gt;. Docker uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;prerouting&lt;/code&gt;: priority &lt;code&gt;-300&lt;/code&gt; for DNAT rules&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;forward&lt;/code&gt;: priority &lt;code&gt;-100&lt;/code&gt; for container traffic acceptance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;postrouting&lt;/code&gt;: priority &lt;code&gt;100&lt;/code&gt; for masquerading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your drop rule in a priority &lt;code&gt;0&lt;/code&gt; chain fires after Docker has already said "yes, forward this packet to the container". The packet is gone.&lt;/p&gt;
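&lt;p&gt;Laid out as a ruleset, the ordering looks like this. Chain bodies are trimmed, and the filter-table drop is a hypothetical rule you might have written, not something Docker creates:&lt;/p&gt;

```conf
table inet docker {
    chain forward {
        # More negative priority registers earlier on the forward hook
        type filter hook forward priority -100; policy accept;
        oifname "docker0" accept
    }
}

table inet filter {
    chain forward {
        # Consulted only after Docker's chain has issued its verdict
        type filter hook forward priority 0; policy accept;
        tcp dport 3000 drop
    }
}
```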

&lt;h2&gt;
  
  
  Solution 1: Override Docker's Priority
&lt;/h2&gt;

&lt;p&gt;Create a chain with a lower priority than Docker's &lt;code&gt;-100&lt;/code&gt; for the forward hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nft add table inet firewall
nft add chain inet firewall forward_early &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s1"&gt;'{ type filter hook forward priority -200; policy accept; }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now add your drop rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Block port 3000 to all containers&lt;/span&gt;
nft add rule inet firewall forward_early &lt;span class="se"&gt;\&lt;/span&gt;
    tcp dport 3000 drop

&lt;span class="c"&gt;# Or block specific container IPs&lt;/span&gt;
nft add rule inet firewall forward_early &lt;span class="se"&gt;\&lt;/span&gt;
    ip daddr 172.17.0.5 tcp dport 3000 drop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the priority order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nft list chains | &lt;span class="nb"&gt;grep &lt;/span&gt;forward
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see your &lt;code&gt;forward_early&lt;/code&gt; chain listed with priority &lt;code&gt;-200&lt;/code&gt;, which executes before Docker's &lt;code&gt;-100&lt;/code&gt; chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution 2: Modify Docker's Table Directly
&lt;/h2&gt;

&lt;p&gt;Instead of fighting Docker's priorities, inject rules into Docker's own chains. This approach is cleaner for container-specific policies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Insert at the beginning of Docker's forward chain&lt;/span&gt;
nft insert rule inet docker forward &lt;span class="se"&gt;\&lt;/span&gt;
    tcp dport 3000 drop

&lt;span class="c"&gt;# Or match by container network&lt;/span&gt;
nft insert rule inet docker forward &lt;span class="se"&gt;\&lt;/span&gt;
    iifname &lt;span class="s2"&gt;"br-a1b2c3d4e5f6"&lt;/span&gt; tcp dport 3000 drop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;insert&lt;/code&gt; keyword places your rule at the top of the chain, before Docker's blanket accept rules. This works because you're operating within Docker's priority level.&lt;/p&gt;

&lt;p&gt;I use this method in production to enforce per-network policies. Each Docker Compose stack gets its own bridge network, and we insert drop rules for admin ports (like Jupyter on 8888, or MLflow on 5000) directly into the &lt;code&gt;inet docker forward&lt;/code&gt; chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Rules Persistent
&lt;/h2&gt;

&lt;p&gt;Docker recreates its nftables rules on every daemon restart. Your manual &lt;code&gt;nft&lt;/code&gt; commands vanish. You need a script that runs after Docker starts.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;/etc/systemd/system/docker-firewall.service&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight systemd"&gt;&lt;code&gt;&lt;span class="k"&gt;[Unit]&lt;/span&gt;
&lt;span class="nt"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;Docker nftables Firewall Rules
&lt;span class="nt"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;docker.service
&lt;span class="nt"&gt;Requires&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;docker.service

&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="nt"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;oneshot
&lt;span class="nt"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;/usr/local/bin/docker-firewall-rules.sh
&lt;span class="nt"&gt;RemainAfterExit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;yes

&lt;span class="k"&gt;[Install]&lt;/span&gt;
&lt;span class="nt"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;multi-user.target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create &lt;code&gt;/usr/local/bin/docker-firewall-rules.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# Wait for Docker's nftables table to exist&lt;/span&gt;
&lt;span class="nv"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
&lt;span class="nv"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; nft list table inet docker &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nv"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;attempt &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="k"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$attempt&lt;/span&gt; &lt;span class="nt"&gt;-ge&lt;/span&gt; &lt;span class="nv"&gt;$max_attempts&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Docker nftables table not found after &lt;/span&gt;&lt;span class="nv"&gt;$max_attempts&lt;/span&gt;&lt;span class="s2"&gt; attempts"&lt;/span&gt;
        &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi
    &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;1
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Insert drop rules for blocked ports&lt;/span&gt;
nft insert rule inet docker forward tcp dport 3000 drop
nft insert rule inet docker forward tcp dport 8888 drop
nft insert rule inet docker forward tcp dport 5000 drop

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Docker firewall rules applied"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make the script executable, then enable and start the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /usr/local/bin/docker-firewall-rules.sh
systemctl daemon-reload
systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;docker-firewall.service
systemctl start docker-firewall.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Table Family Trap
&lt;/h2&gt;

&lt;p&gt;One gotcha: if Docker uses &lt;code&gt;inet&lt;/code&gt; (which handles both IPv4 and IPv6), your rules must also use &lt;code&gt;inet&lt;/code&gt;. A rule in an &lt;code&gt;ip&lt;/code&gt; table won't see IPv6 traffic, and Docker's &lt;code&gt;inet&lt;/code&gt; chains will still forward it.&lt;/p&gt;

&lt;p&gt;Always match table families:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Wrong - only catches IPv4&lt;/span&gt;
nft add table ip firewall
nft add chain ip firewall forward_early &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s1"&gt;'{ type filter hook forward priority -200; }'&lt;/span&gt;

&lt;span class="c"&gt;# Right - catches both stacks&lt;/span&gt;
nft add table inet firewall
nft add chain inet firewall forward_early &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s1"&gt;'{ type filter hook forward priority -200; }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Debugging Chain Execution
&lt;/h2&gt;

&lt;p&gt;When rules don't work, trace the packet path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable packet tracing for port 3000&lt;/span&gt;
nft add rule inet firewall forward_early &lt;span class="se"&gt;\&lt;/span&gt;
    tcp dport 3000 meta nftrace &lt;span class="nb"&gt;set &lt;/span&gt;1

&lt;span class="c"&gt;# In another terminal, watch the trace&lt;/span&gt;
nft monitor trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then trigger traffic to port 3000. You'll see exactly which chains and rules the packet hits, in order. This shows you where Docker's chains accept the packet before your drop rule fires.&lt;/p&gt;

&lt;p&gt;For production debugging, I prefer logging over tracing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nft insert rule inet docker forward &lt;span class="se"&gt;\&lt;/span&gt;
    tcp dport 3000 log prefix &lt;span class="s2"&gt;"DOCKER-BLOCK-3000: "&lt;/span&gt; drop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tail &lt;code&gt;/var/log/syslog&lt;/code&gt; or &lt;code&gt;/var/log/kern.log&lt;/code&gt; to see blocked connection attempts with full packet details.&lt;/p&gt;
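&lt;p&gt;Those log lines are easy to post-process. A small sketch that pulls source addresses out of lines carrying the prefix (the sample lines are hypothetical, shaped like typical netfilter log output):&lt;/p&gt;

```python
# Count blocked connection attempts by our nftables log prefix.
# The sample lines are illustrative; real kernel log lines carry
# more fields (MAC, TTL, window size, etc.).
sample_log = [
    'kernel: DOCKER-BLOCK-3000: IN=eth0 OUT=docker0 SRC=203.0.113.7 DST=172.17.0.2 PROTO=TCP DPT=3000',
    'kernel: DOCKER-BLOCK-3000: IN=eth0 OUT=docker0 SRC=198.51.100.9 DST=172.17.0.2 PROTO=TCP DPT=3000',
    'kernel: some unrelated message',
]

def blocked_sources(lines, prefix="DOCKER-BLOCK-3000: "):
    sources = []
    for line in lines:
        if prefix in line:
            # Pull the SRC= field out of the logged packet details
            fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
            sources.append(fields["SRC"])
    return sources

print(blocked_sources(sample_log))  # ['203.0.113.7', '198.51.100.9']
```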

&lt;h2&gt;
  
  
  What About iptables-nft?
&lt;/h2&gt;

&lt;p&gt;If you're using &lt;code&gt;iptables-nft&lt;/code&gt; (the nftables backend for iptables commands), Docker's rules still win. The iptables commands generate nftables rules in a compatibility table, but Docker's native &lt;code&gt;inet docker&lt;/code&gt; table has its own priority scheme.&lt;/p&gt;

&lt;p&gt;The solution is the same: create chains with appropriate priorities, or modify Docker's chains directly. Don't rely on legacy iptables commands to override nftables-native Docker rules.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is an excerpt from &lt;a href="https://books.fivenineslab.com" rel="noopener noreferrer"&gt;Practical AI Infrastructure Engineering&lt;/a&gt; — a production handbook covering Docker, GPU infrastructure, vector databases, and LLM APIs. Full book with 4 hands-on capstone projects available at &lt;a href="https://activ8ted.gumroad.com/l/ssmfkx" rel="noopener noreferrer"&gt;https://activ8ted.gumroad.com/l/ssmfkx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://fivenineslab.com/blog/docker-nftables-port-blocking-priority-chains" rel="noopener noreferrer"&gt;fivenineslab.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>aiinfrastructure</category>
    </item>
    <item>
      <title>Running Gemma 2 27B Locally: MLX vs vLLM vs llama.cpp Performance Comparison</title>
      <dc:creator>augustine Egbuna</dc:creator>
      <pubDate>Tue, 07 Apr 2026 01:34:39 +0000</pubDate>
      <link>https://forem.com/fivenineslab_30/running-gemma-2-27b-locally-mlx-vs-vllm-vs-llamacpp-performance-comparison-29la</link>
      <guid>https://forem.com/fivenineslab_30/running-gemma-2-27b-locally-mlx-vs-vllm-vs-llamacpp-performance-comparison-29la</guid>
      <description>&lt;p&gt;You run Gemma 2 27B on MLX the day it drops, feed it some multimodal prompts, and get nonsense hallucinations. Meanwhile, Reddit threads are full of people saying it's the best 27B model yet. Something doesn't add up.&lt;/p&gt;

&lt;p&gt;The problem isn't the model — it's the inference harness. Each framework makes different tradeoffs in quantization, attention implementation, and memory layout. Run the same model on MLX, vLLM, and llama.cpp, and you'll get three different experiences. I've spent the last week running Gemma 2 27B across all three to find out which actually delivers production-quality inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Your MLX Results Look Wrong
&lt;/h2&gt;

&lt;p&gt;MLX optimizes for Apple Silicon's unified memory architecture, but Gemma 2's architecture fights it. The model uses sliding window attention with local and global attention heads — a pattern that doesn't map cleanly to MLX's matrix operations. When you quantize to 4-bit with MLX's default quantization scheme, those attention patterns degrade fast.&lt;/p&gt;

&lt;p&gt;Here's what most people run on Mac:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mlx_lm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlx-community/gemma-2-27b-it-4bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trust_remote_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe this image: &amp;lt;image&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loads the community 4-bit quant, which uses grouped quantization with block size 128. For text-only prompts, it's fine. For vision or long-context tasks, the quantization errors compound. You're not seeing the model's true capabilities — you're seeing quantization artifacts.&lt;/p&gt;
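&lt;p&gt;A toy round-trip shows why bit width dominates here (plain Python, a simplified symmetric group-quantization scheme for illustration, not MLX's actual kernel):&lt;/p&gt;

```python
import random

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1024)]

def quantize_roundtrip(ws, bits, group_size=128):
    # Symmetric per-group quantization: scale each group by its max |w|,
    # snap every weight to the signed integer grid, then dequantize.
    levels = 2 ** (bits - 1) - 1
    out = []
    for i in range(0, len(ws), group_size):
        group = ws[i:i + group_size]
        scale = max(abs(w) for w in group) / levels
        out.extend(round(w / scale) * scale for w in group)
    return out

def max_error(ws, bits):
    deq = quantize_roundtrip(ws, bits)
    return max(abs(a - b) for a, b in zip(ws, deq))

err4 = max_error(weights, 4)
err8 = max_error(weights, 8)
# 8-bit gives a ~16x finer grid per group, so its worst-case
# reconstruction error is far smaller than 4-bit's.
assert min(err4, err8) == err8
print(f"4-bit max err {err4:.4f}, 8-bit max err {err8:.4f}")
```

Per-weight errors this size are harmless in isolation; it's when they accumulate through dozens of attention layers over long contexts that outputs drift.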

&lt;p&gt;The fix: use the official MLX 8-bit quant, or run bf16 if you have the unified memory for it (the bf16 weights alone are roughly 54GB, so in practice that means a 96GB+ machine). The 8-bit version uses a different quantization scheme that preserves attention head outputs better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlx-community/gemma-2-27b-it-8bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Official 8-bit quant
&lt;/span&gt;    &lt;span class="n"&gt;tokenizer_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trust_remote_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Same generate call, noticeably better outputs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On an M2 Ultra with 192GB, this runs at ~28 tokens/sec for coding tasks. Hallucinations drop significantly. But you're still bottlenecked by MLX's single-device constraint — no multi-GPU, no batching across requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  vLLM: Production Throughput on NVIDIA Hardware
&lt;/h2&gt;

&lt;p&gt;If you're running on Linux with NVIDIA GPUs, vLLM is the answer. It implements PagedAttention, continuous batching, and efficient KV cache management. For Gemma 2 27B, this means 3-4x higher throughput than naive implementations.&lt;/p&gt;

&lt;p&gt;Deploy it with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vllm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm/vllm-openai:v0.6.3&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;--model google/gemma-2-27b-it&lt;/span&gt;
      &lt;span class="s"&gt;--dtype bfloat16&lt;/span&gt;
      &lt;span class="s"&gt;--max-model-len 8192&lt;/span&gt;
      &lt;span class="s"&gt;--gpu-memory-utilization 0.9&lt;/span&gt;
      &lt;span class="s"&gt;--tensor-parallel-size 2&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;shm_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;16gb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs Gemma 2 27B sharded across 2x A100 40GB GPUs. The &lt;code&gt;--gpu-memory-utilization 0.9&lt;/code&gt; tells vLLM to use 90% of GPU memory for KV cache — critical for high batch throughput. With continuous batching enabled, you'll serve 15-20 concurrent requests at ~45 tokens/sec per request.&lt;/p&gt;
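&lt;p&gt;The reason the KV cache budget matters: every token of context stores keys and values for every layer. A back-of-envelope sizing, using approximate Gemma 2 27B shape figures (assumed from the model card; check the model's &lt;code&gt;config.json&lt;/code&gt; for exact values):&lt;/p&gt;

```python
# Rough KV cache sizing for Gemma 2 27B in bf16.
# Shape figures below are assumptions; verify against config.json.
layers = 46          # num_hidden_layers
kv_heads = 16        # num_key_value_heads (GQA)
head_dim = 128
bytes_per_value = 2  # bf16

# Keys and values, per token, across all layers
kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value

ctx = 8192           # matches --max-model-len above
concurrent = 20

per_seq_gb = kv_per_token * ctx / 1e9
total_gb = per_seq_gb * concurrent
# PagedAttention allocates cache pages on demand, so real usage tracks
# the tokens actually in flight, not this worst-case max-context figure.
print(f"{kv_per_token} bytes/token, {per_seq_gb:.2f} GB/seq at full context, "
      f"{total_gb:.1f} GB worst case for {concurrent} seqs")
```

That worst case would exceed what's left after the ~54GB of bf16 weights on 2x A100 40GB, which is exactly why vLLM pages the cache on demand instead of preallocating per request.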

&lt;p&gt;Test it with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8000/v1/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "google/gemma-2-27b-it",
    "prompt": "Write a Python function to parse YAML",
    "max_tokens": 256,
    "temperature": 0.3
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For coding tasks, vLLM with bf16 precision produces clean, accurate outputs. No hallucinations, consistent structure. The difference from 4-bit MLX is night and day.&lt;/p&gt;

&lt;h2&gt;
  
  
  llama.cpp: The Middle Ground
&lt;/h2&gt;

&lt;p&gt;You're on Mac, don't want to spin up cloud GPUs, but need better quality than 4-bit MLX. llama.cpp with Q5_K_M or Q6_K quantization splits the difference.&lt;/p&gt;

&lt;p&gt;Build from source with Metal support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ggerganov/llama.cpp
&lt;span class="nb"&gt;cd &lt;/span&gt;llama.cpp
make &lt;span class="nv"&gt;LLAMA_METAL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Download a quality quant&lt;/span&gt;
curl &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; gemma-2-27b-it-Q6_K.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/resolve/main/gemma-2-27b-it-Q6_K.gguf

&lt;span class="c"&gt;# Run with context optimized for coding&lt;/span&gt;
./llama-cli &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; gemma-2-27b-it-Q6_K.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; 512 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt; 8192 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--temp&lt;/span&gt; 0.3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--top-p&lt;/span&gt; 0.9 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 999 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Write a Rust function to validate JSON schema"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-ngl 999&lt;/code&gt; offloads all layers to Metal. Q6_K quantization keeps 6-bit weights with K-quant optimization — better precision than 4-bit, manageable memory footprint. On M2 Max with 64GB, this runs at ~22 tokens/sec.&lt;/p&gt;
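&lt;p&gt;To see roughly where each quant lands in memory, multiply parameter count by bits per weight (the bits-per-weight figures below are approximate community estimates, not exact GGUF accounting, which adds metadata and per-block scales):&lt;/p&gt;

```python
# Approximate weight memory for a ~27B-parameter model at common
# quantization levels. Bits-per-weight values are rough estimates.
params = 27.2e9
bpw = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "bf16": 16.0}

for name, bits in bpw.items():
    gb = params * bits / 8 / 1e9
    print(f"{name:7s} ~{gb:5.1f} GB")

q6_gb = params * bpw["Q6_K"] / 8 / 1e9  # roughly 22 GB
```

At roughly 22GB of weights, Q6_K leaves a 64GB M2 Max plenty of headroom for the KV cache and the OS, which bf16 (over 50GB) would not.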

&lt;p&gt;For vision tasks that caused hallucinations in MLX, llama.cpp with Q6_K produces coherent descriptions. The difference isn't dramatic, but it's reliable enough for production use cases where you can't accept garbage outputs 20% of the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Performance Numbers
&lt;/h2&gt;

&lt;p&gt;I ran the same coding benchmark across all three setups — 50 Python function generation tasks, measured by pass@1 on unit tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MLX 4-bit&lt;/strong&gt;: 58% pass rate, 28 tok/s, frequent off-topic generations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MLX 8-bit&lt;/strong&gt;: 74% pass rate, 26 tok/s, reliable structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llama.cpp Q6_K&lt;/strong&gt;: 76% pass rate, 22 tok/s, consistent quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vLLM bf16 (2x A100)&lt;/strong&gt;: 81% pass rate, 45 tok/s, production-grade&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;vLLM wins on quality and throughput, but you're paying for cloud GPUs. For local Mac development, llama.cpp Q6_K is the sweet spot: better than MLX's default 4-bit, on par with 8-bit MLX in this benchmark, and reliable out of the box.&lt;/p&gt;
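&lt;p&gt;For clarity on the metric: with a single sample per task, pass@1 is just the fraction of tasks whose one generated solution passes its unit tests:&lt;/p&gt;

```python
# pass@1 with one sample per task reduces to the plain pass rate.
def pass_at_1(results):
    """results: list of booleans, one per task (did the single sample pass?)."""
    return sum(results) / len(results)

# e.g. 29 of 50 tasks passing reproduces the 58% MLX 4-bit figure
print(pass_at_1([True] * 29 + [False] * 21))  # 0.58
```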

&lt;h2&gt;
  
  
  What Actually Matters for Your Use Case
&lt;/h2&gt;

&lt;p&gt;If you're doing exploratory coding on Mac, start with llama.cpp Q6_K. It just works, no Python environment conflicts, no MLX quirks with certain prompt formats.&lt;/p&gt;

&lt;p&gt;If you're building an API that serves multiple users, run vLLM on rented NVIDIA hardware. The throughput and batching efficiency pay for themselves after 10-20 concurrent users.&lt;/p&gt;

&lt;p&gt;If you're locked into the Apple ecosystem with 128GB+ unified memory and want Python integration, use MLX with 8-bit quants. Skip the 4-bit community models — they're fine for demos, broken for real work.&lt;/p&gt;

&lt;p&gt;The model quality is there. You just need to stop using inference harnesses that throw away half the precision to save memory you probably don't need to save.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is an excerpt from &lt;a href="https://books.fivenineslab.com" rel="noopener noreferrer"&gt;Practical AI Infrastructure Engineering&lt;/a&gt; — a production handbook covering Docker, GPU infrastructure, vector databases, and LLM APIs. Full book with 4 hands-on capstone projects available at &lt;a href="https://activ8ted.gumroad.com/l/ssmfkx" rel="noopener noreferrer"&gt;https://activ8ted.gumroad.com/l/ssmfkx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://fivenineslab.com/blog/running-gemma-2-27b-locally-mlx-vllm-llamacpp-comparison" rel="noopener noreferrer"&gt;fivenineslab.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>mlops</category>
      <category>aiinfrastructure</category>
      <category>gpu</category>
    </item>
    <item>
      <title>How to Block Docker Ports with nftables Without Getting Bypassed</title>
      <dc:creator>augustine Egbuna</dc:creator>
      <pubDate>Tue, 07 Apr 2026 01:33:33 +0000</pubDate>
      <link>https://forem.com/fivenineslab_30/how-to-block-docker-ports-with-nftables-without-getting-bypassed-5e9h</link>
      <guid>https://forem.com/fivenineslab_30/how-to-block-docker-ports-with-nftables-without-getting-bypassed-5e9h</guid>
      <description>&lt;p&gt;You add an nftables rule to drop traffic on port 8080. You check the ruleset — it's active. You curl localhost:8080 from outside the host, and the Dockerized API responds anyway. Your firewall just got ignored.&lt;/p&gt;

&lt;p&gt;This isn't a configuration mistake. Docker deliberately writes its own iptables rules that execute before nftables ever sees the packet. If you're running GPU inference services, internal LLM APIs, or any container that shouldn't be internet-facing, this behavior is a production security gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Docker Bypasses Your Firewall
&lt;/h2&gt;

&lt;p&gt;Docker manipulates iptables-legacy directly, inserting DNAT rules in the &lt;code&gt;nat&lt;/code&gt; table and ACCEPT rules in the &lt;code&gt;filter&lt;/code&gt; table. These rules redirect incoming traffic to container IPs before your nftables ruleset runs.&lt;/p&gt;

&lt;p&gt;Check what Docker created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables-legacy &lt;span class="nt"&gt;-t&lt;/span&gt; nat &lt;span class="nt"&gt;-L&lt;/span&gt; DOCKER &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables-legacy &lt;span class="nt"&gt;-t&lt;/span&gt; filter &lt;span class="nt"&gt;-L&lt;/span&gt; DOCKER &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see entries like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;DNAT  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  tcp dpt:8080 to:172.17.0.2:8080
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The packet gets rewritten and forwarded before your nftables &lt;code&gt;input&lt;/code&gt; chain ever evaluates it. Even if you block port 8080 in nftables, Docker's NAT rule already sent the traffic to the container.&lt;/p&gt;

&lt;p&gt;On modern Debian and Ubuntu systems, nftables is the default firewall backend. But Docker still uses iptables-legacy for compatibility. This creates two parallel firewall systems — and Docker's rules win.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Disable Docker's iptables Manipulation
&lt;/h2&gt;

&lt;p&gt;Stop Docker from writing iptables rules. Edit &lt;code&gt;/etc/docker/daemon.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iptables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now Docker won't touch your firewall. But you've also disabled container NAT and port publishing. If you run &lt;code&gt;docker run -p 8080:8080 myapp&lt;/code&gt;, the port mapping silently fails. The container starts, but nothing listens on the host.&lt;/p&gt;

&lt;p&gt;You now manage all forwarding and NAT yourself in nftables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build Your Own Docker NAT in nftables
&lt;/h2&gt;

&lt;p&gt;You need three components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DNAT for inbound traffic (external → container)&lt;/li&gt;
&lt;li&gt;SNAT for outbound traffic (container → internet)&lt;/li&gt;
&lt;li&gt;Forwarding rules between host and Docker bridge&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's a complete nftables configuration for a single container exposing port 8080:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/sbin/nft -f

flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    # Allow SSH
    tcp dport 22 accept
    # Block direct access to 8080 from outside
    # Traffic will arrive via DNAT as forwarded packets
  }

  chain forward {
    type filter hook forward priority 0; policy drop;
    ct state established,related accept
    # Allow forwarding to Docker containers
    iif "eth0" oif "docker0" ip daddr 172.17.0.2 tcp dport 8080 accept
    # Allow container responses
    iif "docker0" oif "eth0" accept
  }

  chain output {
    type filter hook output priority 0; policy accept;
  }
}

table ip nat {
  chain prerouting {
    type nat hook prerouting priority -100; policy accept;
    # DNAT: external traffic on 8080 → container
    iif "eth0" tcp dport 8080 dnat to 172.17.0.2:8080
  }

  chain postrouting {
    type nat hook postrouting priority 100; policy accept;
    # SNAT: container outbound traffic → host IP
    oif "eth0" ip saddr 172.17.0.0/16 masquerade
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save this as &lt;code&gt;/etc/nftables.conf&lt;/code&gt; and apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nft &lt;span class="nt"&gt;-f&lt;/span&gt; /etc/nftables.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;172.17.0.2&lt;/code&gt; with your container's IP. Find it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker inspect &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s1"&gt;'{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}'&lt;/span&gt; &amp;lt;container_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Selective Exposure: Allow Only Internal Networks
&lt;/h2&gt;

&lt;p&gt;If you want the container reachable only from your private network (not the internet), add a source filter in the DNAT rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;iif "eth0" ip saddr 10.0.0.0/8 tcp dport 8080 dnat to 172.17.0.2:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This DNATs traffic only from the &lt;code&gt;10.0.0.0/8&lt;/code&gt; private range; connections from other sources never match the rule, so they fall through to your filter chains and get dropped. Add similar rules for &lt;code&gt;172.16.0.0/12&lt;/code&gt; and &lt;code&gt;192.168.0.0/16&lt;/code&gt; if you need the rest of RFC1918 space.&lt;/p&gt;
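&lt;p&gt;You can sanity-check which sources a given &lt;code&gt;saddr&lt;/code&gt; filter admits with Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module before committing rules:&lt;/p&gt;

```python
import ipaddress

# Which source addresses does "ip saddr 10.0.0.0/8" actually admit?
allowed = ipaddress.ip_network("10.0.0.0/8")

probes = {
    "10.1.2.3": True,        # internal, matches the DNAT rule
    "192.168.1.50": False,   # private, but a different RFC1918 range
    "203.0.113.9": False,    # public internet
}

for addr, expected in probes.items():
    admitted = ipaddress.ip_address(addr) in allowed
    assert admitted == expected
    print(addr, "admitted" if admitted else "filtered")
```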

&lt;p&gt;For GPU inference APIs or internal vector search endpoints, this prevents accidental internet exposure while keeping the service available to your application tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Multiple Containers
&lt;/h2&gt;

&lt;p&gt;For multiple published ports, add one DNAT rule and one forward rule per container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Container 1: LLM API on 8080
iif "eth0" tcp dport 8080 dnat to 172.17.0.2:8080
iif "eth0" oif "docker0" ip daddr 172.17.0.2 tcp dport 8080 accept

# Container 2: Vector DB on 9200
iif "eth0" tcp dport 9200 dnat to 172.17.0.3:9200
iif "eth0" oif "docker0" ip daddr 172.17.0.3 tcp dport 9200 accept
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a dynamic container environment, this manual approach doesn't scale. Use Docker networks with explicit binds (&lt;code&gt;--publish 127.0.0.1:8080:8080&lt;/code&gt;) so the service listens only on localhost, then manage external access through an nginx reverse proxy protected by nftables.&lt;/p&gt;
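&lt;p&gt;The localhost-only publish looks like this in Compose (a minimal sketch; the service name and image are placeholders):&lt;/p&gt;

```yaml
# Publish only on the loopback interface; external access then goes
# through a reverse proxy governed by your nftables ruleset.
services:
  llm-api:
    image: myorg/llm-api:latest   # placeholder image
    ports:
      - "127.0.0.1:8080:8080"
```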

&lt;h2&gt;
  
  
  Enable nftables on Boot
&lt;/h2&gt;

&lt;p&gt;Make the ruleset persistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;nftables
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start nftables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Debian/Ubuntu, nftables reads &lt;code&gt;/etc/nftables.conf&lt;/code&gt; at boot. Verify the service is active:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status nftables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Lose
&lt;/h2&gt;

&lt;p&gt;With &lt;code&gt;"iptables": false&lt;/code&gt;, Docker Compose port mappings (&lt;code&gt;ports: - "8080:8080"&lt;/code&gt;) stop working unless you manually configure nftables NAT. Docker networks still function for inter-container communication, but host publishing requires your explicit forwarding rules.&lt;/p&gt;

&lt;p&gt;For production GPU clusters running inference APIs, this tradeoff is worth it. You control exactly which ports are exposed and to whom. A single nftables ruleset governs all traffic — no hidden Docker rules bypassing your firewall.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification
&lt;/h2&gt;

&lt;p&gt;Test the block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From outside the host&lt;/span&gt;
curl http://&amp;lt;host-ip&amp;gt;:8080
&lt;span class="c"&gt;# Should fail if no DNAT rule exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the DNAT rule, reload nftables, and retry. The request should reach the container.&lt;/p&gt;

&lt;p&gt;Check your ruleset matches what you expect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nft list ruleset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify Docker didn't sneak in iptables rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables-legacy &lt;span class="nt"&gt;-t&lt;/span&gt; nat &lt;span class="nt"&gt;-L&lt;/span&gt; DOCKER
&lt;span class="c"&gt;# Should be empty or show "Chain DOCKER (0 references)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Docker re-created rules, it means &lt;code&gt;daemon.json&lt;/code&gt; wasn't applied. Restart the daemon and double-check the JSON syntax.&lt;/p&gt;
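&lt;p&gt;One quick way to catch both failure modes, assuming the default &lt;code&gt;/etc/docker/daemon.json&lt;/code&gt; path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A parse error here explains why dockerd rejected the config
python3 -m json.tool /etc/docker/daemon.json
# Apply the change
sudo systemctl restart docker
# The DOCKER chain should stay empty after the restart
sudo iptables-legacy -t nat -L DOCKER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;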

&lt;h2&gt;
  
  
  Use Cases for Manual Firewall Control
&lt;/h2&gt;

&lt;p&gt;This pattern matters when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running inference APIs on GPU instances where accidental exposure costs money and leaks proprietary models&lt;/li&gt;
&lt;li&gt;Operating multi-tenant platforms where container isolation must be firewall-enforced, not just network-namespace-enforced&lt;/li&gt;
&lt;li&gt;Deploying internal RAG pipelines with vector databases that should never touch the public internet&lt;/li&gt;
&lt;li&gt;Meeting compliance requirements that demand explicit, auditable firewall rules for all published services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker's automatic iptables manipulation is convenient for development. In production infrastructure, convenience is a security liability. You need deterministic control over which packets reach which containers.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is an excerpt from &lt;a href="https://books.fivenineslab.com" rel="noopener noreferrer"&gt;Practical AI Infrastructure Engineering&lt;/a&gt; — a production handbook covering Docker, GPU infrastructure, vector databases, and LLM APIs. Full book with 4 hands-on capstone projects available at &lt;a href="https://activ8ted.gumroad.com/l/ssmfkx" rel="noopener noreferrer"&gt;https://activ8ted.gumroad.com/l/ssmfkx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://fivenineslab.com/blog/block-docker-ports-nftables-without-bypass" rel="noopener noreferrer"&gt;fivenineslab.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>aiinfrastructure</category>
    </item>
  </channel>
</rss>
