<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Diogo Martins</title>
    <description>The latest articles on Forem by Diogo Martins (@mda2av).</description>
    <link>https://forem.com/mda2av</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3673839%2Ff0e7f816-9454-449a-b147-f60289176d06.png</url>
      <title>Forem: Diogo Martins</title>
      <link>https://forem.com/mda2av</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mda2av"/>
    <language>en</language>
    <item>
      <title>HttpArena - Benchmark Web Frameworks</title>
      <dc:creator>Diogo Martins</dc:creator>
      <pubDate>Mon, 20 Apr 2026 21:37:39 +0000</pubDate>
      <link>https://forem.com/mda2av/httparena-benchmark-web-frameworks-4328</link>
      <guid>https://forem.com/mda2av/httparena-benchmark-web-frameworks-4328</guid>
      <description>&lt;p&gt;&lt;a href="https://www.http-arena.com" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; is a recent project whose goal is to build an open source platform where web frameworks are benchmarked for throughput, CPU usage, memory consumption and latency. Sounds familiar? Yes, there are a few of these already, so what does HttpArena do differently?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Much broader test coverage, including HTTP/1.1, HTTP/2, HTTP/3, gRPC and WebSocket tests.&lt;/li&gt;
&lt;li&gt;Community driven: we are not a company, nor are we sponsored by companies that compete in the benchmarks.&lt;/li&gt;
&lt;li&gt;Benchmarks workloads closer to what you see in real-world applications, including tests with reverse proxies, caching services like Redis, and distributed systems.&lt;/li&gt;
&lt;li&gt;Competitive fairness is a priority: entries are divided into Production, Tuned, Infrastructure and Engine categories, each with clear, specific rules to avoid comparing apples to oranges and potatoes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  So, what does HttpArena want to benchmark?
&lt;/h2&gt;

&lt;p&gt;Both micro-benchmarks and workload-style tests; see the full test suite &lt;a href="https://www.http-arena.com/docs/test-profiles/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Target audience
&lt;/h2&gt;

&lt;p&gt;We target both framework developers and users. Micro-benchmarks mainly serve framework development, but every test, especially the workload-like ones, can be useful for users picking a technology or framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware
&lt;/h2&gt;

&lt;p&gt;Currently we run all the benchmarks on a single box, a 64-core AMD Threadripper with 256GB RAM. This has pros and cons. The main pro is that we don't get bottlenecked by the network, as we would with a multi-box setup of 2 or more servers. The con is that the server and the load generators share the CPU. While this is not optimal, we take a lot of measures to minimize the impact, such as running the servers and load generators in separate containers, each with pinned cores; see &lt;a href="https://www.http-arena.com/docs/hardware/" rel="noopener noreferrer"&gt;Hardware and Topology&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who can join?
&lt;/h2&gt;

&lt;p&gt;Everyone is welcome! Not just to add a framework but also to improve the existing implementations and give us valuable feedback and ideas on existing and new tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instant results
&lt;/h2&gt;

&lt;p&gt;Open a PR and we benchmark it directly before approving or merging; results are available 10 minutes after the PR is opened, if a maintainer is online.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>performance</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
    <item>
      <title>HTTP11Probe Compliance Platform</title>
      <dc:creator>Diogo Martins</dc:creator>
      <pubDate>Sat, 14 Feb 2026 17:00:03 +0000</pubDate>
      <link>https://forem.com/mda2av/http11-compliance-platform-1co5</link>
      <guid>https://forem.com/mda2av/http11-compliance-platform-1co5</guid>
      <description>&lt;p&gt;An open testing platform that probes HTTP/1.1 servers against RFC 9110/9112 requirements, smuggling vectors, and malformed input handling. Add your framework, get compliance results automatically. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://mda2av.github.io/Http11Probe/" rel="noopener noreferrer"&gt;Platform Website&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the &lt;a href="https://mda2av.github.io/Http11Probe/probe-results/" rel="noopener noreferrer"&gt;Leaderboard&lt;/a&gt; for various web frameworks.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>security</category>
      <category>showdev</category>
      <category>testing</category>
    </item>
    <item>
      <title>uRocket - Reactor Networking in C# with io_uring</title>
      <dc:creator>Diogo Martins</dc:creator>
      <pubDate>Mon, 29 Dec 2025 23:45:10 +0000</pubDate>
      <link>https://forem.com/mda2av/urocket-reactor-networking-in-c-with-iouring-1j95</link>
      <guid>https://forem.com/mda2av/urocket-reactor-networking-in-c-with-iouring-1j95</guid>
      <description>&lt;p&gt;As a network performance enthusiast, I've worked with multiple HTTP web frameworks that use C#'s System.Net.Sockets as the interface between the framework and the OS. Working mainly on Linux, one thing that always frustrated me was the lack of io_uring support in C# (Socket uses &lt;a href="https://man7.org/linux/man-pages/man7/epoll.7.html" rel="noopener noreferrer"&gt;epoll&lt;/a&gt;), so I guess it was time to do it myself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/MDA2AV/uRocket" rel="noopener noreferrer"&gt;uRocket&lt;/a&gt; (micro ring socket) is a single-acceptor, multi-reactor architecture with async/await support, which means that as a user I can await reads from the wire and write to it as I please. The acceptor and reactors are fully customizable and rely on a shim written in C: uRocket (C#) interops with liburingshim (compiled from uringshim.c), which is the interface between the C# code and &lt;a href="https://github.com/axboe/liburing" rel="noopener noreferrer"&gt;liburing&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the reactor pattern and why it matters
&lt;/h2&gt;

&lt;p&gt;The reactor pattern decouples I/O operations from application threads by using event notification to multiplex thousands of connections across a small thread pool. In uRocket, a single acceptor thread handles incoming connections via io_uring's multishot accept and distributes clients across reactor threads, each owning its own io_uring instance, buffer ring, and connection table. This architecture eliminates thread-per-connection overhead and avoids most cross-thread contention. With io_uring, reactors become very efficient: submissions and completions go through shared memory rings, so with SQPOLL enabled the steady state needs few or no syscalls; the kernel selects receive buffers directly from pre-registered buffer rings, so no buffer has to be supplied per read; and a single multishot submission can produce hundreds of completions. Application code can still spawn one task per connection for familiar async/await patterns: the reactor handles I/O multiplexing underneath while your code remains sequential and readable. This combination of reactor-pattern efficiency with idiomatic C# async/await is what makes uRocket unique.&lt;/p&gt;
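
&lt;p&gt;To make the single-acceptor, multi-reactor shape concrete, here is a minimal sketch. It is not uRocket's code or API: plain channels stand in for io_uring, integers stand in for accepted connections, and the round-robin distribution, &lt;code&gt;reactorCount&lt;/code&gt; and &lt;code&gt;connectionCount&lt;/code&gt; are assumptions made only for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

const int reactorCount = 4;      // hypothetical value
const int connectionCount = 10;  // simulated clients

// One channel per reactor stands in for the acceptor-to-reactor handoff.
var inboxes = Enumerable.Range(0, reactorCount)
    .Select(_ =&amp;gt; Channel.CreateUnbounded&amp;lt;int&amp;gt;())
    .ToArray();

// Each reactor owns its connection table; no other task touches it.
var reactors = inboxes.Select((inbox, id) =&amp;gt; Task.Run(async () =&amp;gt;
{
    var connections = new List&amp;lt;int&amp;gt;();
    await foreach (var fd in inbox.Reader.ReadAllAsync())
        connections.Add(fd);
    Console.WriteLine($"reactor {id} owns [{string.Join(", ", connections)}]");
})).ToArray();

// The acceptor hands out connections round-robin (an assumption of this sketch).
for (var fd = 0; fd &amp;lt; connectionCount; fd++)
    await inboxes[fd % reactorCount].Writer.WriteAsync(fd);

// Signal shutdown so every reactor drains its inbox and exits.
foreach (var inbox in inboxes)
    inbox.Writer.Complete();

await Task.WhenAll(reactors);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because every connection lives in exactly one reactor's table, the reactors never need a lock to look one up. The order in which the reactors print is not deterministic.&lt;/p&gt;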

&lt;h2&gt;
  
  
  &lt;a href="https://kernel.dk/io_uring.pdf" rel="noopener noreferrer"&gt;io_uring&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;io_uring is a modern Linux kernel interface (introduced in 2019) that changes asynchronous I/O by replacing the traditional "syscall-per-operation" model with shared memory ring buffers. Instead of calling read(), write(), or accept() individually, each triggering an expensive kernel transition, applications submit I/O requests by writing entries to a Submission Queue (SQ) that lives in memory shared between userspace and the kernel. The kernel processes these asynchronously and reports completions via a Completion Queue (CQ), also in shared memory. Once the ring is set up, many operations need no dedicated syscall: applications write SQEs (Submission Queue Entries), the kernel picks them up (with SQPOLL mode a kernel thread polls the queue, otherwise one io_uring_enter call can submit a whole batch), processes the requests, and writes CQEs (Completion Queue Events). io_uring also offers features beyond older APIs like epoll: multishot operations allow a single submission to produce many completions (one accept submission keeps handling incoming connections), buffer rings let the kernel pick a pre-registered buffer for each receive and report its 16-bit buffer ID in the completion, and batching enables processing thousands of events per iteration. This design reduces the main costs of traditional I/O: syscall overhead, per-operation buffer management, and resubmission.&lt;/p&gt;
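
&lt;p&gt;The submission/completion flow can be pictured with a toy model. The sketch below only simulates the two rings in managed memory and makes no call into io_uring or liburing; the &lt;code&gt;Ring&lt;/code&gt; class and the operation names are hypothetical. What it shares with the real thing is the shape: a power-of-two array indexed by free-running head and tail counters.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

var sq = new Ring&amp;lt;string&amp;gt;(8);   // stand-in for the Submission Queue
var cq = new Ring&amp;lt;string&amp;gt;(8);   // stand-in for the Completion Queue

// "Userspace" queues three requests by writing entries, one batch, no call per entry.
foreach (var op in new[] { "accept", "recv", "send" })
    sq.TryPush(op);

// Stand-in for the kernel: consume each SQE and post one CQE for it.
while (sq.TryPop(out var sqe))
    cq.TryPush($"{sqe} completed");

// "Userspace" reaps the completions.
while (cq.TryPop(out var cqe))
    Console.WriteLine(cqe);

// Single-producer/single-consumer ring: size is a power of two,
// so "counter &amp;amp; mask" maps a free-running counter to a slot.
sealed class Ring&amp;lt;T&amp;gt;
{
    private readonly T[] _entries;
    private readonly uint _mask;
    private uint _head, _tail;

    public Ring(int sizePowerOfTwo)
    {
        _entries = new T[sizePowerOfTwo];
        _mask = (uint)sizePowerOfTwo - 1;
    }

    public bool TryPush(T entry)
    {
        if (_tail - _head == (uint)_entries.Length) return false; // ring full
        _entries[_tail++ &amp;amp; _mask] = entry;
        return true;
    }

    public bool TryPop(out T entry)
    {
        if (_head == _tail) { entry = default!; return false; }   // ring empty
        entry = _entries[_head++ &amp;amp; _mask];
        return true;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This runs on a single thread, so it needs no synchronization. The real rings are shared with the kernel and also need memory barriers around the head and tail updates, which liburing takes care of.&lt;/p&gt;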

&lt;h2&gt;
  
  
  Benchmarking uRocket vs System.Net.Sockets
&lt;/h2&gt;

&lt;p&gt;Since I do not own multiple server machines or top-of-the-line Network Interface Cards, there is of course some level of noise in these benchmarks. The load is generated using &lt;a href="https://github.com/wg/wrk" rel="noopener noreferrer"&gt;wrk&lt;/a&gt;, and the source code for each server is here:&lt;br&gt;
&lt;a href="https://github.com/MDA2AV/uRocket/blob/main/Playground/Program.cs" rel="noopener noreferrer"&gt;uRocket&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dotnetfiddle.net/E4cNqF" rel="noopener noreferrer"&gt;System.Net.Sockets&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few notes: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Non-pipelined requests.&lt;/li&gt;
&lt;li&gt;No HTTP parsing, this is not an HTTP framework benchmark.&lt;/li&gt;
&lt;li&gt;No TCP fragmentation, so each request maps to exactly one response.&lt;/li&gt;
&lt;li&gt;Requests are sent over localhost, and the load generator (wrk) runs on the same machine as the servers, sadly due to budget constraints, which causes some bottlenecking as we will see.&lt;/li&gt;
&lt;li&gt;Both uRocket and System.Net.Sockets are built as Native AOT with the exact same flags:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;PropertyGroup&amp;gt;
    &amp;lt;ServerGarbageCollection&amp;gt;true&amp;lt;/ServerGarbageCollection&amp;gt;
    &amp;lt;TieredPGO&amp;gt;true&amp;lt;/TieredPGO&amp;gt;
    &amp;lt;SelfContained&amp;gt;true&amp;lt;/SelfContained&amp;gt;
&amp;lt;/PropertyGroup&amp;gt;

&amp;lt;ItemGroup Condition="'$(PublishAot)' == 'true'"&amp;gt;
    &amp;lt;RuntimeHostConfigurationOption Include="System.Threading.ThreadPool.HillClimbing.Disable" Value="true" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OS: Ubuntu Server 24.04, .NET 10&lt;br&gt;
Processor: i9 14900K&lt;br&gt;
RAM: 64GB 6000MHz&lt;/p&gt;

&lt;p&gt;uRocket is still in an early development phase, so these results will likely change at least a little in the future. Take them with a grain of salt; as the uRocket maintainer, I am of course biased. The System.Net.Sockets code could perhaps use a better-optimized Socket configuration, but I checked it against &lt;a href="https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/CSharp/aspnetcore/src/Platform" rel="noopener noreferrer"&gt;Microsoft's ASP.NET platform entry&lt;/a&gt; (which uses System.Net.Sockets) in the TechEmpower benchmarks and it performed significantly better, so it looks legit to me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;During the benchmarking I configured uRocket with different numbers of reactors, always using multishot, with or without SQPOLL. The System.Net.Sockets configuration always remained the same, which should be counted in its favour when reading the results.&lt;/p&gt;

&lt;p&gt;In the results table we can find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The type (uRocket or Socket)&lt;/li&gt;
&lt;li&gt;Nº of reactors (applies only to uRocket)&lt;/li&gt;
&lt;li&gt;Load (wrk command parameters)&lt;/li&gt;
&lt;li&gt;CPU usage: the i9 14900K has 32 threads, so each 100% corresponds to one thread&lt;/li&gt;
&lt;li&gt;RPS, Requests per Second (Average of 10 runs each)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  High Load -t&amp;gt;16 -c512
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tfe84rnsh6cn6iq4rmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tfe84rnsh6cn6iq4rmu.png" alt=" " width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;uRocket uses much less CPU, even at higher RPS values. During the tests I noticed that my hardware bottlenecks uRocket for nº of reactors &amp;gt; 12, because at higher RPS the load generator (wrk) also needs more CPU. For example, for nº of reactors &amp;lt; 12 the CPU usage grows linearly with the nº of reactors, and then it plateaus.&lt;/p&gt;

&lt;p&gt;We can also see that the perceived efficiency (RPS/CPU) decreases as the nº of reactors grows: the best case is 4 reactors (4377) and the worst case is 16 reactors (2503), while for Socket all results are similar (~1700). This makes sense: since there is no thread pinning, the OS can schedule a small number of reactors on the best CPU threads (the i9 14900K has only a few performance cores), and with fewer reactors the wrk load is also lower.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lower Load -t&amp;lt;16
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwar9x9vfxji8w7ydlbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwar9x9vfxji8w7ydlbh.png" alt=" " width="800" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For lower load, Socket somehow uses a lot of CPU. The results are almost too favorable towards io_uring, with a 4x better RPS/CPU ratio in many cases; again, the Socket implementation might not be fully optimized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When I started this benchmarking, I had one reference, an older Reddit &lt;a href="https://www.reddit.com/r/dotnet/comments/gz3k23/23_more_throughput_and_74_less_latency_with/" rel="noopener noreferrer"&gt;post&lt;/a&gt; stating 23% extra performance for io_uring, which matches the maximum RPS measured here: 3_354_231 vs 2_728_015 (~23% more). I also found other benchmarks online reporting that io_uring solutions typically use up to 50% less CPU, so that checks out too. I hope this was interesting for you and, again, this is a simple benchmark and may have inaccuracies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;uRocket still needs some features and polishing. After that, it will be integrated into frameworks such as &lt;a href="https://github.com/MDA2AV/Wired.IO" rel="noopener noreferrer"&gt;Wired.IO&lt;/a&gt; and &lt;a href="https://github.com/Kaliumhexacyanoferrat/GenHTTP" rel="noopener noreferrer"&gt;GenHTTP&lt;/a&gt; to test it on an actual HTTP framework.&lt;/p&gt;

&lt;p&gt;Latency and startup time tests are also planned, as it boots extremely fast, especially with Native AOT.&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>linux</category>
      <category>performance</category>
      <category>networking</category>
    </item>
    <item>
      <title>From Serial Ports to WebSockets: Debugging Across Two Worlds</title>
      <dc:creator>Diogo Martins</dc:creator>
      <pubDate>Tue, 23 Dec 2025 14:07:19 +0000</pubDate>
      <link>https://forem.com/mda2av/from-serial-ports-to-websockets-debugging-across-two-worlds-2l7o</link>
      <guid>https://forem.com/mda2av/from-serial-ports-to-websockets-debugging-across-two-worlds-2l7o</guid>
      <description>&lt;p&gt;As an embedded C developer, I can say that I spend some (more than I wish) time in what I usually call the debugging loop: build binaries → flash → execute → measure some signal on my oscilloscope, rinse and repeat. Unlike high-level software development, it is often not simple to extract the information we need while debugging. One of the most common techniques is to wire up a simple UART serial port communication between a microcontroller and a PC and log some messages while the firmware is running — such a fantastic tool: full-duplex, easy to configure, and reliable communication between two targets.&lt;/p&gt;

&lt;p&gt;For over a year now, I’ve been delving into the world of networking, and once again I often find myself needing to take advantage of a channel for debugging — but this time, a different one: the TCP channel. As a Linux user, higher-level languages like Java or Python are quite handy for wiring up a simple TCP socket and flushing some bytes up and down. However, when it comes to browsers, things are not so simple. We need to follow a protocol supported by the browser, such as WebSockets, which are not as simple as they might appear.&lt;br&gt;
A typical use case I am faced with is connecting a Linux-based embedded system — which typically has no visual output — to my development machine, which hosts a simple frontend application that allows me to debug and monitor multiple external systems.&lt;/p&gt;

&lt;p&gt;What I did not expect is that one day I would be using C# as my main high-level programming language on Linux. Big props to Microsoft and the fantastic work done with .NET cross-platform. Programming languages are tools, and coming from C, C# offers great value when it comes to quickly deploying something — whether for debugging, a DevOps script, or a quick prototype — while still providing the option of manual memory control and surprisingly high performance, awkwardly close to C++ or Rust.&lt;/p&gt;

&lt;p&gt;Enter &lt;a href="//genhttp.org"&gt;GenHTTP&lt;/a&gt;.&lt;br&gt;
A third-party C# library that quickly rose to my list of favorites. The sheer utility it provides for building a quick HTTP web server is unparalleled compared to everything I’ve used, from Python to Java to C#. Today, I’d love to present a small piece of code showing how to wire up a very simple WebSocket using this library.&lt;/p&gt;

&lt;p&gt;For the more curious, here is the official documentation on how to build a WebSocket with GenHTTP:&lt;/p&gt;

&lt;p&gt;Echo WebSocket Server&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;GenHTTP.Engine.Internal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;GenHTTP.Modules.Websockets&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Functional&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OnMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RunAsync&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tiny piece of code hosts a server at localhost:8080 and can be easily modified to fit your needs. There are multiple flavors available, but I prefer the functional one, as it keeps everything more compact for me.&lt;/p&gt;
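
&lt;p&gt;To check the echo server without a browser, a small client built on .NET's own &lt;code&gt;ClientWebSocket&lt;/code&gt; is enough. This is a sketch under assumptions: it expects the server above to be running, and the &lt;code&gt;ws://localhost:8080/&lt;/code&gt; URL (in particular the root path), the "hello" payload and the buffer size are illustrative choices, not something GenHTTP prescribes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;

using var client = new ClientWebSocket();

// Assumed endpoint: the echo server from above, listening on its default port.
await client.ConnectAsync(new Uri("ws://localhost:8080/"), CancellationToken.None);

// Send one complete text message.
var payload = Encoding.UTF8.GetBytes("hello");
await client.SendAsync(new ArraySegment&amp;lt;byte&amp;gt;(payload),
    WebSocketMessageType.Text, true, CancellationToken.None);

// Read the reply; a single small buffer is enough for this short message.
var buffer = new byte[1024];
var result = await client.ReceiveAsync(new ArraySegment&amp;lt;byte&amp;gt;(buffer),
    CancellationToken.None);
Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, result.Count));

await client.CloseAsync(WebSocketCloseStatus.NormalClosure, "done",
    CancellationToken.None);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the server echoes as expected, the client prints the same text it sent. The same few lines work from any machine that can reach the embedded target, which is handy when there is no browser around.&lt;/p&gt;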

&lt;p&gt;There is, of course, a lot more you can do with this powerful library when it comes to WebSockets. Personally, I often find myself doing very basic things, and for that use case, I extract a lot of value from it.&lt;/p&gt;

</description>
      <category>tooling</category>
      <category>iot</category>
      <category>c</category>
      <category>networking</category>
    </item>
  </channel>
</rss>
