<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rom</title>
    <description>The latest articles on Forem by Rom (@romclerix).</description>
    <link>https://forem.com/romclerix</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874797%2Fdbc37ee1-89fc-4698-b09b-e84ad93f274d.png</url>
      <title>Forem: Rom</title>
      <link>https://forem.com/romclerix</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/romclerix"/>
    <language>en</language>
    <item>
      <title>How We Reverse Engineered a TLS Fingerprinting System</title>
      <dc:creator>Rom</dc:creator>
      <pubDate>Thu, 16 Apr 2026 11:33:19 +0000</pubDate>
      <link>https://forem.com/romclerix/how-we-reverse-engineered-a-tls-fingerprinting-system-3o6d</link>
      <guid>https://forem.com/romclerix/how-we-reverse-engineered-a-tls-fingerprinting-system-3o6d</guid>
      <description>&lt;p&gt;Modern anti-bot infrastructure doesn't just look at what you send - it looks at &lt;em&gt;how&lt;/em&gt; you connect. TLS fingerprinting is one of the most effective and least-understood layers of bot detection. Here's how we pulled it apart.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;At &lt;a href="https://clerix.io" rel="noopener noreferrer"&gt;Clerix&lt;/a&gt;, we build real-time intelligence infrastructure. That means our systems need to maintain clean, stable connections to extract structured data at scale - deterministically, reliably, and without triggering detection layers that have nothing to do with the content of the request.&lt;/p&gt;

&lt;p&gt;One day, connections that had been stable for months started failing silently. No HTTP error. No rate limit. Just... nothing. The connection would complete, a valid response would come back, and then subsequent requests from the same session would be met with garbage data or empty bodies.&lt;/p&gt;

&lt;p&gt;We ruled out IP reputation. We ruled out cookie or session state. We eventually isolated it to the TLS handshake itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is TLS fingerprinting?
&lt;/h2&gt;

&lt;p&gt;When a client initiates a TLS connection, it sends a &lt;code&gt;ClientHello&lt;/code&gt; message. This message contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cipher suites&lt;/strong&gt; - the list of encryption algorithms the client supports, in order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensions&lt;/strong&gt; - features like SNI, ALPN, session tickets, supported groups&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compression methods&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TLS version&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of these fields - especially the &lt;em&gt;order&lt;/em&gt; - forms a near-unique fingerprint of the client library being used.&lt;/p&gt;

&lt;p&gt;The two dominant fingerprinting standards you'll encounter are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JA3&lt;/strong&gt; - MD5 hash of: TLS version, ciphers, extensions, elliptic curves, and elliptic curve point formats. Developed by Salesforce, widely used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JA3N / JA3S&lt;/strong&gt; - Variants that normalize extension order or fingerprint the server response.&lt;/p&gt;

&lt;p&gt;Here's what a JA3 string looks like before hashing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;771,4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-156-61-60-53-47-255,0-11-10-13172-16-22-23-49-13-43-45-51-21,29-23-24-25,0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
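&lt;code&gt;771&lt;/code&gt; = TLS 1.2 (&lt;code&gt;0x0303&lt;/code&gt;); TLS 1.3 clients also send this value in the legacy version field&lt;/li&gt;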
&lt;li&gt;
&lt;code&gt;4866-4867-...&lt;/code&gt; = cipher suite list (decimal)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0-11-10-...&lt;/code&gt; = extension types&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;29-23-24-25&lt;/code&gt; = supported elliptic curves&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0&lt;/code&gt; = EC point formats&lt;/li&gt;
&lt;/ul&gt;
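
&lt;p&gt;To make the breakdown concrete, here is a minimal Python sketch of how the pre-hash string and its MD5 digest can be assembled. The helper names (&lt;code&gt;ja3_string&lt;/code&gt;, &lt;code&gt;ja3_hash&lt;/code&gt;) and the shortened field values are ours, for illustration only, and the sketch assumes any GREASE values have already been removed from the input lists.&lt;/p&gt;

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the five ClientHello fields into the pre-hash JA3 string.

    Fields are separated by commas; values inside a field by dashes.
    Values keep the order in which they appeared on the wire.
    """
    fields = [ciphers, extensions, curves, point_formats]
    parts = [str(version)] + ["-".join(str(v) for v in f) for f in fields]
    return ",".join(parts)


def ja3_hash(ja3):
    """JA3 is the MD5 hex digest of the pre-hash string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Shortened, made-up field values (not a real client's handshake).
example = ja3_string(771, [4865, 4866, 49195], [0, 11, 10, 13], [29, 23, 24], [0])
reordered = ja3_string(771, [4866, 4865, 49195], [0, 11, 10, 13], [29, 23, 24], [0])

print(example)
print(ja3_hash(example))
# Swapping two ciphers changes the hash, which is why order matters for JA3.
print(ja3_hash(example) == ja3_hash(reordered))
```

&lt;p&gt;Because the values are hashed in wire order, two clients that support the same features but list them differently end up with different JA3 hashes.&lt;/p&gt;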

&lt;p&gt;Every major HTTP client has a recognizable JA3. curl has one. Python's &lt;code&gt;requests&lt;/code&gt; library has one. Node's &lt;code&gt;https&lt;/code&gt; module has one. They're catalogued and blacklisted.&lt;/p&gt;




&lt;h2&gt;
  
  
  Capturing the handshake
&lt;/h2&gt;

&lt;p&gt;Our first step was passive observation - capturing what our outbound ClientHello actually looked like from the server's perspective.&lt;/p&gt;

&lt;p&gt;We stood up a simple TLS inspection proxy using &lt;code&gt;mitmproxy&lt;/code&gt; with a custom addon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mitmproxy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TLSInspector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tls_client_hello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client_hello&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SNI: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;server_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ciphers: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cipher_suites&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extensions: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;addons&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TLSInspector&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also used Wireshark with the &lt;code&gt;tls&lt;/code&gt; display filter and exported the ClientHello bytes directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;tls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handshake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we fed the raw bytes into a local JA3 calculator to confirm what hash we were generating. The result matched what we expected: our hash was showing up in commercial threat intel feeds as "non-browser."&lt;/p&gt;




&lt;h2&gt;
  
  
  What makes a fingerprint detectable
&lt;/h2&gt;

&lt;p&gt;The key insight is that TLS fingerprints aren't just about which features you claim to support - they're about the &lt;em&gt;default behavior&lt;/em&gt; of the underlying library.&lt;/p&gt;

&lt;p&gt;Python's &lt;code&gt;ssl&lt;/code&gt; module, for example, takes its cipher suite order from OpenSSL's defaults unless you explicitly override it. Even if you upgrade TLS versions, the order is deterministic and well-known.&lt;/p&gt;

&lt;p&gt;Specific tells we found:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cipher suite ordering&lt;/strong&gt; - Python/OpenSSL prefers different suites than Chrome's BoringSSL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extension presence and order&lt;/strong&gt; - Chrome includes a &lt;code&gt;compress_certificate&lt;/code&gt; extension (type &lt;code&gt;27&lt;/code&gt;). Most HTTP libraries don't.&lt;/li&gt;
&lt;li&gt;
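&lt;strong&gt;GREASE values&lt;/strong&gt; - Chrome injects reserved placeholder values (GREASE - Generate Random Extensions And Sustain Extensibility, RFC 8701) into the handshake to prevent ossification. JA3 ignores these values when building its string, so the signal is whether they are present at all. JA3N addresses a different problem: it sorts extensions to cancel out randomized extension order.&lt;/li&gt;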
&lt;li&gt;
&lt;strong&gt;Padding extension&lt;/strong&gt; - Chrome pads its ClientHello to avoid certain sizes that trigger middle-box bugs. Pure library clients don't.&lt;/li&gt;
&lt;/ol&gt;
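
&lt;p&gt;As a rough sketch of how some of these tells could be checked against a captured extension list, the Python below recognizes the 16 GREASE code points reserved by RFC 8701 and looks for the &lt;code&gt;compress_certificate&lt;/code&gt; (27) and padding (21) extensions. The function names and both sample extension lists are hypothetical, not taken from a real capture.&lt;/p&gt;

```python
# The 16 reserved GREASE code points from RFC 8701: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = frozenset(0x0A0A + 0x1010 * i for i in range(16))
COMPRESS_CERTIFICATE = 27  # RFC 8879
PADDING = 21               # RFC 7685


def strip_grease(values):
    """Drop GREASE placeholders so two handshakes can be compared fairly."""
    return [v for v in values if v not in GREASE_VALUES]


def browser_tells(extensions):
    """Report which of the browser-style signals appear in an extension list."""
    return {
        "grease": any(e in GREASE_VALUES for e in extensions),
        "compress_certificate": COMPRESS_CERTIFICATE in extensions,
        "padding": PADDING in extensions,
    }


# Hypothetical extension lists, for illustration only.
chrome_like = [0x2A2A, 0, 23, 27, 21]
library_like = [0, 11, 10, 13]

print(strip_grease(chrome_like))
print(browser_tells(chrome_like))
print(browser_tells(library_like))
```

&lt;p&gt;A real detector would weigh many more fields than these three, but the shape is the same: what a client leaves out can be as telling as what it sends.&lt;/p&gt;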




&lt;h2&gt;
  
  
  Spoofing the fingerprint
&lt;/h2&gt;

&lt;p&gt;Once we understood the signal, we had several options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Use a browser engine directly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tools like Playwright/Puppeteer use actual browser TLS stacks. Effective, but heavyweight for infrastructure that needs throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Patch OpenSSL at runtime&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Possible, fragile, not portable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 3: Use a library that gives you control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the approach that scaled. Libraries like &lt;code&gt;curl-impersonate&lt;/code&gt; compile curl against BoringSSL (Chrome's TLS library) and expose Chrome's exact cipher/extension order. There are Python wrappers (&lt;code&gt;curl_cffi&lt;/code&gt;) that expose this at the session level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;curl_cffi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cffi_requests&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cffi_requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;impersonate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chrome120&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://target.example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, this sends a ClientHello that mirrors Chrome 120's: same cipher suites, same extensions, same GREASE placement, same padding. Only the fields that are random on every connection, such as the client random and key shares, differ.&lt;/p&gt;

&lt;p&gt;We benchmarked this against our previous stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;JA3 Hash&lt;/th&gt;
&lt;th&gt;Detection rate&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;requests&lt;/code&gt; + &lt;code&gt;httpx&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;d9f4be3f...&lt;/code&gt; (Python/OpenSSL)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Playwright&lt;/td&gt;
&lt;td&gt;Chrome-identical&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;curl_cffi&lt;/code&gt; (Chrome120)&lt;/td&gt;
&lt;td&gt;Chrome-identical&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;curl_cffi&lt;/code&gt; won on both axes.&lt;/p&gt;




&lt;h2&gt;
  
  
  JA4 - the newer generation
&lt;/h2&gt;

&lt;p&gt;JA3 has known weaknesses: it's easy to spoof once you know it's being checked. JA4 was introduced by FoxIO in 2023 as a more robust successor.&lt;/p&gt;

&lt;p&gt;JA4 encodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Protocol version&lt;/li&gt;
&lt;li&gt;SNI presence&lt;/li&gt;
&lt;li&gt;Number of ciphers&lt;/li&gt;
&lt;li&gt;Number of extensions&lt;/li&gt;
&lt;li&gt;First ALPN value&lt;/li&gt;
&lt;li&gt;Sorted cipher suites (order-independent)&lt;/li&gt;
&lt;li&gt;Sorted extensions (order-independent)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sorting is the key difference - it makes JA4 resistant to order-shuffling spoofs. However, it also means the feature set itself becomes the signal. If you claim to support exactly the extensions that Chrome supports, you'll match Chrome's JA4 - regardless of order.&lt;/p&gt;
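
&lt;p&gt;The effect of sorting is easy to demonstrate. The Python below is a simplified sketch in the spirit of JA4, not a reference implementation: the function name &lt;code&gt;ja4_sketch&lt;/code&gt;, its parameters, and the sample values are ours, and details such as the exact prefix layout and which extensions are excluded from the hash should be checked against FoxIO's published specification.&lt;/p&gt;

```python
import hashlib

# GREASE code points (RFC 8701) are dropped before counting or hashing.
GREASE = frozenset(0x0A0A + 0x1010 * i for i in range(16))
SNI_EXT, ALPN_EXT = 0x0000, 0x0010


def _hex_list(values):
    return ",".join("{:04x}".format(v) for v in values)


def _short_sha256(text):
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4_sketch(version, ciphers, extensions, first_alpn, sig_algs):
    ciphers = [c for c in ciphers if c not in GREASE]
    extensions = [e for e in extensions if e not in GREASE]
    sni = "d" if SNI_EXT in extensions else "i"
    alpn = first_alpn[0] + first_alpn[-1] if first_alpn else "00"
    # Part a: readable prefix (t = TLS over TCP, version, SNI flag, counts, ALPN).
    part_a = "t{}{}{:02d}{:02d}{}".format(
        version, sni, min(len(ciphers), 99), min(len(extensions), 99), alpn
    )
    # Part b: hash of the sorted cipher list, so wire order does not matter.
    part_b = _short_sha256(_hex_list(sorted(ciphers)))
    # Part c: sorted extensions (SNI and ALPN left out of the hash),
    # followed by the signature algorithms in wire order.
    ext_text = _hex_list(sorted(e for e in extensions if e not in (SNI_EXT, ALPN_EXT)))
    if sig_algs:
        ext_text += "_" + _hex_list(sig_algs)
    part_c = _short_sha256(ext_text)
    return "_".join([part_a, part_b, part_c])


# Hypothetical handshakes: same feature set, different order, one with GREASE.
wire = ja4_sketch("13", [0x2A2A, 4865, 4866, 4867],
                  [0x1A1A, 0, 16, 43, 51, 13, 10], "h2", [0x0403, 0x0804])
shuffled = ja4_sketch("13", [4867, 4865, 4866],
                      [10, 13, 51, 43, 16, 0], "h2", [0x0403, 0x0804])

print(wire)
print(wire == shuffled)  # True: reordering alone does not change the result
```

&lt;p&gt;Adding or removing a single cipher or extension does change the output, which is the sense in which the feature set itself becomes the signal.&lt;/p&gt;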

&lt;p&gt;This is an arms race. Detection moves to behavioral signals when the cryptographic ones get spoofed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we learned
&lt;/h2&gt;

&lt;p&gt;A few things that weren't obvious at the start:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLS fingerprinting is almost always one layer in a stack.&lt;/strong&gt; Defeating JA3 alone rarely wins. Real detection systems combine JA3/JA4 with HTTP/2 fingerprinting (stream weights, header order, SETTINGS frames), TCP fingerprinting (TTL, window size, options), and behavioral analysis. Solving the TLS layer just moves you to the next one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consistency matters more than individual values.&lt;/strong&gt; The most suspicious thing isn't any single cipher - it's inconsistency across layers. A ClientHello that claims to be Chrome but is followed by Python's HTTP/2 stack is incoherent and trivially flagged.&lt;/p&gt;
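
&lt;p&gt;One way to picture this kind of cross-layer check is the toy Python sketch below. It assumes a detector has already mapped each layer's fingerprint to a client family; the layer names, family labels, and the &lt;code&gt;coherence_flags&lt;/code&gt; function are all hypothetical and do not describe any specific product.&lt;/p&gt;

```python
def coherence_flags(observed):
    """Return the layers whose inferred client family disagrees with the TLS layer."""
    claimed = observed["tls"]
    return sorted(layer for layer, family in observed.items() if family != claimed)


# Hypothetical per-layer verdicts for two sessions.
spoofed_tls_only = {"tls": "chrome", "http2": "python-h2", "user_agent": "chrome"}
full_impersonation = {"tls": "chrome", "http2": "chrome", "user_agent": "chrome"}

print(coherence_flags(spoofed_tls_only))    # ['http2']
print(coherence_flags(full_impersonation))  # []
```

&lt;p&gt;The first session is flagged even though its TLS fingerprint is a perfect Chrome match, because the HTTP/2 layer tells a different story.&lt;/p&gt;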

&lt;p&gt;&lt;strong&gt;Impersonation fidelity has to go all the way down.&lt;/strong&gt; Matching the TLS layer is not enough: you also need to match HTTP/2 pseudo-header order, SETTINGS frame parameters, and window update behavior. Fortunately, &lt;code&gt;curl_cffi&lt;/code&gt; builds on &lt;code&gt;curl-impersonate&lt;/code&gt;, which also configures curl's HTTP/2 layer (nghttp2) to mimic Chrome's framing, so both layers stay consistent.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;TLS fingerprinting is a microcosm of a broader dynamic in the infrastructure space: the signal keeps moving lower in the stack. It started with IP reputation, moved to cookies and headers, then to TLS, now increasingly to TCP and even timing characteristics.&lt;/p&gt;

&lt;p&gt;At Clerix, this is the layer we operate at. Understanding these mechanisms - not just working around them but properly modeling them - is what makes the difference between infrastructure that works in controlled conditions and infrastructure that holds up under production adversarial conditions.&lt;/p&gt;

&lt;p&gt;If you're building anything in this space and want to go deeper, a few useful starting points are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://tlsfingerprint.io" rel="noopener noreferrer"&gt;tlsfingerprint.io&lt;/a&gt; - passive fingerprinting service&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tls.peet.ws" rel="noopener noreferrer"&gt;tls.peet.ws&lt;/a&gt; - shows your live JA3/JA4/HTTP2 fingerprint&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;curl_cffi&lt;/code&gt; source on GitHub - the implementation is surprisingly readable&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Clerix is real-time intelligence infrastructure for agentic systems. We extract structured data from the web at enterprise scale.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://clerix.io" rel="noopener noreferrer"&gt;clerix.io&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>networking</category>
      <category>scraper</category>
      <category>datacollection</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
