Forem: [Tanwydd]

Phantomime: I Spent Three Articles Explaining Bot Detection. Here's the Library I Built to Beat It.

[Tanwydd] — Sun, 10 May 2026 10:38:01 +0000

If you've been following this series, you already know the full picture.

We started with TLS fingerprinting — the fact that Python's HTTP stack sends a ClientHello that looks nothing like Chrome, and that alone is enough to get blocked before a single line of page JavaScript ever runs.

Then Canvas fingerprinting — why randomizing the canvas output on every call is actually worse than doing nothing, and why real browsers produce stable, hardware-bound hashes that your bot needs to replicate correctly.

Then behavioral fingerprinting — mouse trajectories, keystroke timing distributions, scroll inertia, and why fixing TLS and canvas still isn't enough if your mouse teleports to coordinates in zero milliseconds.

Each article was covering one layer of the same problem I was actively solving in code. Phantomime is what that code became.

The Problem With Piecemeal Solutions

There are existing tools that patch parts of this. Playwright-stealth disables navigator.webdriver. Some scripts patch the canvas. Others spoof the User-Agent.

The issue is that detection systems don't check one signal — they check dozens simultaneously and look for consistency across them. A User-Agent claiming Windows with a Linux navigator.platform. A canvas fingerprint that changes on every call. A WebGL renderer string that doesn't match any real GPU. A mouse that moves in a perfect straight line at 60fps.

Each individual patch makes things marginally better. Without coherence across all of them, you're still obviously a bot — just a slightly better-dressed one.

Phantomime was built around a single principle: every layer must be solved together, and they must be internally consistent.

The Four Layers

Layer 1 — TLS (the one most people forget)

As covered in part one of this series, the TLS ClientHello is the first thing a server sees. Playwright uses Chromium's network stack, which is fine — but any direct Python HTTP call (for efficiency, for API access, for anything) exposes Python's TLS fingerprint.

Phantomime solves this with curl-cffi, which impersonates Chrome 124's TLS stack at the socket level. After authenticating through the browser, you can switch to direct HTTP calls without leaking your stack:

# Authenticate via the humanized browser
await browser.goto("https://target.com/login")
await browser.type_text("#email", "user@example.com")
await browser.type_text("#password", "secret")
await browser.click("#submit")
await browser.wait_for(".dashboard")

# Export session to curl-cffi — same cookies, Chrome TLS fingerprint
await browser.sync_cookies_to_session("https://target.com")

# Direct HTTP calls — 10-50x faster than browser navigation
for item_id in item_ids:
    resp = await browser.fetch(f"https://target.com/api/items/{item_id}")
    process(resp.json())

Layer 2 — Browser Fingerprint

This is where most patching libraries stop, and where most of them get the details wrong.

As explained in part two, the key insight about canvas fingerprinting is stability. Real browsers produce the same canvas hash on every call on the same hardware. A bot that randomizes per-call is just as detectable as one that blocks the canvas entirely — the instability itself is the signal.

Phantomime uses a Linear Congruential Generator seeded from an MD5 hash of the profile directory name. The noise stays the same for as long as that profile is reused and differs between profiles, which is how separate physical machines behave:

profile_dir name → MD5 → LCG seed → fixed noise sequence for this profile
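
The pipeline above can be sketched in a few lines of Python. This is an illustration rather than Phantomime's actual source: the function names are hypothetical, and the LCG constants are the ones used in the JavaScript from the canvas article.

```python
import hashlib
from itertools import islice


def session_seed(profile_dir_name: str) -> int:
    """Derive a 31-bit seed from the MD5 of the profile directory name."""
    digest = hashlib.md5(profile_dir_name.encode()).hexdigest()
    return int(digest[:8], 16) % (2**31)


def lcg(seed: int):
    """Yield floats in [0, 1) from a 32-bit linear congruential generator."""
    state = seed
    while True:
        state = (state * 1664525 + 1013904223) % 2**32
        yield state / 2**32


def noise_sequence(profile_dir_name: str, n: int = 5) -> list:
    """First n noise values for a profile; identical on every call."""
    return list(islice(lcg(session_seed(profile_dir_name)), n))
```

Calling noise_sequence("worker_0") twice returns the same list, while "worker_1" gets a different one. That is the property the fingerprint relies on.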

The same principle applies to WebGL (readPixels output), AudioContext (getChannelData), and getBoundingClientRect (used for font fingerprinting via layout measurement).

Beyond noise, every surface property is derived from a single hardware profile to ensure coherence:

Property	Source
User-Agent	Profile
`Sec-CH-UA`, `Sec-CH-UA-Platform`	Derived from profile UA
`navigator.platform`	Profile OS (`Win32`, `MacIntel`, `Linux x86_64`)
`navigator.deviceMemory`	Profile (4 / 8 / 16 GB)
`navigator.hardwareConcurrency`	Profile (4 / 8 cores)
`screen.width/height`	Profile resolution
`window.devicePixelRatio`	Profile DPR
WebGL vendor/renderer	Profile GPU string

No mismatched properties. No "Intel GPU" claiming to be an RTX 4090.
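
To make the coherence idea concrete, here is a minimal sketch of a consistency check over a handful of these properties. The Surface class and find_mismatches function are hypothetical names for this example, not part of Phantomime. The navigator.platform strings come from the table above, and the UA tokens and Sec-CH-UA-Platform values are the ones Chrome sends on each OS.

```python
from dataclasses import dataclass

# Expected values per OS family.
OS_RULES = {
    "windows": {"ua_token": "Windows NT", "platform": "Win32", "ch_platform": '"Windows"'},
    "macos": {"ua_token": "Macintosh", "platform": "MacIntel", "ch_platform": '"macOS"'},
    "linux": {"ua_token": "X11; Linux", "platform": "Linux x86_64", "ch_platform": '"Linux"'},
}


@dataclass(frozen=True)
class Surface:
    """Values a page can observe (hypothetical container for this sketch)."""
    os: str
    user_agent: str
    navigator_platform: str
    sec_ch_ua_platform: str


def find_mismatches(s: Surface) -> list:
    """Return the names of properties that contradict the declared OS."""
    rules = OS_RULES[s.os]
    problems = []
    if rules["ua_token"] not in s.user_agent:
        problems.append("user_agent")
    if s.navigator_platform != rules["platform"]:
        problems.append("navigator_platform")
    if s.sec_ch_ua_platform != rules["ch_platform"]:
        problems.append("sec_ch_ua_platform")
    return problems
```

A detector runs the same kind of cross-check from the other side, which is why deriving everything from one profile is safer than patching properties one at a time.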

One detail worth mentioning: headless=True in Playwright activates --headless=old (pipe mode), which disables the GPU pipeline and makes WebGL/Canvas outputs immediately distinguishable. Phantomime always passes headless=False to Playwright and injects --headless=new as a launch argument — the browser is headless, but the GPU pipeline is intact.

Layer 3 — Behavioral Signals

Covered in detail in part three. The short version: real users don't move their mouse in straight lines, don't type at perfectly uniform intervals, and don't scroll in instant jumps.

Mouse movement uses cubic Bézier trajectories with Fitts' Law velocity modulation — movement time scales with distance and target size, just like human hand movements. 30% of clicks include an overshoot followed by a correction micro-movement.
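
For reference, the Shannon form of Fitts' Law fits in one function. The constants a and b below are placeholders picked for the sketch, not values measured from users or taken from Phantomime.

```python
import math


def fitts_movement_time(distance_px: float, target_width_px: float,
                        a: float = 0.10, b: float = 0.15) -> float:
    """Shannon form of Fitts' Law: MT = a + b * log2(D / W + 1), in seconds.

    a and b are placeholder constants for this sketch, not measured values.
    """
    if distance_px < 0 or target_width_px <= 0:
        raise ValueError("distance must be >= 0 and target width > 0")
    index_of_difficulty = math.log2(distance_px / target_width_px + 1)
    return a + b * index_of_difficulty
```

Longer distances and smaller targets both raise the index of difficulty, so a far-away small button gets a longer movement than a nearby large one.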

Typing uses a log-normal inter-keystroke delay distribution (not uniform random). A configurable typo_rate (default 4%) injects QWERTY-neighbor errors with autocorrection. A frustration_rate (default 1%) simulates over-deletion — backspacing one character too many and retyping.

Scroll uses inertial easing with intermediate mousemove events dispatched during the scroll — matching trackpad behavior.

Idle periods include micro-movements, occasional scroll pulses, and randomized pauses drawn from an exponential distribution. warmup() runs a full idle cycle before the first navigation to age the session.

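One patch that often gets overlooked: Event.isTrusted. Events created in page JavaScript and sent through dispatchEvent() carry isTrusted: false, a reliable signal for detection systems that inspect event properties. Input driven through Playwright's own mouse and keyboard APIs is delivered at the browser level and already reports true. Phantomime patches the synthetic case so that those events report true as well.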

Layer 4 — `Function.prototype.toString` Spoofing

When you patch navigator.webdriver or HTMLCanvasElement.prototype.toDataURL, detection scripts can call .toString() on those functions and check whether they return "function toDataURL() { [native code] }" or your custom JS. If they return your custom code, you're caught.

Phantomime patches Function.prototype.toString after all other patches are in place, so every patched function appears native to JS-level inspection.

Concurrency

For volume scraping, run_swarm launches N browsers in parallel, each with its own profile directory and therefore its own distinct fingerprint:

from phantomime import HumanBrowser, run_swarm

async def scrape(browser: HumanBrowser, item: dict) -> dict:
    await browser.goto(item["url"])
    await browser.wait_for(".product-detail")
    return {
        "id":    item["id"],
        "title": await browser.get_text("h1"),
        "price": await browser.get_text(".price"),
    }

results = await run_swarm(
    task=scrape,
    items=product_list,       # list of dicts
    max_concurrent=8,         # tune to available RAM (~350MB per instance)
    browser_kwargs={"headless": True, "locale": "en-US"},
    profile_base_dir="./profiles",
)

Each worker gets ./profiles/worker_0, ./profiles/worker_1, etc. — distinct directory names mean distinct LCG seeds mean distinct fingerprints. Ten workers running in parallel look like ten different machines to the target site.
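
The scheduling idea behind a swarm like this can be sketched with plain asyncio. This is not run_swarm's implementation: bounded_swarm is a hypothetical name, and to keep the example self-contained the task receives a profile directory instead of a browser. Worker slots are handed out from a queue so that no two tasks ever share a profile directory at the same time.

```python
import asyncio
from pathlib import Path
from typing import Any, Awaitable, Callable


async def bounded_swarm(
    task: Callable[[Path, Any], Awaitable[Any]],
    items: list,
    max_concurrent: int,
    profile_base_dir: str,
) -> list:
    """Run task(profile_dir, item) for every item, at most max_concurrent
    at a time. Results come back in the same order as items."""
    free_slots: asyncio.Queue = asyncio.Queue()
    for i in range(max_concurrent):
        free_slots.put_nowait(i)

    async def run_one(item):
        slot = await free_slots.get()        # wait for a free worker slot
        try:
            profile_dir = Path(profile_base_dir) / f"worker_{slot}"
            return await task(profile_dir, item)
        finally:
            free_slots.put_nowait(slot)      # hand the slot back

    return await asyncio.gather(*(run_one(item) for item in items))
```

Using a queue of slot numbers instead of a bare semaphore is a deliberate choice here: it limits concurrency and also decides which profile directory each task gets.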

Installing and Getting Started

pip install phantomime
playwright install chromium

# Optional but recommended — enables the TLS layer
pip install curl-cffi

Basic usage:

import asyncio
from phantomime import HumanBrowser

async def main():
    async with HumanBrowser(
        profile_dir="./profiles/session_01",
        headless=True,
        locale="en-US",
        timezone="America/New_York",
    ) as browser:
        await browser.warmup(duration_s=4.0)
        await browser.goto("https://example.com")
        await browser.type_text("#search", "python automation")
        await browser.click("button[type=submit]")
        await browser.wait_for(".results")
        print(await browser.get_text(".results"))

asyncio.run(main())

The profile directory is persistent — cookies, localStorage, and cache survive across runs. Run once to log in manually, and every subsequent run reuses the saved session.

What It Doesn't Do

CAPTCHA solving — out of scope. Integrate a third-party solver (2captcha, CapMonster) and inject the token via browser.evaluate().

IP reputation — if your IP is in a datacenter range or on a known proxy list, no amount of fingerprint patching helps. Use residential proxies for targets that maintain IP reputation lists.

Cloudflare Turnstile — the interactive checkbox requires a solver. The JS challenge (the spinning wheel) resolves fine with idle(duration_s=8.0).

Links

PyPI: pypi.org/project/phantomime
GitHub: github.com/Tanwydd/phantomime

This series started as a way to document what I was learning while building something real. If any of the previous articles helped you understand why your scraper was getting blocked, this is where all of it ends up.

Questions, issues, and PRs welcome.

Mouse Movement, Typing Speed, and Why Your Bot Still Gets Caught After Fixing TLS

[Tanwydd] — Fri, 01 May 2026 11:23:32 +0000

You fixed the TLS fingerprint. You sorted the Canvas. Your navigator object looks spotless. You're past the first three layers of detection.

And then Akamai blocks you anyway. Thirty seconds into the session.

Welcome to behavioral fingerprinting.

What behavioral analysis actually looks at

The first three detection layers — TLS, HTTP headers, JavaScript fingerprinting — are stateless. They analyze a single snapshot: what does this connection look like right now?

Behavioral analysis is different. It builds a model over time. It watches what you do, how you do it, and whether it looks like something a human would do.

The signals it collects:

Mouse trajectory, speed and acceleration between clicks
Time between keystrokes, and the distribution of that timing
Scroll patterns — speed, direction changes, momentum
Time between page load and first interaction
Navigation flow — which elements get hovered before being clicked
Idle periods — how long between actions, and what happens during them

None of these signals are individually conclusive. Combined over a session, they build a behavioral profile that's surprisingly hard to fake.

The mouse problem

A bot moving a mouse from point A to point B takes the shortest path. Constant speed. Perfectly straight line. No hesitation, no correction, no overshoot.

No human does that.
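
A detector can quantify that description with very little code. The sketch below is my own illustration of the kind of feature it might compute, not any vendor's actual method. The function names are hypothetical, and timestamps are assumed to be strictly increasing.

```python
import math


def path_straightness(points: list) -> float:
    """Chord length divided by travelled length: 1.0 for a perfectly
    straight path, lower for curved paths."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    travelled = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def speed_variation(points: list, timestamps_ms: list) -> float:
    """Coefficient of variation of per-segment speed; 0.0 means constant speed."""
    speeds = [
        math.dist(p, q) / (t1 - t0)
        for p, q, t0, t1 in zip(points, points[1:], timestamps_ms, timestamps_ms[1:])
    ]
    mean = sum(speeds) / len(speeds)
    if mean == 0:
        return 0.0
    variance = sum((s - mean) ** 2 for s in speeds) / len(speeds)
    return math.sqrt(variance) / mean
```

A naive bot scores exactly 1.0 on straightness and 0.0 on speed variation. Human traces land somewhere in between, which is the region the rest of this article tries to hit.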

Human mouse movement has a characteristic shape: slow acceleration at the start, peak speed in the middle, deceleration as you approach the target. The path curves. Sometimes you overshoot and correct. In the final approach, there's micro-tremor — tiny random movements as your hand steadies.

The mathematical model that approximates this is a cubic Bezier curve with randomized control points, combined with a sine-based speed profile:

import math
import random

def bezier_path(p0, p3, steps=40):
    dx = p3[0] - p0[0]
    dy = p3[1] - p0[1]
    dist = math.hypot(dx, dy)

    deviation = dist * random.uniform(0.15, 0.40)
    angle = math.atan2(dy, dx)
    perp = angle + math.pi / 2
    side = random.choice([-1, 1])

    p1 = (
        p0[0] + dx * 0.25 + math.cos(perp) * deviation * side,
        p0[1] + dy * 0.25 + math.sin(perp) * deviation * side,
    )
    p2 = (
        p0[0] + dx * 0.75 + math.cos(perp) * deviation * side * 0.3,
        p0[1] + dy * 0.75 + math.sin(perp) * deviation * side * 0.3,
    )

    points = []
    for i in range(steps + 1):
        t = i / steps
        mt = 1 - t
        x = mt**3*p0[0] + 3*mt**2*t*p1[0] + 3*mt*t**2*p2[0] + t**3*p3[0]
        y = mt**3*p0[1] + 3*mt**2*t*p1[1] + 3*mt*t**2*p2[1] + t**3*p3[1]

        # Sine speed profile: slow-fast-slow (zero at both ends, peak at t = 0.5)
        speed = math.sin(t * math.pi)
        delay = 10.0 / (speed + 0.12)

        # Micro-tremor in the final approach
        if t > 0.85:
            x += random.gauss(0, 0.7)
            y += random.gauss(0, 0.7)
            delay *= random.uniform(1.3, 2.2)

        points.append((x, y, delay))

    return points

The sine speed profile is the key. It produces the natural deceleration near the target that's characteristic of human motor control. Without it, the movement looks mechanical even if the path curves.

Overshooting

About 30% of the time, humans overshoot their target and correct. Your bot should do the same:

async def move_to(self, x, y):
    if random.random() < 0.30:
        await self._execute_move(
            x + random.uniform(-20, 20),
            y + random.uniform(-15, 15),
        )
        await asyncio.sleep(random.uniform(0.08, 0.20))
    await self._execute_move(x, y)

30% is calibrated from observational data on human mouse behavior. Too high and it looks like a nervous tic. Too low and you lose the behavioral diversity that makes the pattern look human.

Typing speed: log-normal, not uniform

A bot typing at constant speed is one of the oldest and most reliable detection signals. The fix is obvious — add random delays between keystrokes.

But random uniform delays still don't look human. Human typing speed follows a log-normal distribution — most keystrokes happen in a typical range, but there's a long tail of slower keystrokes when you pause to think, notice a mistake, or just lose focus for a moment.

import numpy as np

def typing_delay() -> float:
    return float(np.clip(np.random.lognormal(mean=4.2, sigma=0.5), 28, 400))
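
To see what those parameters mean in milliseconds: the median of a log-normal distribution is exp(mean), so mean=4.2 gives a median of about 67 ms, and the 95th percentile before clipping sits at roughly 152 ms. The sketch below checks that with the standard library only. typing_delay_ms is a hypothetical stdlib equivalent of the NumPy function above.

```python
import math
import random


def typing_delay_ms(mu: float = 4.2, sigma: float = 0.5,
                    low: float = 28.0, high: float = 400.0) -> float:
    """Log-normal inter-keystroke delay in milliseconds, clipped to [low, high]."""
    return min(max(random.lognormvariate(mu, sigma), low), high)


# Median of the unclipped distribution: exp(mu)
median_ms = math.exp(4.2)

# Approximate 95th percentile: exp(mu + 1.645 * sigma)
p95_ms = math.exp(4.2 + 1.645 * 0.5)
```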

On top of the base delay, about 4% of keystrokes should trigger a longer pause — the distraction event where you look away from the screen for a second:

delay_ms = typing_delay()
if random.random() < 0.04:
    delay_ms += random.uniform(400, 1400)
await asyncio.sleep(delay_ms / 1000)

Typos

Humans make typing errors. Bots don't. A session with zero typos across hundreds of keystrokes is statistically suspicious.

The realistic typo model uses a QWERTY neighbor map:

QWERTY = {
    "a": ["q","w","s","z"], "s": ["a","w","e","d","x","z"],
    "d": ["s","e","r","f","c","x"],
    # ...
}

def get_typo(char: str):
    neighbors = QWERTY.get(char.lower())
    if not neighbors:
        return None
    wrong = random.choice(neighbors)
    return wrong.upper() if char.isupper() else wrong

Typo rate around 4% feels natural. Below 1% starts to look suspicious. Above 8% looks like a bad typist — which is fine, humans vary.
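
Here is a minimal sketch of how a typo and its correction can be turned into a keystroke sequence. The names keystroke_plan and replay are hypothetical, the neighbor map is deliberately partial, and a real implementation would also add a reaction delay before the backspace.

```python
import random
from typing import Optional

BACKSPACE = "\b"

# Partial neighbor map for the sketch; extend it to the full keyboard.
NEIGHBORS = {"a": "qwsz", "s": "awedxz", "d": "serfcx", "e": "wsdr"}


def keystroke_plan(text: str, typo_rate: float = 0.04,
                   rng: Optional[random.Random] = None) -> list:
    """Expand text into keystrokes, occasionally hitting a neighboring key
    first and deleting it before typing the intended character."""
    rng = rng or random.Random()
    keys = []
    for ch in text:
        options = NEIGHBORS.get(ch.lower())
        if options and rng.random() < typo_rate:
            wrong = rng.choice(options)
            keys.append(wrong.upper() if ch.isupper() else wrong)
            keys.append(BACKSPACE)           # notice the mistake and delete it
        keys.append(ch)
    return keys


def replay(keys: list) -> str:
    """Apply the keystrokes to an empty field and return its final content."""
    out = []
    for k in keys:
        if k == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(k)
    return "".join(out)
```

Whatever the typo rate, replay(keystroke_plan(text)) should give back the original text. That makes the plan easy to unit-test before wiring it to a browser.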

Idle behavior

A session where every second is productive action is not a human session. Humans pause, scroll aimlessly, move the mouse for no reason, hover over things without clicking them.

async def idle(self, duration_s: float):
    end = time.time() + duration_s
    while time.time() < end:
        if random.random() < 0.35:
            action = random.choices(
                ["move", "scroll_tiny", "hover", "tremor", "pause"],
                weights=[3, 2, 2, 2, 3]
            )[0]
            await self._execute_idle_action(action)
        else:
            await asyncio.sleep(random.uniform(0.3, 1.2))

The 35% activity rate during idle periods is empirically reasonable. Higher and the bot never stops moving. Lower and the session looks catatonic between actions.

The timing between actions

The distribution of time between actions matters as much as the actions themselves.

Human session timing follows a roughly 80/20 pattern: most transitions happen quickly (1-3 seconds), but about 20% involve a longer pause — reading, thinking, getting distracted.

async def wait_between_actions(self, long: bool = False):
    if long or random.random() > 0.80:
        delay = float(np.random.lognormal(mean=1.8, sigma=0.6))
    else:
        delay = float(np.random.lognormal(mean=0.3, sigma=0.4))
    await asyncio.sleep(max(0.5, min(delay, 45.0)))

What you can't fully fake

Behavioral analysis at the most sophisticated level builds a model of you specifically, not just humans in general.

The most advanced systems don't just ask "does this look human?" They ask "does this look like the same human who was here yesterday?" A new session with suspiciously perfect behavioral patterns — no warmup, immediately optimal mouse paths — can look wrong even if every individual signal is within human range.

The mitigation is session warmup: spend time doing idle activity before starting meaningful work, and build browsing history before hitting the target site.

await browser.warm_history(
    sites=["https://www.google.com", "https://www.wikipedia.org"],
    dwell_s=10.0,
)
await browser.warmup(duration_s=5.0)
await browser.goto("https://target.com")

The honest ceiling

Behavioral fingerprinting is the hardest layer to beat because it's the hardest to specify. TLS has a defined format. Canvas fingerprinting has a defined algorithm. Human behavior doesn't — it's noisy, variable, and the detection systems are trained on real data you don't have access to.

The goal isn't to perfectly simulate a human. It's to be indistinguishable within the confidence bounds of the detection system. Those bounds are finite and different for every target.

Getting close enough is an engineering problem, not an unsolvable one. But it requires understanding what you're actually trying to match — not just adding random delays and hoping for the best.

This series covered the three main detection layers: TLS fingerprinting, Canvas/WebGL spoofing, and behavioral analysis. The common thread: coherence across all layers matters more than perfecting any single one.

How Your Canvas Fingerprint Gets You Caught (And Why Random Noise Makes It Worse)

[Tanwydd] — Sat, 25 Apr 2026 08:54:12 +0000

You fixed the TLS fingerprint. You patched navigator.webdriver. Your User-Agent is perfect. And you're still getting blocked.

Chances are it's the Canvas.

What canvas fingerprinting actually is

Every browser renders graphics slightly differently. The GPU, the driver version, the OS font rendering engine, the antialiasing settings — all of these introduce tiny variations in how pixels end up on screen.

Canvas fingerprinting exploits this. The detection script draws something on an invisible canvas element — usually a mix of text, shapes and gradients — then reads back the pixel data with toDataURL() or getImageData(). The resulting string is hashed and becomes your fingerprint.

The variations are tiny. We're talking about differences at the level of individual pixel values, often invisible to the human eye. But they're consistent — the same browser on the same machine produces the same hash every time. And they're unique enough to identify you across sessions, across IPs, across proxies.

Your IP changes. Your canvas hash doesn't.

Why headless Chromium is obvious

Here's the specific problem with running Playwright or Puppeteer in headless mode: without a GPU pipeline, the browser falls back to software rendering.

Software rendering is deterministic and perfect. No GPU quirks, no driver variations, no antialiasing artifacts. Every headless Chromium instance on every machine produces an identical canvas output.

That's not how real browsers work. Real browsers are slightly imperfect in ways that are consistent per device. A perfectly identical canvas hash across thousands of sessions is a massive red flag.

The fix for this specific problem is --headless=new, Chromium's newer headless mode, which preserves the full GPU pipeline. With Playwright you pass it as a launch argument:

context = await playwright.chromium.launch_persistent_context(
    user_data_dir="./profile",  # required by launch_persistent_context
    headless=False,
    args=["--headless=new"],  # preserves GPU stack
)

But even with --headless=new, your canvas hash is still consistent across sessions. Which brings us to the noise problem.

Why random noise makes things worse

The obvious solution seems to be: add random noise to the canvas output on every render. Randomize the pixel values slightly so the hash changes.

This is wrong. And it's worse than doing nothing.

Here's why: real browsers produce a consistent canvas hash per device. The same machine always gives the same result. If your canvas hash changes on every page load, every request, every session — that's not how any real browser behaves. Detection systems don't just check what your hash is. They check whether it's stable.

A canvas hash that changes randomly is as obvious as navigator.webdriver = true. It's a different signal, but it's still a signal.

The right approach: deterministic per-session noise

What you want is noise that is:

Consistent within a session — the same hash for the same browser instance
Different across sessions — different profile directories produce different hashes
Realistic in magnitude — tiny variations, not wholesale pixel changes

The way to achieve this is a seeded pseudo-random number generator, where the seed is derived from something stable per profile — like the profile directory name.

import hashlib

def _session_seed(profile_dir_name: str) -> int:
    return int(hashlib.md5(profile_dir_name.encode()).hexdigest()[:8], 16) % (2**31)

Then use that seed to drive a simple LCG (Linear Congruential Generator) in JavaScript, and apply tiny pixel-level noise based on it:

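// Seeded LCG: deterministic per session
const _makeLcg = (seed) => {
    let state = seed >>> 0;
    return () => {
        state = (state * 1664525 + 1013904223) % 4294967296;
        return state / 4294967296;
    };
};
const _lcg = _makeLcg(SESSION_SEED);

// Apply noise to pixel data read from a copy of the canvas. The generator
// restarts from the session seed on every call, so the same canvas content
// produces the same hash each time it is read.
const _applyNoise = (imgData) => {
    const rnd = _makeLcg(SESSION_SEED);
    for (let i = 0; i < imgData.data.length; i += 4) {
        if (rnd() < 0.05) {
            imgData.data[i] += (rnd() > 0.5 ? 1 : -1);
        }
    }
};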

The key detail: modify a copy of the canvas, not the original. If you mutate the original canvas, you break legitimate rendering on the page. The correct approach intercepts toDataURL(), getImageData() and toBlob(), draws to a temporary off-screen canvas, applies noise there, and returns the result.

const _origToDataURL = HTMLCanvasElement.prototype.toDataURL;
const _origGetImageData = CanvasRenderingContext2D.prototype.getImageData;
HTMLCanvasElement.prototype.toDataURL = function() {
    const ctx = this.getContext('2d');
    if (ctx) {
        const off = document.createElement('canvas');
        off.width = this.width;
        off.height = this.height;
        const octx = off.getContext('2d');
        octx.drawImage(this, 0, 0);
        const img = _origGetImageData.call(octx, 0, 0, off.width, off.height);
        _applyNoise(img);
        octx.putImageData(img, 0, 0);
        return _origToDataURL.apply(off, arguments);
    }
    return _origToDataURL.apply(this, arguments);
};

The toString() problem

There's a secondary issue that most canvas spoofing implementations miss: Function.prototype.toString().

When you replace a native browser function with your own JavaScript wrapper, any script that calls .toString() on that function sees JavaScript source code instead of function toDataURL() { [native code] }. That's detectable.

The fix is to maintain a registry of patched functions and override Function.prototype.toString to return the native code string for any function in that registry:

const _patchedFns = new WeakSet();

const _native = (fn, name) => {
    try {
        Object.defineProperty(fn, 'name', { value: name, configurable: true });
    } catch(_) {}
    _patchedFns.add(fn);
    return fn;
};

const _origFnToString = Function.prototype.toString;
Function.prototype.toString = _native(function() {
    if (_patchedFns.has(this)) {
        return `function ${this.name || ''}() { [native code] }`;
    }
    return _origFnToString.call(this);
}, 'toString');

Any function wrapped with _native() now looks indistinguishable from a browser built-in when inspected.

Canvas is one signal among many

Fixing canvas fingerprinting is necessary but not sufficient. Detection systems correlate multiple signals:

WebGL fingerprinting — same concept, different API. The GPU vendor string, renderer string, and the output of readPixels() all contribute to a fingerprint. The same deterministic noise approach applies.

AudioContext fingerprinting — the Web Audio API processes a signal through an oscillator and reads back the output. Again, tiny hardware-level variations create a unique hash. Tiny noise on getChannelData() output breaks this.

Font enumeration — document.fonts.check() and measureText() reveal which fonts are installed, which varies by OS. The browser's reported font list should match the OS implied by the User-Agent.

getBoundingClientRect() noise — font rendering affects element dimensions. Tiny noise on bounding rect values breaks font fingerprinting via layout measurement.

These signals are correlated. A Windows User-Agent with a Linux font list is suspicious. A Mac User-Agent with an NVIDIA GPU renderer is suspicious. Coherence across all signals matters as much as any individual fix.

The timing problem

One more thing that's easy to miss: your JavaScript wrappers add overhead. toDataURL() now does extra work — copying the canvas, applying noise, returning the result. That takes time.

Detection scripts measure how long canvas operations take. A toDataURL() call that takes 3x longer than expected is a signal.

The fix is to add a small amount of noise to performance.now() so timing measurements are slightly unpredictable:

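const _origPerfNow = performance.now.bind(performance);
let _lastNow = 0;
performance.now = _native(function() {
    // Jitter of ±0.1 ms, clamped so the clock never appears to run backwards
    const t = _origPerfNow() + (_lcg() - 0.5) * 0.2;
    _lastNow = Math.max(_lastNow, t);
    return _lastNow;
}, 'now');

±0.1ms of jitter blurs fine-grained timing comparisons. Keep the result monotonic, because a clock that steps backwards is itself a detectable anomaly.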

Next: mouse movement, typing speed, and why behavioral fingerprinting is harder to fake than canvas — and what that looks like in code.

Why Playwright Gets You Blocked Even With Proxies

[Tanwydd] — Sat, 18 Apr 2026 23:30:43 +0000

You've spent hours setting up your scraper. Rotating proxies, real Chrome User-Agent, navigator.webdriver set to undefined, perfect headers. You run it. Blocked. Instantly.

What's going on?

Chances are you never even made it to the layer where your headers live.

The problem isn't your IP

When you start with browser automation, the logic feels solid: change the IP, and the server doesn't know who I am. In 2015 that was enough. Akamai, Cloudflare and PerimeterX stopped relying on IP alone a long time ago. What they care about now is something else.

They care about how you talk before you say anything.

When your bot opens an HTTPS connection, the first thing that happens isn't an HTTP request. It's a TLS negotiation — the protocol that sets up the encrypted channel. During that negotiation your client sends a message called ClientHello containing, among other things, which cipher suites it supports, which TLS extensions it uses, and in what order it declares them.

That set of characteristics has a hash. It's called JA3. And that hash identifies with reasonable accuracy what software is opening the connection — before reading a single HTTP header, before executing any detection JavaScript, before analyzing anything.
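
For the curious, the JA3 computation itself is tiny. It joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal numbers, drops GREASE placeholder values, and takes the MD5 of the resulting string. The sketch below assumes the fields have already been parsed out of the handshake, and the numbers in the usage note are illustrative, not a real Chrome ClientHello.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ... 0xFAFA. Clients insert them
# at random, so JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version: int, ciphers: list, extensions: list,
               curves: list, point_formats: list) -> str:
    """Build the JA3 input: fields separated by commas, values by hyphens."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join(
        "-".join(str(v) for v in field if v not in GREASE) for field in fields
    )


def ja3_hash(version: int, ciphers: list, extensions: list,
             curves: list, point_formats: list) -> str:
    """MD5 hex digest of the JA3 string."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode()).hexdigest()
```

ja3_string(771, [4865, 4866], [0, 23, 65281], [29, 23], [0]) yields "771,4865-4866,0-23-65281,29-23,0". Because order is preserved, listing the same extensions in a different order produces a different hash.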

requests has its own JA3. httpx has its own. Playwright's Chromium has its own. And none of them match a commercial Google Chrome installed on a real Windows machine.

The mismatch that gives you away

Here's the specific problem with Playwright: it downloads and uses its own Chromium binary. Technically it's the same codebase as Chrome, but the TLS fingerprint isn't identical to a commercial Chrome build.

The result is a contradiction that antibot systems catch in milliseconds:

Your headers say: "I'm Google Chrome 124 on Windows 10"
Your TLS fingerprint says: "I'm an automated Chromium binary"

Blocked. And it doesn't matter how well you've handled the JavaScript evasion layer, because the block happens at the network level. The server never even gets to run anything in the browser.

The real detection layers

It helps to understand the order these systems operate in:

Layer 1 — Network:      IP, ASN, TLS fingerprint (JA3/JA4)
Layer 2 — HTTP:         Headers, header order, User-Agent, Client Hints
Layer 3 — JavaScript:   Canvas, WebGL, AudioContext, navigator.*
Layer 4 — Behavior:     Mouse, keyboard, scroll, timing

Most evasion guides talk about layers 3 and 4. They matter, but if you fail at layer 1 the other three are irrelevant. The server never gets to execute them.

What actually works

Option 1: use the system's Chrome binary

If you point Playwright at the Chrome binary installed on the operating system, the TLS fingerprint is that of a real Chrome:

context = await playwright.chromium.launch_persistent_context(
    executable_path="/usr/bin/google-chrome",
    user_data_dir="./profile",
    headless=False,
    args=["--headless=new"],
)

The --headless=new flag isn't optional. Playwright's old headless mode disables the GPU pipeline and the Canvas fingerprint goes flat and fake. Akamai's sensor.js catches it immediately. With --headless=new the full graphics stack is preserved.

Option 2: curl-cffi for direct requests

When you don't need JavaScript rendering — APIs, simple endpoints, resource downloads — curl-cffi solves the TLS problem without a browser:

from curl_cffi.requests import AsyncSession, BrowserType

async with AsyncSession(impersonate=BrowserType.chrome124) as session:
    response = await session.get("https://example.com/api/data")

It uses libcurl with patches to reproduce Chrome's exact ClientHello, including cipher suite order and extensions. The resulting JA3 is indistinguishable from a real Chrome.

One detail that often gets missed: the Sec-Fetch-* headers. If you're hitting an API, the mode has to match what a real browser would send in that situation:

# Same-origin AJAX call
headers = {
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
}

# Full page load
headers = {
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-User": "?1",
}

Sending Sec-Fetch-Mode: navigate in an AJAX call never happens in a real browser. Akamai knows that.

The architecture that holds up best

Combine both: system Chrome for interactive navigation where you need JS, curl-cffi for direct requests where you just need data. The trick is syncing cookies between the two layers — if you don't, the session cookies generated by the browser won't be available in your direct requests and vice versa.
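
The browser-to-HTTP direction of that sync can be sketched as a pure function. It assumes cookie dicts shaped like the ones Playwright's BrowserContext.cookies() returns (name, value, domain, path, expires, secure). cookies_for_url is a hypothetical helper, and the matching is a simplification of RFC 6265, not a full implementation.

```python
import time
from typing import Optional
from urllib.parse import urlsplit


def cookies_for_url(cookies: list, url: str, now: Optional[float] = None) -> dict:
    """Pick the cookies a browser would send to url, as a name -> value dict."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    now = time.time() if now is None else now

    selected = {}
    for c in cookies:
        domain = c["domain"].lstrip(".")
        domain_ok = host == domain or host.endswith("." + domain)
        # Simplified path match: plain prefix test
        path_ok = path.startswith(c.get("path", "/"))
        expires = c.get("expires", -1)
        fresh = expires == -1 or expires > now   # -1 marks a session cookie
        secure_ok = not c.get("secure", False) or parts.scheme == "https"
        if domain_ok and path_ok and fresh and secure_ok:
            selected[c["name"]] = c["value"]
    return selected
```

The resulting dict can be handed to any HTTP client that accepts a name-to-value cookie mapping. The reverse direction, pushing Set-Cookie values from direct responses back into the browser context, needs the same care.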

JA4: what comes after JA3

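JA4 is the successor to JA3, published in 2023. It sorts cipher suites and extensions before hashing, so the fingerprint stays stable when a client shuffles its extension order, and it adds human-readable fields such as the TLS version and ALPN. That makes it more robust, harder to evade with simple tricks, and more useful for analysts.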

The good news is that curl-cffi covers it too — by impersonating Chrome at the libcurl level, the resulting JA4 is equally coherent.

What fixing TLS doesn't solve

Getting the TLS fingerprint right is necessary but it's not the finish line. Once you pass layer 1 they keep analyzing:

HTTP/2 fingerprinting: similar to JA3 but for the HTTP/2 protocol. The SETTINGS parameters and their order, the initial WINDOW_UPDATE increment, and the order of pseudo-headers also identify the client.

JavaScript fingerprinting — Canvas, WebGL, AudioContext, fonts, navigator.webdriver, window.chrome. This layer requires active spoofing via scripts that run before the page loads.

Behavior — mouse movement patterns, typing speed, timing between actions, scroll. The more advanced systems build a behavioral model per session and detect anomalies.

Each layer adds work. But it also adds robustness. A scraper that only handles TLS falls at the JavaScript layer. One that handles all layers consistently is genuinely hard to tell apart from a real user.

Next up: Canvas fingerprinting in detail — how it actually works, why deterministic per-session noise beats random noise, and how antibot systems detect poorly implemented spoofing.

Building a Self-Hosted Bot Management Platform with Docker, FastAPI and React

[Tanwydd] — Wed, 15 Apr 2026 19:25:46 +0000

Managing more than two or three automated scripts in production is where things get messy. You end up with a collection of Python files scattered across servers, credentials hardcoded or stored in plaintext somewhere, no visibility into what's running or failing, and deployments that require SSH access and manual intervention.

I built BotFarm to solve this. It's a self-hosted platform that lets you create, deploy, monitor and maintain containerized Python bots from a web dashboard — no SSH required.

The problem with ad-hoc bot management

The typical evolution of a bot infrastructure goes like this:

You write a script. It works. You add another. Then five more. At some point you have fifteen scripts running on a server, some as cron jobs, some as systemd services, some launched manually. Credentials are in config files with 644 permissions. Logs go to files that nobody reads until something breaks. Deploying a new version means SSHing in, pulling the repo, restarting the service, and hoping nothing else broke.

This is fine for personal projects. It's not fine when other developers need to deploy and manage their own bots, when you need an audit trail, or when credentials need to be kept secure.

BotFarm replaces all of that with a centralized dashboard.

Architecture

Every bot runs in its own Docker container with resource limits (512MB RAM, 0.5 CPUs by default). The dashboard never touches the Docker socket directly — all container operations go through tecnativa/docker-socket-proxy, which exposes only the operations the dashboard actually needs.

  Internet
      │
      ▼  :80 / :443
  ┌──────────────┐
  │    Nginx     │  TLS · reverse proxy · static assets
  └──────┬───────┘
         │  :8080 (internal)
         ▼
  ┌──────────────────┐         ┌───────────────────────┐
  │  Dashboard       │─────────▶  Docker Socket Proxy │
  │  FastAPI + React │  :2375  │  (allowlist only)     │
  └──────┬───────────┘ internal└──────────┬────────────┘
         │                                │  /var/run/docker.sock
         │  :3306 (internal)              ▼
         ▼                        Docker Engine
  ┌─────────────┐                         │
  │   MariaDB   │          ┌──────────────┼──────────────┐
  └─────────────┘       bot-a           bot-b           bot-n
                        [512MB]         [512MB]         [512MB]
                        [0.5CPU]        [0.5CPU]        [0.5CPU]

The Docker Socket Proxy is not optional — it's the architectural decision that makes this safe to run in a multi-developer environment. Without it, a compromised dashboard container has full control over the Docker daemon. With it, the blast radius is contained to the allowlisted operations: CONTAINERS, IMAGES, BUILD, NETWORKS.
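As a rough sketch of what launching a bot through the proxy might look like with the Docker SDK for Python: the proxy hostname, the helper names, and the way credentials are passed are assumptions for illustration, while mem_limit, nano_cpus, environment, name and detach are standard containers.run arguments.

```python
import json

DEFAULT_MEM_LIMIT = "512m"        # 512MB RAM per bot
DEFAULT_NANO_CPUS = 500_000_000   # 0.5 CPU, expressed in billionths of a CPU

def build_run_kwargs(bot_id: int, credentials: dict) -> dict:
    """Assemble the keyword arguments for containers.run with the default limits."""
    return {
        "name": f"bot_{bot_id}",
        "detach": True,
        "mem_limit": DEFAULT_MEM_LIMIT,
        "nano_cpus": DEFAULT_NANO_CPUS,
        "environment": {"BOT_CREDENTIALS": json.dumps(credentials)},
    }

def launch_bot(image: str, bot_id: int, credentials: dict,
               proxy_url: str = "tcp://docker-socket-proxy:2375"):
    # Imported lazily: the Docker SDK is a third-party dependency.
    # proxy_url is a hypothetical hostname for the socket proxy service.
    import docker
    client = docker.DockerClient(base_url=proxy_url)
    return client.containers.run(image, **build_run_kwargs(bot_id, credentials))
```

Pointing the client at the proxy instead of /var/run/docker.sock is the whole trick: any call outside the allowlist fails at the proxy rather than reaching the daemon.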

Credential management

Storing bot credentials is the part most people get wrong. Plaintext in environment files, base64-encoded strings passed as environment variables, secrets committed to git — all of these are real patterns I've seen in production.

BotFarm encrypts all credentials with AES-256-GCM before storing them in the database. Each encryption operation uses a random IV, so the same plaintext produces a different ciphertext every time. The master key lives in /etc/botfarm.env with 600 permissions, generated at install time and never touched again.

import os
import base64
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt(plaintext: str, master_key: bytes) -> str:
    iv = os.urandom(12)  # 96-bit random IV per encryption
    aesgcm = AESGCM(master_key)
    ciphertext = aesgcm.encrypt(iv, plaintext.encode(), None)
    # Store IV + ciphertext together
    return base64.b64encode(iv + ciphertext).decode()

def decrypt(encrypted: str, master_key: bytes) -> str:
    data = base64.b64decode(encrypted)
    iv, ciphertext = data[:12], data[12:]
    aesgcm = AESGCM(master_key)
    return aesgcm.decrypt(iv, ciphertext, None).decode()
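The master key itself can be generated and read back with the standard library alone. The sketch below assumes a single KEY=value line in the env file and a variable name, BOTFARM_MASTER_KEY, that is hypothetical; adapt both to however your install script lays out the file.

```python
import base64
import os
import secrets

KEY_NAME = "BOTFARM_MASTER_KEY"  # hypothetical variable name

def generate_key_file(path: str) -> None:
    """Create a 256-bit key and store it base64-encoded, owner-readable only."""
    key = base64.b64encode(secrets.token_bytes(32)).decode()
    # O_EXCL refuses to overwrite an existing key; 0o600 = owner read/write
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{KEY_NAME}={key}\n")

def load_master_key(path: str) -> bytes:
    """Read the key back and check that it is exactly 32 bytes (AES-256)."""
    with open(path) as fh:
        for line in fh:
            name, _, value = line.strip().partition("=")
            if name == KEY_NAME:
                key = base64.b64decode(value)
                if len(key) != 32:
                    raise ValueError("master key must be 32 bytes for AES-256")
                return key
    raise KeyError(f"{KEY_NAME} not found in {path}")
```

Refusing to overwrite an existing file matters here: regenerating the key would make every stored credential undecryptable.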

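At runtime, decrypted credentials are injected into each bot container as environment variables. They are never written to the bot's filesystem or stored in plaintext in the database, but Docker does keep a container's environment in its own metadata, so anyone who can run docker inspect against the container can read them.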

Real-time log streaming via WebSocket

One of the more interesting technical problems was streaming container logs to the browser in real time. The naive approach — polling a REST endpoint every few seconds — introduces latency and hammers the database unnecessarily.

The solution is a WebSocket endpoint that reads directly from the Docker container's log stream:

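import asyncio
from fastapi import WebSocket

# router (an APIRouter) and docker_client (a Docker SDK client) are defined elsewhere
@router.websocket("/ws/logs/{bot_id}")
async def stream_logs(websocket: WebSocket, bot_id: int):
    await websocket.accept()
    try:
        # The Docker SDK is blocking, so each call runs in a worker thread
        container = await asyncio.to_thread(docker_client.containers.get, f"bot_{bot_id}")
        log_stream = await asyncio.to_thread(container.logs, stream=True, follow=True, tail=50)
        # next(log_stream, None) returns None once the log stream ends
        while (log_line := await asyncio.to_thread(next, log_stream, None)) is not None:
            await websocket.send_text(log_line.decode().strip())
    except Exception as exc:
        await websocket.send_text(f"Stream error: {exc}")
    finally:
        await websocket.close()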

The same pattern applies to build logs — when you deploy a new version of a bot, the Docker image build output streams directly to the browser line by line via a separate WebSocket endpoint.

Bot versioning with visual diff

Every time you save a new version of a bot's code, BotFarm stores the previous version and generates a diff. From the dashboard you can see the full history, compare any two versions side by side, and roll back to any previous version in one click.
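The diff step itself needs nothing beyond the standard library. A minimal sketch, assuming versions are stored as plain strings; the function name and the version labels are hypothetical:

```python
import difflib

def version_diff(old_code: str, new_code: str,
                 old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff between two stored versions of a bot's code."""
    lines = difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)
```

An empty string means the two versions are identical, which is a cheap way to skip storing a no-op save.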

The code editor is Monaco — the same editor that powers VS Code — embedded directly in the dashboard. You get syntax highlighting, autocomplete and error detection without leaving the browser.

Writing a bot

Bots are standard Python scripts with access to bot_logger, a shared library that handles logging and metrics:

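import os
import json
from bot_logger import BotLogger

logger = BotLogger()
exit_code = 0

try:
    creds = json.loads(os.environ.get("BOT_CREDENTIALS", "{}"))
    logger.log("INFO", "Bot started")

    # process_data is the bot's own work function, defined elsewhere in the script
    records = process_data(creds)

    logger.log("INFO", f"Cycle complete: {records} records processed")
    logger.metric("records_processed", records)

except Exception as exc:
    # Record the failure so the dashboard does not report a clean exit
    exit_code = 1
    logger.log("ERROR", f"Bot failed: {exc}")

finally:
    logger.close(exit_code=exit_code)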

The finally block is important — logger.close() writes the final execution status to the database and releases resources. If it doesn't run, the dashboard shows the bot as still running.

Security model

A few decisions worth explaining:

JWT with automatic refresh. Tokens expire after one hour. The frontend automatically refreshes at the 50-minute mark without interrupting the user's session. On logout, the token's JTI is revoked in the database — subsequent requests with that token are rejected even if it hasn't expired.

Rate limiting on login. Ten attempts per minute per IP using slowapi. After that, requests are rejected with 429 until the window resets. Simple, effective, no CAPTCHA complexity.

Audit log as append-only. The database user that the application uses (botfarm_app) has INSERT and SELECT on the audit log table — never UPDATE or DELETE. This is enforced at the database privilege level, not just in application code. An audit log you can delete is not an audit log.

bcrypt with cost 12. Slow enough to make brute force impractical, fast enough that legitimate logins don't feel slow on modern hardware.
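The JTI revocation idea is easiest to see in code. This is a standard-library sketch of HS256 tokens with a revocation check, not the dashboard's actual implementation: the in-memory set stands in for the database table, and all function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid

# In-memory stand-in for the revocation table (assumption for this sketch)
revoked_jtis = set()

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)

def issue_token(user: str, secret: bytes, ttl_s: int = 3600, now=None) -> str:
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user, "jti": uuid.uuid4().hex, "iat": issued, "exp": issued + ttl_s}
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_sign(signing_input, secret)}"

def verify_token(token: str, secret: bytes, now=None) -> dict:
    current = time.time() if now is None else now
    header, payload, signature = token.split(".")  # ValueError if malformed
    # Constant-time comparison of the expected and presented signatures
    if not hmac.compare_digest(signature, _sign(f"{header}.{payload}", secret)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] <= current:
        raise ValueError("token expired")
    if claims["jti"] in revoked_jtis:
        raise ValueError("token revoked")
    return claims

def revoke(token: str, secret: bytes, now=None) -> None:
    """Logout: remember the token's JTI so later requests are rejected."""
    revoked_jtis.add(verify_token(token, secret, now)["jti"])
```

The revocation check runs after signature and expiry checks, so the lookup only happens for tokens that would otherwise be accepted. Entries can be purged once their exp has passed.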

The stack

Layer	Technology
Backend	FastAPI 0.115 · Python 3.12 · Uvicorn
Frontend	React 18 · Vite · Chart.js · Monaco Editor
Database	MariaDB 11
Containers	Docker Engine · Docker Compose v2
Proxy	Nginx (TLS 1.2/1.3)
Docker security	tecnativa/docker-socket-proxy
Encryption	AES-256-GCM · bcrypt · JWT HS256
Platform	AlmaLinux 10 LTS

What it doesn't do

BotFarm is infrastructure for managing bots, not a scraping framework. It doesn't handle proxy rotation, browser automation, or anti-detection. Those concerns live in the bot code itself — BotFarm just provides the container to run it in, the credentials to authenticate with, and the visibility to know when something breaks.

Detecting Deepfake Audio in Python: Why the Threshold Matters More Than the Model

[Tanwydd] — Sun, 12 Apr 2026 11:32:00 +0000

Cloning a voice used to require a recording studio and a professional impersonator. Today it takes a few seconds of audio and a free API call.

That changes the threat model for any system that verifies identity by voice.

The problem with voice verification in 2026

Voice biometrics have been used in contact centers and banking for years. The assumption was that a voice is hard to fake — you either sound like someone or you don't, and training a model to tell the difference was expensive enough to deter casual fraud.

That assumption is gone. Modern voice cloning tools can reproduce a speaker's voice with enough fidelity to fool both humans and many biometric systems, using as little as three to five seconds of target audio. The barrier is now effectively zero for anyone motivated enough to try.

The response can't just be "better biometrics." It has to include detection of synthetic audio alongside speaker verification.

Two problems, two models

VoiceID Compare solves both problems in a single API call:

Speaker verification — do these two audio samples belong to the same person?
Deepfake detection — was either sample generated by AI?

These are separate tasks that require separate models. Confusing them is a common mistake — a deepfake detector doesn't tell you if two voices match, and a speaker verification model doesn't tell you if the audio is synthetic.

Speaker verification: embeddings and cosine similarity

The speaker verification component uses SpeechBrain's ResNet model, trained on VoxCeleb — a large-scale dataset of celebrity speech collected from YouTube.

The model doesn't compare audio files directly. It converts each audio sample into an embedding — a vector of floating point numbers that represents the speaker's vocal characteristics in a high-dimensional space.

from speechbrain.inference.speaker import SpeakerRecognition

model = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-resnet-voxceleb"
)

score, prediction = model.verify_files(audio_path_1, audio_path_2)

The similarity between two embeddings is calculated using cosine similarity: the cosine of the angle between the two vectors in that high-dimensional space. Vectors pointing in the same direction (same speaker) have a cosine similarity close to 1. Vectors pointing in different directions (different speakers) have a similarity close to 0 or below.

The raw score is normalized to a 0-100 percentage scale for human readability.
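For intuition, here is the calculation on plain Python lists. The cosine function is the textbook definition; the percentage mapping (clamp negatives to zero, then scale) is one plausible choice assumed for illustration, not necessarily the mapping VoiceID Compare uses.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm

def to_percent(cosine: float) -> float:
    # Assumed mapping: scores at or below 0 become 0%, 1.0 becomes 100%
    return round(max(0.0, min(1.0, cosine)) * 100, 1)
```

Note that only direction matters: scaling an embedding by any positive factor leaves the score unchanged.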

Why audio preprocessing matters

Raw audio files from real-world sources are messy. Different sample rates, different channel counts, different durations, background noise. Feeding inconsistent audio to the model produces unreliable results.

Before embedding extraction, every audio file goes through normalization:

import torchaudio

def preprocess_audio(path: str):
    waveform, sample_rate = torchaudio.load(path)

    # Convert to mono
    if waveform.shape[0] > 1:
        waveform = waveform.mean(dim=0, keepdim=True)

    # Resample to 16kHz (model requirement)
    if sample_rate != 16000:
        resampler = torchaudio.transforms.Resample(sample_rate, 16000)
        waveform = resampler(waveform)

    return waveform

16kHz mono is the format the ResNet model was trained on. Deviating from it degrades accuracy in ways that aren't always obvious — the model still produces an output, it's just less reliable.

Deepfake detection: Wav2Vec2

The deepfake detection component uses a fine-tuned Wav2Vec2 model from HuggingFace, trained to classify audio as real or synthetic.

Wav2Vec2 is a self-supervised model originally designed for speech recognition. The fine-tuned version used here has been trained on a dataset of real and AI-generated speech samples, learning to identify the subtle artifacts that synthetic audio introduces — phase discontinuities, unnatural prosody, artifacts from vocoder processing.

from transformers import pipeline

deepfake_detector = pipeline(
    "audio-classification",
    model="garystafford/wav2vec2-deepfake-voice-detector"
)

result = deepfake_detector(audio_path)
# Returns: [{'label': 'fake', 'score': 0.73}, {'label': 'real', 'score': 0.27}]

The output is a probability score per class. A score of 0.73 for 'fake' means the model is 73% confident the audio was synthetically generated.
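Since the pipeline returns one entry per class, the downstream logic needs the score for the synthetic class specifically, not just the top entry. A small helper, with a hypothetical name and assuming the label is spelled 'fake' as in the output above (label names depend on the model's config):

```python
def fake_probability(result: list[dict], fake_label: str = "fake") -> float:
    """Pull the synthetic-class score out of an audio-classification result."""
    for entry in result:
        if entry["label"].lower() == fake_label:
            return float(entry["score"])
    raise KeyError(f"label {fake_label!r} not present in classifier output")
```

Failing loudly on a missing label is deliberate: silently defaulting to 0.0 would turn a model swap into a detector that never fires.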

The threshold problem

Here's where most implementations go wrong: they treat the model's output as ground truth.

It isn't. It's a probability estimate whose reliability varies with audio quality, recording conditions, the specific voice cloning tool used, and how recently the model was trained relative to the latest generation of synthesis tools.

The threshold — the score above which you classify audio as a deepfake — is a design decision, not a model parameter. And it has asymmetric consequences:

Too low (e.g. 40%): high false positive rate. Legitimate users get flagged. Trust collapses.
Too high (e.g. 80%): high false negative rate. Actual deepfakes get through. False confidence.

For forensic use, the threshold needs to be calibrated against your specific threat model and your tolerance for each type of error. The system should surface the raw score, not just a binary verdict.

DEEPFAKE_ALERT_THRESHOLD = 0.60  # tunable per deployment

def interpret_deepfake_score(score: float) -> dict:
    return {
        "score_pct": round(score * 100, 1),
        "alert": score >= DEEPFAKE_ALERT_THRESHOLD,
        "verdict": "possible deepfake" if score >= DEEPFAKE_ALERT_THRESHOLD else "appears genuine",
    }

The 60% default is a starting point, not a recommendation. In a banking compliance context you might want 50%. In a forensic investigation you might want to surface everything above 30% for manual review.
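Calibration in practice means scoring a labeled set of your own recordings and looking at both error rates per candidate threshold. A minimal sketch, assuming you have detector scores and ground-truth labels (True means synthetic); the function names are hypothetical.

```python
def error_rates(scores: list[float], labels: list[bool], threshold: float) -> dict:
    """False positive and false negative rates at one threshold.

    labels[i] is True when sample i is actually synthetic.
    """
    false_pos = sum(1 for s, fake in zip(scores, labels) if s >= threshold and not fake)
    false_neg = sum(1 for s, fake in zip(scores, labels) if s < threshold and fake)
    genuine = sum(1 for fake in labels if not fake)
    synthetic = sum(1 for fake in labels if fake)
    return {
        "threshold": threshold,
        "fpr": false_pos / genuine if genuine else 0.0,
        "fnr": false_neg / synthetic if synthetic else 0.0,
    }

def sweep(scores, labels, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7, 0.8)):
    """Evaluate several candidate thresholds on the same labeled set."""
    return [error_rates(scores, labels, t) for t in thresholds]
```

The table this produces is the thing to show stakeholders: the choice between a row with more flagged legitimate users and a row with more missed fakes is theirs, not the model's.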

Running both models in parallel

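Speaker verification and deepfake detection are independent. Running them in parallel can cut wall-clock time by up to about half, provided the two models take similar time and the hardware can actually run them concurrently.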

Since both are CPU/GPU bound, they need to run in a thread pool to avoid blocking an async event loop:

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

async def compare(audio_path_1: str, audio_path_2: str) -> dict:
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine

    speaker_task = loop.run_in_executor(
        executor, verify_speaker, audio_path_1, audio_path_2
    )
    deepfake_task = loop.run_in_executor(
        executor, detect_deepfake, audio_path_1
    )

    similarity, deepfake = await asyncio.gather(speaker_task, deepfake_task)

    return {
        "similarity_pct": similarity,
        "audio_1_deepfake": deepfake,
    }

Wrap the gather call in asyncio.wait_for with an explicit timeout. Model inference on long audio files can take tens of seconds, and you don't want hung requests blocking the server.
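A sketch of the timeout wrapper on its own, using a generic helper with a hypothetical name rather than the model functions. One caveat worth knowing: asyncio.wait_for releases the awaiting request, but it cannot stop a worker thread that is already running, so the inference call still finishes in the background.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

async def run_with_timeout(func, *args, timeout_s: float = 30.0):
    """Run a blocking function in the thread pool, giving up after timeout_s.

    Raises asyncio.TimeoutError on timeout. The worker thread is not
    interrupted; it keeps its pool slot until func returns.
    """
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(executor, func, *args)
    return await asyncio.wait_for(future, timeout=timeout_s)
```

Because timed-out work keeps occupying a pool slot, it is also worth rejecting overly long audio up front rather than relying on the timeout alone.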

The combination that should raise flags

The most dangerous scenario isn't a low similarity score or a high deepfake score in isolation. It's high similarity combined with a high deepfake score.

That means: the voice sounds like the target person, but the audio was probably synthesized. That's a cloning attack.

def interpret_result(similarity_pct: float, deepfake_score: float) -> str:
    if similarity_pct >= 75 and deepfake_score >= DEEPFAKE_ALERT_THRESHOLD:
        return "HIGH RISK: voice matches but audio may be synthetic — possible cloning attack"
    if similarity_pct >= 75:
        return "same person"
    if similarity_pct >= 55:
        return "inconclusive — manual review recommended"
    return "different people"

This case needs explicit handling in the interpretation logic, not just surfacing both scores independently.

What this doesn't solve

No system catches everything. The current generation of voice cloning tools produces audio that fools both humans and models at rates that should make anyone uncomfortable relying on voice verification as a sole authentication factor.

This is a layer in a defense stack, not a complete solution. Combined with behavioral signals, session context, and human review for edge cases, it raises the cost of a successful attack significantly. Used alone as a binary gate, it will eventually be bypassed.

The honest answer to "is this voice real?" is always a probability, not a fact. The system's job is to surface that probability clearly enough that the humans making decisions can act on it.