<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Annabelle</title>
    <description>The latest articles on Forem by Annabelle (@ellebanna).</description>
    <link>https://forem.com/ellebanna</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3816989%2F3ee73845-1076-40b3-b806-f1b634bc2302.jpg</url>
      <title>Forem: Annabelle</title>
      <link>https://forem.com/ellebanna</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ellebanna"/>
    <language>en</language>
    <item>
      <title>How to Scrape APIs Instead of HTML (Faster and More Reliable Data Collection)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Sun, 12 Apr 2026 04:15:39 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-scrape-apis-instead-of-html-faster-and-more-reliable-data-collection-5hdo</link>
      <guid>https://forem.com/ellebanna/how-to-scrape-apis-instead-of-html-faster-and-more-reliable-data-collection-5hdo</guid>
      <description>&lt;p&gt;To scrape APIs instead of HTML, use your browser’s &lt;strong&gt;Network tab&lt;/strong&gt; to identify XHR or Fetch requests that return structured JSON data. By replicating these requests with libraries like &lt;code&gt;requests&lt;/code&gt; or &lt;code&gt;axios&lt;/code&gt;, you bypass DOM parsing and JavaScript rendering. This approach is faster, more reliable, and uses less bandwidth than traditional web scraping methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does it mean to scrape APIs instead of HTML?
&lt;/h2&gt;

&lt;p&gt;Scraping APIs means extracting data directly from a website’s backend endpoints instead of parsing HTML pages. This method is faster, more stable, and less likely to break compared to traditional web scraping.&lt;/p&gt;

&lt;p&gt;If you’ve been scraping HTML pages, you’ve probably dealt with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broken selectors&lt;/li&gt;
&lt;li&gt;Changing page layouts&lt;/li&gt;
&lt;li&gt;Slow response times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're dealing with JavaScript-heavy websites, traditional methods often fall short. In those cases, tools like browser automation become necessary; this guide on &lt;a href="https://dev.to/ellebanna/how-to-scrape-javascript-websites-with-playwright-using-proxies-h30"&gt;scraping JavaScript websites with Playwright using proxies&lt;/a&gt; explains how to handle dynamic content that APIs alone may not expose.&lt;/p&gt;

&lt;p&gt;That’s because HTML scraping depends on the front-end structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is API scraping better than HTML scraping?
&lt;/h2&gt;

&lt;p&gt;API scraping is better because it gives you structured data directly, without needing to parse HTML or render JavaScript.&lt;/p&gt;

&lt;p&gt;Benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster responses&lt;/li&gt;
&lt;li&gt;Cleaner JSON data&lt;/li&gt;
&lt;li&gt;Less maintenance&lt;/li&gt;
&lt;li&gt;Fewer parsing errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of scraping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTML → Parsing → Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API → JSON → Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Much cleaner.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you find API endpoints on a website?
&lt;/h2&gt;

&lt;p&gt;You can find API endpoints using your browser’s developer tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-by-step:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open DevTools (F12)&lt;/li&gt;
&lt;li&gt;Go to the Network tab&lt;/li&gt;
&lt;li&gt;Filter by XHR / Fetch&lt;/li&gt;
&lt;li&gt;Reload the page&lt;/li&gt;
&lt;li&gt;Look for requests returning JSON&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You’ll often see endpoints like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/api/products
/api/search?q&lt;span class="o"&gt;=&lt;/span&gt;keyword
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you make API requests in Python?
&lt;/h2&gt;

&lt;p&gt;You can use the &lt;code&gt;requests&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/api/products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it: no HTML parsing needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle headers and authentication?
&lt;/h2&gt;

&lt;p&gt;Some APIs require headers like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authorization tokens&lt;/li&gt;
&lt;li&gt;Cookies&lt;/li&gt;
&lt;li&gt;User-Agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
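&lt;p&gt;For endpoints that depend on cookies or a login token, a &lt;code&gt;requests.Session&lt;/code&gt; is usually cleaner than passing headers on every call: it persists cookies across requests and applies default headers automatically. A minimal sketch (the &lt;code&gt;make_session&lt;/code&gt; helper and token are illustrative, not tied to any specific API):&lt;/p&gt;

```python
import requests

def make_session(token):
    """Build a Session that sends auth headers on every request
    and keeps cookies between calls."""
    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {token}",
        "User-Agent": "Mozilla/5.0",
    })
    return session

# session = make_session("YOUR_TOKEN")
# data = session.get("https://example.com/api/products").json()
```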



&lt;h2&gt;
  
  
  When do you still need proxies for API scraping?
&lt;/h2&gt;

&lt;p&gt;You still need proxies when APIs enforce rate limits or block repeated requests from the same IP.&lt;/p&gt;

&lt;p&gt;Even though API scraping is cleaner, servers can still detect patterns.&lt;/p&gt;

&lt;p&gt;Many developers evaluating the &lt;a href="https://www.squidproxies.com/?utm_source=dev.to&amp;amp;utm_campaign=scrape+API+instead+HTML"&gt;fastest residential proxies&lt;/a&gt; focus on factors like IP diversity, geographic targeting, and request success rates to maintain consistent access and avoid rate limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle rate limits in APIs?
&lt;/h2&gt;

&lt;p&gt;APIs often return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;429 (Too Many Requests)&lt;/li&gt;
&lt;li&gt;Temporary blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To handle this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add delays&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Retry logic&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
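&lt;p&gt;A fixed delay plus a bare retry loop works, but backing off exponentially after each 429 is gentler on the server and recovers faster once the limit resets. A sketch combining the two ideas (&lt;code&gt;fetch_with_retry&lt;/code&gt; is an illustrative helper, not a library function):&lt;/p&gt;

```python
import time
import requests

def backoff_delay(attempt, base=1.0):
    """Exponential backoff: 1s, 2s, 4s, ..."""
    return base * (2 ** attempt)

def fetch_with_retry(url, retries=3, **kwargs):
    """Retry on 429 and 5xx responses, sleeping longer each time."""
    response = None
    for attempt in range(retries):
        response = requests.get(url, **kwargs)
        if response.status_code == 200:
            break
        if response.status_code == 429 or response.status_code >= 500:
            time.sleep(backoff_delay(attempt))
        else:
            break  # other 4xx errors: retrying will not help
    return response
```

&lt;p&gt;Many APIs also send a &lt;code&gt;Retry-After&lt;/code&gt; header on 429 responses; honoring it when present is more polite than a computed delay.&lt;/p&gt;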



&lt;h2&gt;
  
  
  How do you scale API data collection?
&lt;/h2&gt;

&lt;p&gt;To scale efficiently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use multiple endpoints&lt;/li&gt;
&lt;li&gt;Implement queues&lt;/li&gt;
&lt;li&gt;Distribute requests&lt;/li&gt;
&lt;li&gt;Combine with proxy rotation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows you to collect data faster without triggering limits.&lt;/p&gt;
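&lt;p&gt;The queue idea above can be sketched with the standard library alone: put URLs on a &lt;code&gt;queue.Queue&lt;/code&gt;, let a pool of worker threads drain it, and inject the actual fetch function (which would wrap &lt;code&gt;requests.get&lt;/code&gt; with your headers and proxy) so the plumbing stays testable:&lt;/p&gt;

```python
import queue
import threading

def crawl(urls, fetch, workers=4):
    """Drain a URL queue with a pool of worker threads.

    `fetch` is whatever function performs the actual request
    (e.g. a wrapper around requests.get with headers and a proxy).
    """
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            data = fetch(url)
            with lock:
                results[url] = data

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```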

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is API scraping always better than HTML scraping?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not always. Some data is only available in HTML, but when APIs exist, they are usually faster and more reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can websites block API scraping?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. APIs can enforce rate limits, authentication, and IP blocking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need Playwright if I use APIs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. APIs remove the need for browser automation in most cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is API scraping legal?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on the website’s terms and how the data is used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re still scraping HTML, you’re often doing extra work.&lt;/p&gt;

&lt;p&gt;APIs provide a cleaner, faster, and more reliable way to collect data.&lt;/p&gt;

&lt;p&gt;The key is learning how to find them and use them effectively.&lt;/p&gt;

&lt;p&gt;Combine API scraping with proper rate limiting and proxy usage, and you’ll build a much more efficient data pipeline.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>api</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Scrape JavaScript Websites with Playwright (Using Proxies)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Thu, 09 Apr 2026 23:46:21 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-scrape-javascript-websites-with-playwright-using-proxies-h30</link>
      <guid>https://forem.com/ellebanna/how-to-scrape-javascript-websites-with-playwright-using-proxies-h30</guid>
      <description>&lt;p&gt;To scrape JavaScript-heavy websites using Playwright with proxies, launch a browser instance by passing a &lt;code&gt;proxy&lt;/code&gt; object into the &lt;code&gt;launch&lt;/code&gt; method. This object should include the &lt;code&gt;server&lt;/code&gt; URL and optional &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;password&lt;/code&gt;. Use &lt;code&gt;page.goto()&lt;/code&gt; to navigate, as Playwright automatically waits for dynamic content to render before extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example (Node.js):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://myproxy.com:8080&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pwd&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What is Playwright and why use it for scraping?
&lt;/h2&gt;

&lt;p&gt;Playwright is a browser automation tool that allows you to interact with websites just like a real user. It’s especially useful for scraping JavaScript-heavy websites where content is loaded dynamically.&lt;/p&gt;

&lt;p&gt;If you’ve tried scraping modern websites using &lt;code&gt;requests&lt;/code&gt;, you’ve probably noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing data&lt;/li&gt;
&lt;li&gt;Empty HTML&lt;/li&gt;
&lt;li&gt;Incomplete page content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s because many websites render content using JavaScript.&lt;/p&gt;

&lt;p&gt;If you're still working with basic HTTP requests, this guide on &lt;a href="https://dev.to/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao"&gt;how to rotate proxies in Python for reliable data collection&lt;/a&gt; explains how to handle proxy rotation before moving to browser-based scraping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do traditional scraping methods fail on JavaScript sites?
&lt;/h2&gt;

&lt;p&gt;Traditional scraping fails because tools like &lt;code&gt;requests&lt;/code&gt; only fetch raw HTML and do not execute JavaScript.&lt;/p&gt;

&lt;p&gt;Modern websites rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client-side rendering&lt;/li&gt;
&lt;li&gt;API calls triggered by JavaScript&lt;/li&gt;
&lt;li&gt;Dynamic content loading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without executing JavaScript, you won’t see the actual data.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you install Playwright in Python?
&lt;/h2&gt;

&lt;p&gt;You can install Playwright with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;playwright
playwright &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you scrape a page using Playwright?
&lt;/h2&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.sync_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_playwright&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loads the page in a real browser environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you wait for dynamic content?
&lt;/h2&gt;

&lt;p&gt;You can wait for elements to load before extracting data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div.product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div.product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all_text_contents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures you’re scraping fully rendered content.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you use proxies with Playwright?
&lt;/h2&gt;

&lt;p&gt;You can configure a proxy when launching the browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy-ip:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This routes all traffic through a proxy.&lt;/p&gt;

&lt;p&gt;If you're evaluating different options, many developers compare the &lt;a href="https://www.squidproxies.com/" rel="noopener noreferrer"&gt;best US residential proxy providers&lt;/a&gt; based on reliability, geographic targeting, and success rate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you rotate proxies in Playwright?
&lt;/h2&gt;

&lt;p&gt;Playwright doesn’t rotate proxies automatically; you need to manage rotation yourself.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;proxy_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip1:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip2:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip3:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxy_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you avoid detection when scraping?
&lt;/h2&gt;

&lt;p&gt;To reduce detection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotate proxies&lt;/li&gt;
&lt;li&gt;Use realistic user agents&lt;/li&gt;
&lt;li&gt;Add delays between actions&lt;/li&gt;
&lt;li&gt;Avoid aggressive scraping patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
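&lt;p&gt;In practice this usually means pairing a realistic &lt;code&gt;User-Agent&lt;/code&gt; (set via &lt;code&gt;browser.new_context&lt;/code&gt;) with randomized pauses rather than fixed ones. A sketch, with the agent list and delay range as illustrative values:&lt;/p&gt;

```python
import random

# illustrative pool of realistic user agents
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def pick_user_agent():
    return random.choice(USER_AGENTS)

def human_delay_ms(low=1000, high=3000):
    """Randomized pause length so actions don't fire at fixed intervals."""
    return random.randint(low, high)

if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # new_context applies the user agent to every page it creates
        context = browser.new_context(user_agent=pick_user_agent())
        page = context.new_page()
        page.goto("https://example.com")
        page.wait_for_timeout(human_delay_ms())
        print(page.content()[:200])
        browser.close()
```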



&lt;h2&gt;
  
  
  How do you scale Playwright scraping?
&lt;/h2&gt;

&lt;p&gt;For larger systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use multiple browser instances&lt;/li&gt;
&lt;li&gt;Distribute tasks across workers&lt;/li&gt;
&lt;li&gt;Combine with proxy rotation&lt;/li&gt;
&lt;li&gt;Implement retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This builds a more reliable scraping system.&lt;/p&gt;
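&lt;p&gt;One way to sketch this split: divide the URL list into batches, give each batch its own browser instance (optionally behind its own proxy), and run the batches concurrently with Playwright’s async API. The &lt;code&gt;chunk&lt;/code&gt; helper and URLs here are illustrative:&lt;/p&gt;

```python
import asyncio

def chunk(items, n):
    """Split work into n roughly equal batches, one per browser instance."""
    return [items[i::n] for i in range(n)]

async def scrape_batch(batch, proxy=None):
    # one browser instance per batch of URLs
    from playwright.async_api import async_playwright
    async with async_playwright() as p:
        launch_args = {"proxy": {"server": proxy}} if proxy else {}
        browser = await p.chromium.launch(**launch_args)
        page = await browser.new_page()
        results = []
        for url in batch:
            await page.goto(url)
            results.append(await page.content())
        await browser.close()
        return results

async def main(urls, workers=2):
    # run all batches concurrently
    batches = chunk(urls, workers)
    return await asyncio.gather(*(scrape_batch(b) for b in batches))

# asyncio.run(main(["https://example.com/a", "https://example.com/b"]))
```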

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Playwright better than Selenium?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Playwright is faster and more modern, with better support for handling dynamic content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can Playwright handle CAPTCHAs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not directly. You’ll need external services or manual solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I always need proxies with Playwright?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not always, but for large-scale scraping, proxies become essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is scraping JavaScript websites legal?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on how you use the data and the website’s terms of service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Modern websites rely heavily on JavaScript, which makes traditional scraping methods less effective.&lt;/p&gt;

&lt;p&gt;Playwright solves this by simulating real browser behavior.&lt;/p&gt;

&lt;p&gt;When combined with proxy rotation and proper request handling, it becomes a powerful tool for reliable data collection.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>javascript</category>
      <category>python</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>How to Build a Reliable Web Data Collection System (Retries, Headers, and Proxy Rotation)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Thu, 02 Apr 2026 17:00:48 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-build-a-reliable-web-data-collection-system-retries-headers-and-proxy-rotation-48jk</link>
      <guid>https://forem.com/ellebanna/how-to-build-a-reliable-web-data-collection-system-retries-headers-and-proxy-rotation-48jk</guid>
      <description>&lt;h2&gt;
  
  
  What makes a data collection system reliable?
&lt;/h2&gt;

&lt;p&gt;A reliable data collection system can handle failures, avoid detection, and continue running without interruptions. This typically involves retry logic, proxy rotation, request delays, and proper headers.&lt;/p&gt;

&lt;p&gt;If you’ve already implemented proxy rotation, you’ve solved one part of the problem. If not, this guide on &lt;a href="https://dev.to/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao"&gt;how to rotate proxies in Python for reliable data collection&lt;/a&gt; walks through the basics of setting up proxy rotation in a real workflow.&lt;/p&gt;

&lt;p&gt;But in real-world scenarios, that’s not enough.&lt;/p&gt;

&lt;p&gt;You’ll still run into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random request failures&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;CAPTCHAs&lt;/li&gt;
&lt;li&gt;Inconsistent responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To make your system reliable, you need to combine &lt;strong&gt;multiple techniques&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do scraping systems fail?
&lt;/h2&gt;

&lt;p&gt;Scraping systems fail because websites detect patterns such as repeated IP usage, missing headers, and high request frequency.&lt;/p&gt;

&lt;p&gt;Common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending too many requests too quickly&lt;/li&gt;
&lt;li&gt;Using the same IP repeatedly&lt;/li&gt;
&lt;li&gt;Missing or unrealistic headers&lt;/li&gt;
&lt;li&gt;No retry handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with proxies, your system will break if you don’t handle these properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you build a resilient request function?
&lt;/h2&gt;

&lt;p&gt;You build a resilient request function by combining retries, proxy rotation, and error handling.&lt;/p&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;proxy_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip1:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip2:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip3:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;user_agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Windows NT 10.0; Win64; x64)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (X11; Linux x86_64)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxy_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_headers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_headers&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;

        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotates proxies&lt;/li&gt;
&lt;li&gt;Rotates headers&lt;/li&gt;
&lt;li&gt;Retries failed requests&lt;/li&gt;
&lt;li&gt;Adds delays&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why are headers important?
&lt;/h2&gt;

&lt;p&gt;Headers are important because websites use them to identify real users.&lt;/p&gt;

&lt;p&gt;Without realistic headers, your requests look like they come from a bot.&lt;/p&gt;

&lt;p&gt;At minimum, you should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User-Agent&lt;/li&gt;
&lt;li&gt;Accept-Language&lt;/li&gt;
&lt;li&gt;Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_headers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agents&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept-Language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US,en;q=0.9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/html,application/xhtml+xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How does proxy rotation improve reliability?
&lt;/h2&gt;

&lt;p&gt;Proxy rotation improves reliability by distributing requests across multiple IP addresses, reducing the chance of detection and blocking.&lt;/p&gt;

&lt;p&gt;Instead of hitting a server from one IP repeatedly, you spread requests across many.&lt;/p&gt;

&lt;p&gt;If you're evaluating different options, many developers compare &lt;a href="https://www.squidproxies.com/" rel="noopener noreferrer"&gt;rotating residential proxies&lt;/a&gt; based on success rate, IP pool size, and geographic coverage.&lt;/p&gt;
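&lt;p&gt;As a sketch of the rotation itself: picking proxies at random can reuse the same IP back to back, while a round-robin cycle spreads requests evenly. The proxy URLs below are placeholders, not real endpoints.&lt;/p&gt;

```python
import itertools

# Placeholder proxies -- substitute your provider's real credentials.
proxy_list = [
    "http://user:pass@ip1:port",
    "http://user:pass@ip2:port",
    "http://user:pass@ip3:port",
]

# itertools.cycle yields the list endlessly in order, so every proxy
# gets an equal share of requests.
proxy_cycle = itertools.cycle(proxy_list)

def next_proxy():
    return next(proxy_cycle)
```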

&lt;h2&gt;
  
  
  How do you handle rate limiting?
&lt;/h2&gt;

&lt;p&gt;You handle rate limiting by slowing down requests and adding randomness.&lt;/p&gt;

&lt;p&gt;Simple techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add delays&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Avoid patterns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Don’t send requests at fixed intervals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduce concurrency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Too many parallel requests = higher detection risk.&lt;/p&gt;
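&lt;p&gt;One way to combine these ideas is a small wrapper that caps concurrency with a semaphore and randomizes every delay. The limit of 3 workers is an arbitrary example, and &lt;code&gt;fetch_fn&lt;/code&gt; stands in for whatever request function you use.&lt;/p&gt;

```python
import random
import threading
import time

# Arbitrary example cap: at most 3 requests in flight at once.
slots = threading.Semaphore(3)

def polite_fetch(fetch_fn, url, min_delay=1.0, max_delay=3.0):
    # Acquire a slot to bound concurrency, then sleep a randomized
    # interval so requests never arrive at fixed, detectable times.
    with slots:
        time.sleep(random.uniform(min_delay, max_delay))
        return fetch_fn(url)
```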

&lt;h2&gt;
  
  
  How do you detect blocked responses?
&lt;/h2&gt;

&lt;p&gt;You should check for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP 403 / 429 status codes&lt;/li&gt;
&lt;li&gt;CAPTCHA pages&lt;/li&gt;
&lt;li&gt;Empty or unexpected responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also check content for known block patterns.&lt;/p&gt;
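&lt;p&gt;Combining the status check with a content check might look like this. The marker strings are illustrative; real block pages vary by site, so tune the list to what you actually observe.&lt;/p&gt;

```python
# Illustrative block markers -- adjust to the target site's block pages.
BLOCK_MARKERS = ["captcha", "access denied", "unusual traffic"]

def looks_blocked(status_code, body):
    """Return True when the status or body suggests the request was blocked."""
    if status_code in (403, 429):
        return True
    if not body:
        return True  # empty responses are suspicious too
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```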

&lt;h2&gt;
  
  
  How do you scale this system?
&lt;/h2&gt;

&lt;p&gt;To scale, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger proxy pools&lt;/li&gt;
&lt;li&gt;Queue systems (e.g., task queues)&lt;/li&gt;
&lt;li&gt;Parallel workers&lt;/li&gt;
&lt;li&gt;Logging and monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale, your system becomes more about &lt;strong&gt;architecture&lt;/strong&gt; than code.&lt;/p&gt;
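&lt;p&gt;As a minimal sketch of the parallel-worker piece, a thread pool can fan URLs out to several workers. The &lt;code&gt;fetch&lt;/code&gt; stub below stands in for a resilient request function like the one shown earlier.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stub so the sketch is self-contained; replace with a real
    # proxy-rotating, retrying request function.
    return f"fetched:{url}"

def crawl(urls, workers=4):
    """Fan URLs out to a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```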

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do I always need proxies for data collection?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not always. For small-scale tasks, you may not need them. But for large-scale or repeated requests, proxies become necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s the biggest mistake beginners make?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not adding retry logic. One failure can break your entire pipeline if not handled properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many retries should I use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Typically 2–5 retries. More than that can slow down your system.&lt;/p&gt;
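&lt;p&gt;If you do retry, spacing attempts with exponential backoff plus jitter is gentler on the server than a fixed sleep. A sketch (the base and cap values are arbitrary defaults):&lt;/p&gt;

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay before retry `attempt` (0-indexed): roughly 1s, 2s, 4s...
    capped at 30s, with jitter so clients don't retry in lockstep."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```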

&lt;p&gt;&lt;strong&gt;Are residential proxies always better?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They are harder to detect, but also more expensive. The best choice depends on your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building a reliable data collection system isn’t about one trick; it’s about combining multiple techniques.&lt;/p&gt;

&lt;p&gt;Proxy rotation, retries, headers, and delays all work together.&lt;/p&gt;

&lt;p&gt;If you only use one, your system will eventually fail.&lt;/p&gt;

&lt;p&gt;If you combine them properly, you get a system that’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stable&lt;/li&gt;
&lt;li&gt;Scalable&lt;/li&gt;
&lt;li&gt;Harder to block&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Rotate Proxies in Python for Reliable Data Collection</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:31:21 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao</link>
      <guid>https://forem.com/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao</guid>
      <description>&lt;h2&gt;
  
  
  What is proxy rotation in Python?
&lt;/h2&gt;

&lt;p&gt;Proxy rotation in Python is the process of sending requests through different IP addresses instead of using a single IP. This helps prevent blocking, rate limiting, and detection when making multiple requests to a website.&lt;/p&gt;

&lt;p&gt;If you're building automation tools or data pipelines that interact with websites at scale, you've probably encountered this problem: your requests start failing after a while.&lt;/p&gt;

&lt;p&gt;At first, everything works. Then suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests return errors&lt;/li&gt;
&lt;li&gt;You get blocked&lt;/li&gt;
&lt;li&gt;Or you start seeing CAPTCHAs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This usually happens because your script is sending too many requests from one IP address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do you need rotating proxies?
&lt;/h2&gt;

&lt;p&gt;You need rotating proxies because websites detect repeated requests from the same IP and block them. Rotating proxies distribute requests across multiple IP addresses, making traffic appear more natural.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Your Script → Website&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Your Script → Proxy Pool → Website&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Each request uses a different IP, which reduces the risk of detection.&lt;/p&gt;

&lt;p&gt;If you're still exploring which services to use, this breakdown of &lt;a href="https://dev.to/ellebanna/best-rotating-residential-proxy-providers-for-web-scraping-2026-29o1"&gt;rotating residential proxy providers developers use&lt;/a&gt; compares different proxy networks and how they fit real-world use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you rotate proxies in Python using requests?
&lt;/h2&gt;

&lt;p&gt;You can rotate proxies in Python using the requests library by selecting a different proxy for each request from a list of available proxies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Install requests&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Use a single proxy&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy-ip:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy-ip:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Rotate proxies from a list
&lt;/h2&gt;

&lt;p&gt;Now let’s rotate multiple proxies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;proxy_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip1:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip2:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip3:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxy_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now each request uses a different IP address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why are residential proxies better for rotation?
&lt;/h2&gt;

&lt;p&gt;Residential proxies are better for rotation because they use real IP addresses assigned by internet service providers, making them harder for websites to detect compared to datacenter proxies.&lt;/p&gt;

&lt;p&gt;Datacenter proxies are fast but easier to block.&lt;/p&gt;

&lt;p&gt;Residential proxies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look like real users&lt;/li&gt;
&lt;li&gt;Have higher success rates&lt;/li&gt;
&lt;li&gt;Work better for large-scale data collection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many developers evaluating different &lt;a href="https://www.squidproxies.com/residential-proxies" rel="noopener noreferrer"&gt;rotating residential proxies&lt;/a&gt; focus on reliability, IP pool size, and geographic coverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle proxy failures in Python?
&lt;/h2&gt;

&lt;p&gt;You handle proxy failures by adding retry logic and switching proxies when a request fails.&lt;/p&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures your script continues working even if some proxies fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are best practices for rotating proxies?
&lt;/h2&gt;

&lt;p&gt;Best practices for rotating proxies include adding delays, rotating user agents, and limiting request rates to avoid detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add delays&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Rotate headers (user agents)&lt;/strong&gt;&lt;/p&gt;
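&lt;p&gt;For example (the agent strings are a small illustrative sample; real lists are much longer):&lt;/p&gt;

```python
import random

# A few illustrative desktop user agents.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def get_headers():
    return {"User-Agent": random.choice(USER_AGENTS)}
```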

&lt;p&gt;&lt;strong&gt;3. Use sessions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Avoid aggressive request rates&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Should Use Proxy Rotation
&lt;/h2&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You send many requests&lt;/li&gt;
&lt;li&gt;You need consistent uptime&lt;/li&gt;
&lt;li&gt;You access geo-specific data&lt;/li&gt;
&lt;li&gt;You run automation at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between residential and datacenter proxies?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Residential proxies use real IP addresses from ISPs, while datacenter proxies come from cloud servers. Residential proxies are harder to detect but usually more expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I rotate proxies without a proxy provider?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, but it’s difficult to maintain a reliable pool of IPs. Most developers use proxy providers for scalability and stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How often should proxies rotate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on your use case. Some rotate every request, while others rotate per session or after a fixed time interval.&lt;/p&gt;
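&lt;p&gt;The time-interval strategy can be sketched as a small class. The 60-second default and proxy URLs are arbitrary examples.&lt;/p&gt;

```python
import time

class TimedRotator:
    """Keep one proxy for `interval` seconds, then advance to the next."""
    def __init__(self, proxies, interval=60):
        self.proxies = proxies
        self.interval = interval
        self.index = 0
        self.switched_at = time.monotonic()

    def current(self):
        # Advance to the next proxy once the interval has elapsed.
        if time.monotonic() - self.switched_at >= self.interval:
            self.index = (self.index + 1) % len(self.proxies)
            self.switched_at = time.monotonic()
        return self.proxies[self.index]
```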

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Proxy rotation is essential for developers working with automation tools and data pipelines. Without it, requests will eventually fail due to blocking and rate limits.&lt;/p&gt;

&lt;p&gt;With proper rotation, retry logic, and reliable proxies, your systems become significantly more stable and scalable.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>python</category>
      <category>tutorial</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Best Rotating Residential Proxy Providers for Web Scraping (2026)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Tue, 17 Mar 2026 13:30:52 +0000</pubDate>
      <link>https://forem.com/ellebanna/best-rotating-residential-proxy-providers-for-web-scraping-2026-29o1</link>
      <guid>https://forem.com/ellebanna/best-rotating-residential-proxy-providers-for-web-scraping-2026-29o1</guid>
      <description>&lt;p&gt;If you’re building automation tools, data collection systems, or large-scale data pipelines, you’ve probably encountered the biggest limitation in large-scale scraping:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IP blocking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your script sends too many requests from the same IP address, websites quickly flag it as automated traffic.&lt;/p&gt;

&lt;p&gt;Rotating residential proxies solve this problem.&lt;/p&gt;

&lt;p&gt;Instead of using one IP, these networks distribute your requests across many residential IP addresses assigned by internet service providers. Because the traffic appears to come from real users, it’s much harder for websites to detect automation patterns.&lt;/p&gt;

&lt;p&gt;For developers working with scraping frameworks or automation pipelines, this dramatically improves request success rates.&lt;/p&gt;

&lt;p&gt;Below are several &lt;strong&gt;popular rotating residential proxy providers developers use in 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Developers Use Rotating Proxies For
&lt;/h2&gt;

&lt;p&gt;Rotating proxies are commonly used in development workflows that involve large-scale requests or automated data collection.&lt;/p&gt;

&lt;p&gt;Typical use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Scraping product prices from e-commerce sites&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring search engine results (SERPs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ad verification across geographic regions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Market research and competitive analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data aggregation for analytics or AI training&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When running these workflows, developers usually integrate proxy rotation directly into their scripts or scraping frameworks.&lt;/p&gt;

&lt;p&gt;If you’re configuring proxies inside Python-based scraping tools, this tutorial on &lt;a href="https://medium.com/@ellebanna/using-proxies-with-python-requests-and-scrapy-b9342e90de65" rel="noopener noreferrer"&gt;using proxies with Python Requests and Scrapy&lt;/a&gt; explains how proxy authentication and rotation work in real scraping environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Squid Proxies
&lt;/h2&gt;

&lt;p&gt;Squid Proxies is a long-running proxy provider focused on reliability and simple infrastructure.&lt;/p&gt;

&lt;p&gt;Many development teams prefer providers that prioritize &lt;strong&gt;stable connections and predictable performance&lt;/strong&gt; rather than complicated dashboards.&lt;/p&gt;

&lt;p&gt;Organizations comparing different &lt;a href="https://www.squidproxies.com/residential-proxies?utm_source=devto&amp;amp;utm_campaign=2026+rotating+residential+proxies" rel="noopener noreferrer"&gt;rotating residential proxies&lt;/a&gt; often evaluate reliability, connection success rates, and pricing transparency before choosing a provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Stable proxy network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Straightforward setup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictable pricing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliable uptime&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers running automation systems or scraping pipelines, reliability is often more important than advanced features.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Bright Data
&lt;/h2&gt;

&lt;p&gt;Bright Data operates one of the largest residential proxy networks in the world.&lt;/p&gt;

&lt;p&gt;Its platform provides advanced infrastructure designed for large-scale scraping and data collection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Massive IP pool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced geographic targeting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise-grade infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data collection APIs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of its scale, Bright Data is frequently used by companies running large web intelligence pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Oxylabs
&lt;/h2&gt;

&lt;p&gt;Oxylabs focuses heavily on enterprise data extraction and web intelligence.&lt;/p&gt;

&lt;p&gt;The platform provides residential proxies alongside tools designed for high-volume data collection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Global IP coverage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliable performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data extraction tools&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Organizations conducting large-scale research or analytics often rely on Oxylabs for its infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Smartproxy
&lt;/h2&gt;

&lt;p&gt;Smartproxy is popular among startups and independent developers because it balances performance with accessibility.&lt;/p&gt;

&lt;p&gt;The platform emphasizes ease of use and flexible pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Developer-friendly setup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Affordable plans&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Good geographic coverage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple dashboard&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smartproxy is frequently used by smaller teams building scraping tools or automation systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. SOAX
&lt;/h2&gt;

&lt;p&gt;SOAX provides a residential proxy network with granular targeting options.&lt;/p&gt;

&lt;p&gt;Developers can filter IP addresses by country, city, ISP, and ASN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;City-level targeting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean IP pool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flexible rotation settings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Usage analytics&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For scraping projects that require specific geographic targeting, SOAX offers strong configuration options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Factors When Choosing Proxies
&lt;/h2&gt;

&lt;p&gt;Developers usually consider several factors when selecting a proxy provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IP pool size&lt;/strong&gt;&lt;br&gt;
A larger pool reduces the risk of IP bans during scraping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Success rate&lt;/strong&gt;&lt;br&gt;
High connection success rates improve scraping efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Geographic targeting&lt;/strong&gt;&lt;br&gt;
Some scraping tasks require requests from specific countries or cities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;br&gt;
Slow proxies can bottleneck an entire automation pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost efficiency&lt;/strong&gt;&lt;br&gt;
Pricing models vary widely, especially when bandwidth usage increases.&lt;/p&gt;

&lt;p&gt;Testing several providers before scaling a scraping system is usually the safest approach.&lt;/p&gt;
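&lt;p&gt;As a quick sanity check before committing to a provider, you can measure an endpoint’s success and latency with a short script. A minimal sketch in Python using the &lt;code&gt;requests&lt;/code&gt; library; the proxy URL shown in the comment is a placeholder, not a real gateway:&lt;/p&gt;

```python
import time
import requests  # third-party HTTP client

def check_proxy(proxy_url, test_url="https://httpbin.org/ip", timeout=10):
    """Send one request through the proxy; return (success, elapsed seconds)."""
    proxies = {"http": proxy_url, "https": proxy_url}
    start = time.monotonic()
    try:
        resp = requests.get(test_url, proxies=proxies, timeout=timeout)
        return resp.ok, time.monotonic() - start
    except requests.RequestException:
        return False, time.monotonic() - start

# Example: check_proxy("http://user:pass@gate.example.com:7000")
```

&lt;p&gt;Running this against a handful of candidate endpoints from each provider gives a rough but comparable picture of success rates and latency before you commit bandwidth to one of them.&lt;/p&gt;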

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;As websites continue improving their anti-bot detection systems, reliable proxy infrastructure has become essential for developers working with web data.&lt;/p&gt;

&lt;p&gt;Rotating residential proxies allow scraping tools and automation systems to distribute requests across multiple real IP addresses, making large-scale data collection far more reliable.&lt;/p&gt;

&lt;p&gt;Choosing the right provider depends on your project requirements, geographic needs, and budget. But for developers building modern data pipelines, rotating proxies remain one of the most important tools in the scraping stack.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Scrape Dynamic Websites with Selenium</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Tue, 10 Mar 2026 14:15:42 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-scrape-dynamic-websites-with-selenium-4a42</link>
      <guid>https://forem.com/ellebanna/how-to-scrape-dynamic-websites-with-selenium-4a42</guid>
      <description>&lt;p&gt;If you've ever tried collecting data from a modern website and ended up with empty HTML containers instead of real content, you're not alone.&lt;/p&gt;

&lt;p&gt;Many developers run into this issue when working with websites built using frameworks like &lt;strong&gt;&lt;a href="https://react.dev/" rel="noopener noreferrer"&gt;React&lt;/a&gt;, &lt;a href="https://vuejs.org/" rel="noopener noreferrer"&gt;Vue&lt;/a&gt;, or &lt;a href="https://angular.dev/" rel="noopener noreferrer"&gt;Angular&lt;/a&gt;&lt;/strong&gt;. Instead of delivering fully rendered HTML, these sites load content dynamically using JavaScript after the page loads.&lt;/p&gt;

&lt;p&gt;So when you use a basic HTTP request to fetch the page, the data you're looking for often isn't there yet.&lt;/p&gt;

&lt;p&gt;This is where Selenium becomes extremely useful.&lt;/p&gt;

&lt;p&gt;Selenium allows you to automate a real browser session. That means the page loads exactly as it would for a human visitor, JavaScript included. Once everything renders, you can access the fully populated page and extract the information you need.&lt;/p&gt;

&lt;p&gt;Let’s walk through how this works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Traditional Scraping Fails on Dynamic Websites
&lt;/h2&gt;

&lt;p&gt;When you fetch a page using a library like &lt;code&gt;requests&lt;/code&gt; in Python, you receive the &lt;strong&gt;initial HTML response from the server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, many modern websites work differently:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The server sends minimal HTML.&lt;/li&gt;
&lt;li&gt;JavaScript runs in the browser.&lt;/li&gt;
&lt;li&gt;JavaScript requests data from APIs.&lt;/li&gt;
&lt;li&gt;The page dynamically inserts the content.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your script only sees step one.&lt;/p&gt;

&lt;p&gt;This is why you might open a page in your browser and see dozens of products or listings, but your script only finds empty &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; elements.&lt;/p&gt;

&lt;p&gt;Selenium solves this problem by actually &lt;strong&gt;running the browser and executing the JavaScript&lt;/strong&gt; before extracting data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Selenium
&lt;/h2&gt;

&lt;p&gt;First, install Selenium using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;selenium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, download the appropriate browser driver.&lt;/p&gt;

&lt;p&gt;Common options include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChromeDriver&lt;/strong&gt; for Google Chrome&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GeckoDriver&lt;/strong&gt; for Firefox&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EdgeDriver&lt;/strong&gt; for Microsoft Edge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make sure the driver version matches your installed browser version. Note that Selenium 4.6 and newer ship with Selenium Manager, which can download a matching driver automatically, so manual driver management is often unnecessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Selenium Example
&lt;/h2&gt;

&lt;p&gt;Here’s a minimal Selenium script using Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Chrome&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;quit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Launches a Chrome browser&lt;/li&gt;
&lt;li&gt;Opens a webpage&lt;/li&gt;
&lt;li&gt;Prints the page title&lt;/li&gt;
&lt;li&gt;Closes the browser session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the time Selenium retrieves the page content, the browser has already executed any JavaScript needed to render the page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extracting Elements from the Page
&lt;/h2&gt;

&lt;p&gt;Once the page loads, you can locate elements using Selenium selectors.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.by&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;By&lt;/span&gt;

&lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSS_SELECTOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Selenium supports several ways to locate elements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;By.CSS_SELECTOR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.XPATH&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.CLASS_NAME&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.TAG_NAME&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most developers prefer &lt;strong&gt;CSS selectors&lt;/strong&gt; because they are easier to maintain and usually more readable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Waiting for Dynamic Content
&lt;/h2&gt;

&lt;p&gt;Dynamic pages often load content asynchronously, so the elements you're looking for might not appear immediately.&lt;/p&gt;

&lt;p&gt;Instead of using fixed delays with &lt;code&gt;time.sleep()&lt;/code&gt;, Selenium provides explicit waits.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support.ui&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebDriverWait&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;expected_conditions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;EC&lt;/span&gt;

&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebDriverWait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;until&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;EC&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;presence_of_all_elements_located&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLASS_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product-card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Selenium to wait until the elements appear before continuing.&lt;/p&gt;

&lt;p&gt;Explicit waits make automation scripts significantly more reliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Infinite Scroll Pages
&lt;/h2&gt;

&lt;p&gt;Many websites load additional content when the user scrolls down the page.&lt;/p&gt;

&lt;p&gt;You can simulate this behavior with Selenium by executing JavaScript.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.scrollTo(0, document.body.scrollHeight);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're collecting multiple batches of content, you can repeat this action in a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.scrollTo(0, document.body.scrollHeight);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each scroll triggers the website to load more entries.&lt;/p&gt;
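&lt;p&gt;A fixed iteration count either over-scrolls or stops too early. A more robust pattern, sketched below, keeps scrolling until the page height stops growing; the function name and parameters are illustrative:&lt;/p&gt;

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the site time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we have reached the end
        last_height = new_height
```

&lt;p&gt;The &lt;code&gt;max_rounds&lt;/code&gt; cap keeps the loop from running forever on pages that load content indefinitely.&lt;/p&gt;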

&lt;h2&gt;
  
  
  Running Selenium in Headless Mode
&lt;/h2&gt;

&lt;p&gt;When running automation on servers or cloud environments, you typically don't want a visible browser window.&lt;/p&gt;

&lt;p&gt;Selenium supports &lt;strong&gt;headless mode&lt;/strong&gt;, which runs the browser without a graphical interface.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.chrome.options&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Options&lt;/span&gt;

&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--headless&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Chrome&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Headless mode reduces resource usage and makes automation easier to deploy in backend systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Avoiding IP Blocks When Scaling
&lt;/h2&gt;

&lt;p&gt;When collecting large amounts of data, repeatedly accessing a website from the same IP address can trigger rate limits or temporary blocks.&lt;/p&gt;

&lt;p&gt;To avoid this, many developers add proxy infrastructure to their automation stack. Developers often integrate providers of &lt;a href="https://www.squidproxies.com/?utm_source=devto&amp;amp;utm_campaign=scrape+websites+selenium" rel="noopener noreferrer"&gt;high-quality residential proxies&lt;/a&gt; like Squid Proxies when running workflows that require stable IP rotation and consistent connections.&lt;/p&gt;

&lt;p&gt;Using proxies alongside Selenium can significantly improve reliability when running larger automation tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Selenium Is the Right Tool
&lt;/h2&gt;

&lt;p&gt;Selenium works best when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pages rely heavily on JavaScript&lt;/li&gt;
&lt;li&gt;Content loads after user interactions&lt;/li&gt;
&lt;li&gt;Infinite scrolling is used&lt;/li&gt;
&lt;li&gt;Data appears only after the page renders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For static websites, lightweight HTTP libraries are usually faster. But for modern dynamic applications, Selenium is often the simplest and most reliable solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Dynamic websites are now the standard across much of the web. Because so many platforms rely on JavaScript to render content, traditional request-based methods often fail to retrieve the data you need.&lt;/p&gt;

&lt;p&gt;Selenium solves this problem by automating a real browser environment, allowing developers to render JavaScript-heavy pages and interact with them just like a user would.&lt;/p&gt;

&lt;p&gt;When combined with proxy infrastructure and thoughtful automation design, Selenium becomes a powerful tool for building reliable data collection pipelines and automation workflows.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>javascript</category>
      <category>tutorial</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
