<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Annabelle</title>
    <description>The latest articles on Forem by Annabelle (@ellebanna).</description>
    <link>https://forem.com/ellebanna</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3816989%2F3ee73845-1076-40b3-b806-f1b634bc2302.jpg</url>
      <title>Forem: Annabelle</title>
      <link>https://forem.com/ellebanna</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ellebanna"/>
    <language>en</language>
    <item>
      <title>How to Scrape APIs Instead of HTML (Faster and More Reliable Data Collection)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Sun, 12 Apr 2026 04:15:39 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-scrape-apis-instead-of-html-faster-and-more-reliable-data-collection-5hdo</link>
      <guid>https://forem.com/ellebanna/how-to-scrape-apis-instead-of-html-faster-and-more-reliable-data-collection-5hdo</guid>
      <description>&lt;p&gt;To scrape APIs instead of HTML, use your browser’s &lt;strong&gt;Network tab&lt;/strong&gt; to identify XHR or Fetch requests that return structured JSON data. By replicating these requests with libraries like &lt;code&gt;requests&lt;/code&gt; or &lt;code&gt;axios&lt;/code&gt;, you bypass DOM parsing and JavaScript rendering. This approach is faster, more reliable, and uses less bandwidth than traditional web scraping methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does it mean to scrape APIs instead of HTML?
&lt;/h2&gt;

&lt;p&gt;Scraping APIs means extracting data directly from a website’s backend endpoints instead of parsing HTML pages. This method is faster, more stable, and less likely to break compared to traditional web scraping.&lt;/p&gt;

&lt;p&gt;If you’ve been scraping HTML pages, you’ve probably dealt with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broken selectors&lt;/li&gt;
&lt;li&gt;Changing page layouts&lt;/li&gt;
&lt;li&gt;Slow response times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're dealing with JavaScript-heavy websites, traditional methods often fall short. In those cases, tools like browser automation become necessary; this guide on &lt;a href="https://dev.to/ellebanna/how-to-scrape-javascript-websites-with-playwright-using-proxies-h30"&gt;scraping JavaScript websites with Playwright using proxies&lt;/a&gt; explains how to handle dynamic content that APIs alone may not expose.&lt;/p&gt;

&lt;p&gt;That’s because HTML scraping depends on the front-end structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is API scraping better than HTML scraping?
&lt;/h2&gt;

&lt;p&gt;API scraping is better because it gives you structured data directly, without needing to parse HTML or render JavaScript.&lt;/p&gt;

&lt;p&gt;Benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster responses&lt;/li&gt;
&lt;li&gt;Cleaner JSON data&lt;/li&gt;
&lt;li&gt;Less maintenance&lt;/li&gt;
&lt;li&gt;Fewer parsing errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of scraping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTML → Parsing → Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API → JSON → Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Much cleaner.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you find API endpoints on a website?
&lt;/h2&gt;

&lt;p&gt;You can find API endpoints using your browser’s developer tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-by-step:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open DevTools (F12)&lt;/li&gt;
&lt;li&gt;Go to the Network tab&lt;/li&gt;
&lt;li&gt;Filter by XHR / Fetch&lt;/li&gt;
&lt;li&gt;Reload the page&lt;/li&gt;
&lt;li&gt;Look for requests returning JSON&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You’ll often see endpoints like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/api/products
/api/search?q&lt;span class="o"&gt;=&lt;/span&gt;keyword
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you make API requests in Python?
&lt;/h2&gt;

&lt;p&gt;You can use the &lt;code&gt;requests&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/api/products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it: no HTML parsing needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle headers and authentication?
&lt;/h2&gt;

&lt;p&gt;Some APIs require headers like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authorization tokens&lt;/li&gt;
&lt;li&gt;Cookies&lt;/li&gt;
&lt;li&gt;User-Agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
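&lt;p&gt;For endpoints that depend on cookies or a login token, a &lt;code&gt;requests.Session&lt;/code&gt; is usually cleaner than passing headers on every call: it persists cookies across requests and applies default headers automatically. A minimal sketch (the &lt;code&gt;make_session&lt;/code&gt; helper and token are illustrative, not tied to any specific API):&lt;/p&gt;

```python
import requests

def make_session(token):
    """Build a Session that sends auth headers on every request
    and keeps cookies between calls."""
    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {token}",
        "User-Agent": "Mozilla/5.0",
    })
    return session

# session = make_session("YOUR_TOKEN")
# data = session.get("https://example.com/api/products").json()
```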



&lt;h2&gt;
  
  
  When do you still need proxies for API scraping?
&lt;/h2&gt;

&lt;p&gt;You still need proxies when APIs enforce rate limits or block repeated requests from the same IP.&lt;/p&gt;

&lt;p&gt;Even though API scraping is cleaner, servers can still detect patterns.&lt;/p&gt;

&lt;p&gt;Many developers evaluating the &lt;a href="https://www.squidproxies.com/?utm_source=dev.to&amp;amp;utm_campaign=scrape+API+instead+HTML"&gt;fastest residential proxies&lt;/a&gt; focus on factors like IP diversity, geographic targeting, and request success rates to maintain consistent access and avoid rate limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle rate limits in APIs?
&lt;/h2&gt;

&lt;p&gt;APIs often return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;429 (Too Many Requests)&lt;/li&gt;
&lt;li&gt;Temporary blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To handle this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add delays&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Retry logic&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
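&lt;p&gt;A fixed delay plus a bare retry loop works, but backing off exponentially after each 429 is gentler on the server and recovers faster once the limit resets. A sketch combining the two ideas (&lt;code&gt;fetch_with_retry&lt;/code&gt; is an illustrative helper, not a library function):&lt;/p&gt;

```python
import time
import requests

def backoff_delay(attempt, base=1.0):
    """Exponential backoff: 1s, 2s, 4s, ..."""
    return base * (2 ** attempt)

def fetch_with_retry(url, retries=3, **kwargs):
    """Retry on 429 and 5xx responses, sleeping longer each time."""
    response = None
    for attempt in range(retries):
        response = requests.get(url, **kwargs)
        if response.status_code == 200:
            break
        if response.status_code == 429 or response.status_code >= 500:
            time.sleep(backoff_delay(attempt))
        else:
            break  # other 4xx errors: retrying will not help
    return response
```

&lt;p&gt;Many APIs also send a &lt;code&gt;Retry-After&lt;/code&gt; header on 429 responses; honoring it when present is more polite than a computed delay.&lt;/p&gt;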



&lt;h2&gt;
  
  
  How do you scale API data collection?
&lt;/h2&gt;

&lt;p&gt;To scale efficiently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use multiple endpoints&lt;/li&gt;
&lt;li&gt;Implement queues&lt;/li&gt;
&lt;li&gt;Distribute requests&lt;/li&gt;
&lt;li&gt;Combine with proxy rotation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows you to collect data faster without triggering limits.&lt;/p&gt;
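&lt;p&gt;The queue idea above can be sketched with the standard library alone: put URLs on a &lt;code&gt;queue.Queue&lt;/code&gt;, let a pool of worker threads drain it, and inject the actual fetch function (which would wrap &lt;code&gt;requests.get&lt;/code&gt; with your headers and proxy) so the plumbing stays testable:&lt;/p&gt;

```python
import queue
import threading

def crawl(urls, fetch, workers=4):
    """Drain a URL queue with a pool of worker threads.

    `fetch` is whatever function performs the actual request
    (e.g. a wrapper around requests.get with headers and a proxy).
    """
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            data = fetch(url)
            with lock:
                results[url] = data

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```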

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is API scraping always better than HTML scraping?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not always. Some data is only available in HTML, but when APIs exist, they are usually faster and more reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can websites block API scraping?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. APIs can enforce rate limits, authentication, and IP blocking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need Playwright if I use APIs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. APIs remove the need for browser automation in most cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is API scraping legal?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on the website’s terms and how the data is used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re still scraping HTML, you’re often doing extra work.&lt;/p&gt;

&lt;p&gt;APIs provide a cleaner, faster, and more reliable way to collect data.&lt;/p&gt;

&lt;p&gt;The key is learning how to find them and use them effectively.&lt;/p&gt;

&lt;p&gt;Combine API scraping with proper rate limiting and proxy usage, and you’ll build a much more efficient data pipeline.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>api</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Scrape JavaScript Websites with Playwright (Using Proxies)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Thu, 09 Apr 2026 23:46:21 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-scrape-javascript-websites-with-playwright-using-proxies-h30</link>
      <guid>https://forem.com/ellebanna/how-to-scrape-javascript-websites-with-playwright-using-proxies-h30</guid>
      <description>&lt;p&gt;To scrape JavaScript-heavy websites using Playwright with proxies, launch a browser instance by passing a &lt;code&gt;proxy&lt;/code&gt; object into the &lt;code&gt;launch&lt;/code&gt; method. This object should include the &lt;code&gt;server&lt;/code&gt; URL and optional &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;password&lt;/code&gt;. Use &lt;code&gt;page.goto()&lt;/code&gt; to navigate, as Playwright automatically waits for dynamic content to render before extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example (Node.js):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://myproxy.com:8080&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pwd&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What is Playwright and why use it for scraping?
&lt;/h2&gt;

&lt;p&gt;Playwright is a browser automation tool that allows you to interact with websites just like a real user. It’s especially useful for scraping JavaScript-heavy websites where content is loaded dynamically.&lt;/p&gt;

&lt;p&gt;If you’ve tried scraping modern websites using &lt;code&gt;requests&lt;/code&gt;, you’ve probably noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing data&lt;/li&gt;
&lt;li&gt;Empty HTML&lt;/li&gt;
&lt;li&gt;Incomplete page content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s because many websites render content using JavaScript.&lt;/p&gt;

&lt;p&gt;If you're still working with basic HTTP requests, this guide on &lt;a href="https://dev.to/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao"&gt;how to rotate proxies in Python for reliable data collection&lt;/a&gt; explains how to handle proxy rotation before moving to browser-based scraping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do traditional scraping methods fail on JavaScript sites?
&lt;/h2&gt;

&lt;p&gt;Traditional scraping fails because tools like &lt;code&gt;requests&lt;/code&gt; only fetch raw HTML and do not execute JavaScript.&lt;/p&gt;

&lt;p&gt;Modern websites rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client-side rendering&lt;/li&gt;
&lt;li&gt;API calls triggered by JavaScript&lt;/li&gt;
&lt;li&gt;Dynamic content loading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without executing JavaScript, you won’t see the actual data.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you install Playwright in Python?
&lt;/h2&gt;

&lt;p&gt;You can install Playwright with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;playwright
playwright &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you scrape a page using Playwright?
&lt;/h2&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.sync_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_playwright&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loads the page in a real browser environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you wait for dynamic content?
&lt;/h2&gt;

&lt;p&gt;You can wait for elements to load before extracting data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div.product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div.product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all_text_contents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures you’re scraping fully rendered content.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you use proxies with Playwright?
&lt;/h2&gt;

&lt;p&gt;You can configure a proxy when launching the browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy-ip:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This routes all traffic through a proxy.&lt;/p&gt;

&lt;p&gt;If you're evaluating different options, many developers compare the &lt;a href="https://www.squidproxies.com/" rel="noopener noreferrer"&gt;best US residential proxy providers&lt;/a&gt; based on reliability, geographic targeting, and success rate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you rotate proxies in Playwright?
&lt;/h2&gt;

&lt;p&gt;Playwright doesn’t rotate proxies automatically; you need to manage rotation yourself.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;proxy_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip1:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip2:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip3:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxy_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you avoid detection when scraping?
&lt;/h2&gt;

&lt;p&gt;To reduce detection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotate proxies&lt;/li&gt;
&lt;li&gt;Use realistic user agents&lt;/li&gt;
&lt;li&gt;Add delays between actions&lt;/li&gt;
&lt;li&gt;Avoid aggressive scraping patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
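&lt;p&gt;In practice this usually means pairing a realistic &lt;code&gt;User-Agent&lt;/code&gt; (set via &lt;code&gt;browser.new_context&lt;/code&gt;) with randomized pauses rather than fixed ones. A sketch, with the agent list and delay range as illustrative values:&lt;/p&gt;

```python
import random

# illustrative pool of realistic user agents
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def pick_user_agent():
    return random.choice(USER_AGENTS)

def human_delay_ms(low=1000, high=3000):
    """Randomized pause length so actions don't fire at fixed intervals."""
    return random.randint(low, high)

if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # new_context applies the user agent to every page it creates
        context = browser.new_context(user_agent=pick_user_agent())
        page = context.new_page()
        page.goto("https://example.com")
        page.wait_for_timeout(human_delay_ms())
        print(page.content()[:200])
        browser.close()
```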



&lt;h2&gt;
  
  
  How do you scale Playwright scraping?
&lt;/h2&gt;

&lt;p&gt;For larger systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use multiple browser instances&lt;/li&gt;
&lt;li&gt;Distribute tasks across workers&lt;/li&gt;
&lt;li&gt;Combine with proxy rotation&lt;/li&gt;
&lt;li&gt;Implement retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This builds a more reliable scraping system.&lt;/p&gt;
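&lt;p&gt;One way to sketch this split: divide the URL list into batches, give each batch its own browser instance (optionally behind its own proxy), and run the batches concurrently with Playwright’s async API. The &lt;code&gt;chunk&lt;/code&gt; helper and URLs here are illustrative:&lt;/p&gt;

```python
import asyncio

def chunk(items, n):
    """Split work into n roughly equal batches, one per browser instance."""
    return [items[i::n] for i in range(n)]

async def scrape_batch(batch, proxy=None):
    # one browser instance per batch of URLs
    from playwright.async_api import async_playwright
    async with async_playwright() as p:
        launch_args = {"proxy": {"server": proxy}} if proxy else {}
        browser = await p.chromium.launch(**launch_args)
        page = await browser.new_page()
        results = []
        for url in batch:
            await page.goto(url)
            results.append(await page.content())
        await browser.close()
        return results

async def main(urls, workers=2):
    # run all batches concurrently
    batches = chunk(urls, workers)
    return await asyncio.gather(*(scrape_batch(b) for b in batches))

# asyncio.run(main(["https://example.com/a", "https://example.com/b"]))
```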

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Playwright better than Selenium?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Playwright is faster and more modern, with better support for handling dynamic content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can Playwright handle CAPTCHAs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not directly. You’ll need external services or manual solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I always need proxies with Playwright?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not always, but for large-scale scraping, proxies become essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is scraping JavaScript websites legal?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on how you use the data and the website’s terms of service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Modern websites rely heavily on JavaScript, which makes traditional scraping methods less effective.&lt;/p&gt;

&lt;p&gt;Playwright solves this by simulating real browser behavior.&lt;/p&gt;

&lt;p&gt;When combined with proxy rotation and proper request handling, it becomes a powerful tool for reliable data collection.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>javascript</category>
      <category>python</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>How to Build a Reliable Web Data Collection System (Retries, Headers, and Proxy Rotation)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Thu, 02 Apr 2026 17:00:48 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-build-a-reliable-web-data-collection-system-retries-headers-and-proxy-rotation-48jk</link>
      <guid>https://forem.com/ellebanna/how-to-build-a-reliable-web-data-collection-system-retries-headers-and-proxy-rotation-48jk</guid>
      <description>&lt;h2&gt;
  
  
  What makes a data collection system reliable?
&lt;/h2&gt;

&lt;p&gt;A reliable data collection system can handle failures, avoid detection, and continue running without interruptions. This typically involves retry logic, proxy rotation, request delays, and proper headers.&lt;/p&gt;

&lt;p&gt;If you’ve already implemented proxy rotation, you’ve solved one part of the problem. If not, this guide on &lt;a href="https://dev.to/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao"&gt;how to rotate proxies in Python for reliable data collection&lt;/a&gt; walks through the basics of setting up proxy rotation in a real workflow.&lt;/p&gt;

&lt;p&gt;But in real-world scenarios, that’s not enough.&lt;/p&gt;

&lt;p&gt;You’ll still run into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random request failures&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;CAPTCHAs&lt;/li&gt;
&lt;li&gt;Inconsistent responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To make your system reliable, you need to combine &lt;strong&gt;multiple techniques&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do scraping systems fail?
&lt;/h2&gt;

&lt;p&gt;Scraping systems fail because websites detect patterns such as repeated IP usage, missing headers, and high request frequency.&lt;/p&gt;

&lt;p&gt;Common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending too many requests too quickly&lt;/li&gt;
&lt;li&gt;Using the same IP repeatedly&lt;/li&gt;
&lt;li&gt;Missing or unrealistic headers&lt;/li&gt;
&lt;li&gt;No retry handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with proxies, your system will break if you don’t handle these properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you build a resilient request function?
&lt;/h2&gt;

&lt;p&gt;You build a resilient request function by combining retries, proxy rotation, and error handling.&lt;/p&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;proxy_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip1:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip2:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip3:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;user_agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Windows NT 10.0; Win64; x64)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (X11; Linux x86_64)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxy_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_headers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_headers&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;

        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotates proxies&lt;/li&gt;
&lt;li&gt;Rotates headers&lt;/li&gt;
&lt;li&gt;Retries failed requests&lt;/li&gt;
&lt;li&gt;Adds delays&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why are headers important?
&lt;/h2&gt;

&lt;p&gt;Headers are important because websites use them to identify real users.&lt;/p&gt;

&lt;p&gt;Without realistic headers, your requests look like they come from a bot.&lt;/p&gt;

&lt;p&gt;At minimum, you should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User-Agent&lt;/li&gt;
&lt;li&gt;Accept-Language&lt;/li&gt;
&lt;li&gt;Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_headers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agents&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept-Language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US,en;q=0.9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/html,application/xhtml+xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How does proxy rotation improve reliability?
&lt;/h2&gt;

&lt;p&gt;Proxy rotation improves reliability by distributing requests across multiple IP addresses, reducing the chance of detection and blocking.&lt;/p&gt;

&lt;p&gt;Instead of hitting a server from one IP repeatedly, you spread requests across many.&lt;/p&gt;

&lt;p&gt;If you're evaluating different options, many developers compare &lt;a href="https://www.squidproxies.com/" rel="noopener noreferrer"&gt;rotating residential proxies&lt;/a&gt; based on success rate, IP pool size, and geographic coverage.&lt;/p&gt;
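&lt;p&gt;As a sketch of the rotation itself: picking proxies at random can reuse the same IP back to back, while a round-robin cycle spreads requests evenly. The proxy URLs below are placeholders, not real endpoints.&lt;/p&gt;

```python
import itertools

# Placeholder proxies -- substitute your provider's real credentials.
proxy_list = [
    "http://user:pass@ip1:port",
    "http://user:pass@ip2:port",
    "http://user:pass@ip3:port",
]

# itertools.cycle yields the list endlessly in order, so every proxy
# gets an equal share of requests.
proxy_cycle = itertools.cycle(proxy_list)

def next_proxy():
    return next(proxy_cycle)
```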

&lt;h2&gt;
  
  
  How do you handle rate limiting?
&lt;/h2&gt;

&lt;p&gt;You handle rate limiting by slowing down requests and adding randomness.&lt;/p&gt;

&lt;p&gt;Simple techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add delays&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Avoid patterns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Don’t send requests at fixed intervals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduce concurrency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Too many parallel requests = higher detection risk.&lt;/p&gt;
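&lt;p&gt;One way to combine these ideas is a small wrapper that caps concurrency with a semaphore and randomizes every delay. The limit of 3 workers is an arbitrary example, and &lt;code&gt;fetch_fn&lt;/code&gt; stands in for whatever request function you use.&lt;/p&gt;

```python
import random
import threading
import time

# Arbitrary example cap: at most 3 requests in flight at once.
slots = threading.Semaphore(3)

def polite_fetch(fetch_fn, url, min_delay=1.0, max_delay=3.0):
    # Acquire a slot to bound concurrency, then sleep a randomized
    # interval so requests never arrive at fixed, detectable times.
    with slots:
        time.sleep(random.uniform(min_delay, max_delay))
        return fetch_fn(url)
```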

&lt;h2&gt;
  
  
  How do you detect blocked responses?
&lt;/h2&gt;

&lt;p&gt;You should check for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP 403 / 429 status codes&lt;/li&gt;
&lt;li&gt;CAPTCHA pages&lt;/li&gt;
&lt;li&gt;Empty or unexpected responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also check content for known block patterns.&lt;/p&gt;
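&lt;p&gt;Combining the status check with a content check might look like this. The marker strings are illustrative; real block pages vary by site, so tune the list to what you actually observe.&lt;/p&gt;

```python
# Illustrative block markers -- adjust to the target site's block pages.
BLOCK_MARKERS = ["captcha", "access denied", "unusual traffic"]

def looks_blocked(status_code, body):
    """Return True when the status or body suggests the request was blocked."""
    if status_code in (403, 429):
        return True
    if not body:
        return True  # empty responses are suspicious too
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```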

&lt;h2&gt;
  
  
  How do you scale this system?
&lt;/h2&gt;

&lt;p&gt;To scale, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger proxy pools&lt;/li&gt;
&lt;li&gt;Queue systems (e.g., task queues)&lt;/li&gt;
&lt;li&gt;Parallel workers&lt;/li&gt;
&lt;li&gt;Logging and monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale, your system becomes more about &lt;strong&gt;architecture&lt;/strong&gt; than code.&lt;/p&gt;
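&lt;p&gt;As a minimal sketch of the parallel-worker piece, a thread pool can fan URLs out to several workers. The &lt;code&gt;fetch&lt;/code&gt; stub below stands in for a resilient request function like the one shown earlier.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stub so the sketch is self-contained; replace with a real
    # proxy-rotating, retrying request function.
    return f"fetched:{url}"

def crawl(urls, workers=4):
    """Fan URLs out to a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```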

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do I always need proxies for data collection?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not always. For small-scale tasks, you may not need them. But for large-scale or repeated requests, proxies become necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s the biggest mistake beginners make?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not adding retry logic. One failure can break your entire pipeline if not handled properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many retries should I use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Typically 2–5 retries. More than that can slow down your system.&lt;/p&gt;
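&lt;p&gt;If you do retry, spacing attempts with exponential backoff plus jitter is gentler on the server than a fixed sleep. A sketch (the base and cap values are arbitrary defaults):&lt;/p&gt;

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay before retry `attempt` (0-indexed): roughly 1s, 2s, 4s...
    capped at 30s, with jitter so clients don't retry in lockstep."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```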

&lt;p&gt;&lt;strong&gt;Are residential proxies always better?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They are harder to detect, but also more expensive. The best choice depends on your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building a reliable data collection system isn’t about one trick; it’s about combining multiple techniques.&lt;/p&gt;

&lt;p&gt;Proxy rotation, retries, headers, and delays all work together.&lt;/p&gt;

&lt;p&gt;If you only use one, your system will eventually fail.&lt;/p&gt;

&lt;p&gt;If you combine them properly, you get a system that’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stable&lt;/li&gt;
&lt;li&gt;Scalable&lt;/li&gt;
&lt;li&gt;Harder to block&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Rotate Proxies in Python for Reliable Data Collection</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:31:21 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao</link>
      <guid>https://forem.com/ellebanna/how-to-rotate-proxies-in-python-for-reliable-data-collection-5eao</guid>
      <description>&lt;h2&gt;
  
  
  What is proxy rotation in Python?
&lt;/h2&gt;

&lt;p&gt;Proxy rotation in Python is the process of sending requests through different IP addresses instead of using a single IP. This helps prevent blocking, rate limiting, and detection when making multiple requests to a website.&lt;/p&gt;

&lt;p&gt;If you're building automation tools or data pipelines that interact with websites at scale, you've probably encountered this problem: your requests start failing after a while.&lt;/p&gt;

&lt;p&gt;At first, everything works. Then suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests return errors&lt;/li&gt;
&lt;li&gt;You get blocked&lt;/li&gt;
&lt;li&gt;Or you start seeing CAPTCHAs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This usually happens because your script is sending too many requests from one IP address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do you need rotating proxies?
&lt;/h2&gt;

&lt;p&gt;You need rotating proxies because websites detect repeated requests from the same IP and block them. Rotating proxies distribute requests across multiple IP addresses, making traffic appear more natural.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Your Script → Website&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Your Script → Proxy Pool → Website&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Each request uses a different IP, which reduces the risk of detection.&lt;/p&gt;

&lt;p&gt;If you're still exploring which services to use, this breakdown of &lt;a href="https://dev.to/ellebanna/best-rotating-residential-proxy-providers-for-web-scraping-2026-29o1"&gt;rotating residential proxy providers developers use&lt;/a&gt; compares different proxy networks and how they fit real-world use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you rotate proxies in Python using requests?
&lt;/h2&gt;

&lt;p&gt;You can rotate proxies in Python using the requests library by selecting a different proxy for each request from a list of available proxies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Install requests&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Use a single proxy&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy-ip:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://username:password@proxy-ip:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Rotate proxies from a list
&lt;/h2&gt;

&lt;p&gt;Now let’s rotate multiple proxies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;proxy_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip1:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip2:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://user:pass@ip3:port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proxy_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org/ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now each request uses a different IP address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why are residential proxies better for rotation?
&lt;/h2&gt;

&lt;p&gt;Residential proxies are better for rotation because they use real IP addresses assigned by internet service providers, making them harder for websites to detect compared to datacenter proxies.&lt;/p&gt;

&lt;p&gt;Datacenter proxies are fast but easier to block.&lt;/p&gt;

&lt;p&gt;Residential proxies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look like real users&lt;/li&gt;
&lt;li&gt;Have higher success rates&lt;/li&gt;
&lt;li&gt;Work better for large-scale data collection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many developers evaluating different &lt;a href="https://www.squidproxies.com/residential-proxies" rel="noopener noreferrer"&gt;rotating residential proxies&lt;/a&gt; focus on reliability, IP pool size, and geographic coverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle proxy failures in Python?
&lt;/h2&gt;

&lt;p&gt;You handle proxy failures by adding retry logic and switching proxies when a request fails.&lt;/p&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_proxy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;proxy&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures your script continues working even if some proxies fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are best practices for rotating proxies?
&lt;/h2&gt;

&lt;p&gt;Best practices for rotating proxies include adding delays, rotating user agents, and limiting request rates to avoid detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add delays&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Rotate headers (user agents)&lt;/strong&gt;&lt;/p&gt;
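&lt;p&gt;For example (the agent strings are a small illustrative sample; real lists are much longer):&lt;/p&gt;

```python
import random

# A few illustrative desktop user agents.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def get_headers():
    return {"User-Agent": random.choice(USER_AGENTS)}
```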

&lt;p&gt;&lt;strong&gt;3. Use sessions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Avoid aggressive request rates&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Should Use Proxy Rotation
&lt;/h2&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You send many requests&lt;/li&gt;
&lt;li&gt;You need consistent uptime&lt;/li&gt;
&lt;li&gt;You access geo-specific data&lt;/li&gt;
&lt;li&gt;You run automation at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between residential and datacenter proxies?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Residential proxies use real IP addresses from ISPs, while datacenter proxies come from cloud servers. Residential proxies are harder to detect but usually more expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I rotate proxies without a proxy provider?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, but it’s difficult to maintain a reliable pool of IPs. Most developers use proxy providers for scalability and stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How often should proxies rotate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on your use case. Some rotate every request, while others rotate per session or after a fixed time interval.&lt;/p&gt;
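&lt;p&gt;The time-interval strategy can be sketched as a small class. The 60-second default and proxy URLs are arbitrary examples.&lt;/p&gt;

```python
import time

class TimedRotator:
    """Keep one proxy for `interval` seconds, then advance to the next."""
    def __init__(self, proxies, interval=60):
        self.proxies = proxies
        self.interval = interval
        self.index = 0
        self.switched_at = time.monotonic()

    def current(self):
        # Advance to the next proxy once the interval has elapsed.
        if time.monotonic() - self.switched_at >= self.interval:
            self.index = (self.index + 1) % len(self.proxies)
            self.switched_at = time.monotonic()
        return self.proxies[self.index]
```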

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Proxy rotation is essential for developers working with automation tools and data pipelines. Without it, requests will eventually fail due to blocking and rate limits.&lt;/p&gt;

&lt;p&gt;With proper rotation, retry logic, and reliable proxies, your systems become significantly more stable and scalable.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>python</category>
      <category>tutorial</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Best Rotating Residential Proxy Providers for Web Scraping (2026)</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Tue, 17 Mar 2026 13:30:52 +0000</pubDate>
      <link>https://forem.com/ellebanna/best-rotating-residential-proxy-providers-for-web-scraping-2026-29o1</link>
      <guid>https://forem.com/ellebanna/best-rotating-residential-proxy-providers-for-web-scraping-2026-29o1</guid>
      <description>&lt;p&gt;If you’re building automation tools, data collection systems, or large-scale data pipelines, you’ve probably encountered the biggest limitation in large-scale scraping:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IP blocking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your script sends too many requests from the same IP address, websites quickly flag it as automated traffic.&lt;/p&gt;

&lt;p&gt;Rotating residential proxies solve this problem.&lt;/p&gt;

&lt;p&gt;Instead of using one IP, these networks distribute your requests across many residential IP addresses assigned by internet service providers. Because the traffic appears to come from real users, it’s much harder for websites to detect automation patterns.&lt;/p&gt;

&lt;p&gt;For developers working with scraping frameworks or automation pipelines, this dramatically improves request success rates.&lt;/p&gt;

&lt;p&gt;Below are several &lt;strong&gt;popular rotating residential proxy providers developers use in 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Developers Use Rotating Proxies For
&lt;/h2&gt;

&lt;p&gt;Rotating proxies are commonly used in development workflows that involve large-scale requests or automated data collection.&lt;/p&gt;

&lt;p&gt;Typical use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Scraping product prices from e-commerce sites&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring search engine results (SERPs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ad verification across geographic regions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Market research and competitive analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data aggregation for analytics or AI training&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When running these workflows, developers usually integrate proxy rotation directly into their scripts or scraping frameworks.&lt;/p&gt;

&lt;p&gt;If you’re configuring proxies inside Python-based scraping tools, this tutorial on &lt;a href="https://medium.com/@ellebanna/using-proxies-with-python-requests-and-scrapy-b9342e90de65" rel="noopener noreferrer"&gt;using proxies with Python Requests and Scrapy&lt;/a&gt; explains how proxy authentication and rotation work in real scraping environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Squid Proxies
&lt;/h2&gt;

&lt;p&gt;Squid Proxies is a long-running proxy provider focused on reliability and simple infrastructure.&lt;/p&gt;

&lt;p&gt;Many development teams prefer providers that prioritize &lt;strong&gt;stable connections and predictable performance&lt;/strong&gt; rather than complicated dashboards.&lt;/p&gt;

&lt;p&gt;Organizations comparing different &lt;a href="https://www.squidproxies.com/residential-proxies?utm_source=devto&amp;amp;utm_campaign=2026+rotating+residential+proxies" rel="noopener noreferrer"&gt;rotating residential proxies&lt;/a&gt; often evaluate reliability, connection success rates, and pricing transparency before choosing a provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Stable proxy network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Straightforward setup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictable pricing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliable uptime&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers running automation systems or scraping pipelines, reliability is often more important than advanced features.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Bright Data
&lt;/h2&gt;

&lt;p&gt;Bright Data operates one of the largest residential proxy networks in the world.&lt;/p&gt;

&lt;p&gt;Its platform provides advanced infrastructure designed for large-scale scraping and data collection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Massive IP pool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced geographic targeting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise-grade infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data collection APIs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of its scale, Bright Data is frequently used by companies running large web intelligence pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Oxylabs
&lt;/h2&gt;

&lt;p&gt;Oxylabs focuses heavily on enterprise data extraction and web intelligence.&lt;/p&gt;

&lt;p&gt;The platform provides residential proxies alongside tools designed for high-volume data collection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Global IP coverage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reliable performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enterprise support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data extraction tools&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Organizations conducting large-scale research or analytics often rely on Oxylabs for its infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Smartproxy
&lt;/h2&gt;

&lt;p&gt;Smartproxy is popular among startups and independent developers because it balances performance with accessibility.&lt;/p&gt;

&lt;p&gt;The platform emphasizes ease of use and flexible pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Developer-friendly setup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Affordable plans&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Good geographic coverage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple dashboard&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smartproxy is frequently used by smaller teams building scraping tools or automation systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. SOAX
&lt;/h2&gt;

&lt;p&gt;SOAX provides a residential proxy network with granular targeting options.&lt;/p&gt;

&lt;p&gt;Developers can filter IP addresses by country, city, ISP, and ASN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;City-level targeting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Clean IP pool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flexible rotation settings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Usage analytics&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For scraping projects that require specific geographic targeting, SOAX offers strong configuration options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Factors When Choosing Proxies
&lt;/h2&gt;

&lt;p&gt;Developers usually consider several factors when selecting a proxy provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IP pool size&lt;/strong&gt;&lt;br&gt;
A larger pool reduces the risk of IP bans during scraping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Success rate&lt;/strong&gt;&lt;br&gt;
High connection success rates improve scraping efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Geographic targeting&lt;/strong&gt;&lt;br&gt;
Some scraping tasks require requests from specific countries or cities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;br&gt;
Slow proxies can bottleneck an entire automation pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost efficiency&lt;/strong&gt;&lt;br&gt;
Pricing models vary widely, especially when bandwidth usage increases.&lt;/p&gt;

&lt;p&gt;Testing several providers before scaling a scraping system is usually the safest approach.&lt;/p&gt;
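&lt;p&gt;As a quick sanity check before committing to a provider, you can measure an endpoint’s success and latency with a short script. A minimal sketch in Python using the &lt;code&gt;requests&lt;/code&gt; library; the proxy URL shown in the comment is a placeholder, not a real gateway:&lt;/p&gt;

```python
import time
import requests  # third-party HTTP client

def check_proxy(proxy_url, test_url="https://httpbin.org/ip", timeout=10):
    """Send one request through the proxy; return (success, elapsed seconds)."""
    proxies = {"http": proxy_url, "https": proxy_url}
    start = time.monotonic()
    try:
        resp = requests.get(test_url, proxies=proxies, timeout=timeout)
        return resp.ok, time.monotonic() - start
    except requests.RequestException:
        return False, time.monotonic() - start

# Example: check_proxy("http://user:pass@gate.example.com:7000")
```

&lt;p&gt;Running this against a handful of candidate endpoints from each provider gives a rough but comparable picture of success rates and latency before you commit bandwidth to one of them.&lt;/p&gt;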

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;As websites continue improving their anti-bot detection systems, reliable proxy infrastructure has become essential for developers working with web data.&lt;/p&gt;

&lt;p&gt;Rotating residential proxies allow scraping tools and automation systems to distribute requests across multiple real IP addresses, making large-scale data collection far more reliable.&lt;/p&gt;

&lt;p&gt;Choosing the right provider depends on your project requirements, geographic needs, and budget. But for developers building modern data pipelines, rotating proxies remain one of the most important tools in the scraping stack.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Scrape Dynamic Websites with Selenium</title>
      <dc:creator>Annabelle</dc:creator>
      <pubDate>Tue, 10 Mar 2026 14:15:42 +0000</pubDate>
      <link>https://forem.com/ellebanna/how-to-scrape-dynamic-websites-with-selenium-4a42</link>
      <guid>https://forem.com/ellebanna/how-to-scrape-dynamic-websites-with-selenium-4a42</guid>
      <description>&lt;p&gt;If you've ever tried collecting data from a modern website and ended up with empty HTML containers instead of real content, you're not alone.&lt;/p&gt;

&lt;p&gt;Many developers run into this issue when working with websites built using frameworks like &lt;strong&gt;&lt;a href="https://react.dev/" rel="noopener noreferrer"&gt;React&lt;/a&gt;, &lt;a href="https://vuejs.org/" rel="noopener noreferrer"&gt;Vue&lt;/a&gt;, or &lt;a href="https://angular.dev/" rel="noopener noreferrer"&gt;Angular&lt;/a&gt;&lt;/strong&gt;. Instead of delivering fully rendered HTML, these sites load content dynamically using JavaScript after the page loads.&lt;/p&gt;

&lt;p&gt;So when you use a basic HTTP request to fetch the page, the data you're looking for often isn't there yet.&lt;/p&gt;

&lt;p&gt;This is where Selenium becomes extremely useful.&lt;/p&gt;

&lt;p&gt;Selenium allows you to automate a real browser session. That means the page loads exactly as it would for a human visitor, JavaScript included. Once everything renders, you can access the fully populated page and extract the information you need.&lt;/p&gt;

&lt;p&gt;Let’s walk through how this works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Traditional Scraping Fails on Dynamic Websites
&lt;/h2&gt;

&lt;p&gt;When you fetch a page using a library like &lt;code&gt;requests&lt;/code&gt; in Python, you receive the &lt;strong&gt;initial HTML response from the server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, many modern websites work differently:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The server sends minimal HTML.&lt;/li&gt;
&lt;li&gt;JavaScript runs in the browser.&lt;/li&gt;
&lt;li&gt;JavaScript requests data from APIs.&lt;/li&gt;
&lt;li&gt;The page dynamically inserts the content.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your script only sees step one.&lt;/p&gt;

&lt;p&gt;This is why you might open a page in your browser and see dozens of products or listings, but your script only finds empty &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; elements.&lt;/p&gt;

&lt;p&gt;Selenium solves this problem by actually &lt;strong&gt;running the browser and executing the JavaScript&lt;/strong&gt; before extracting data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Selenium
&lt;/h2&gt;

&lt;p&gt;First, install Selenium using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;selenium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, download the appropriate browser driver.&lt;/p&gt;

&lt;p&gt;Common options include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChromeDriver&lt;/strong&gt; for Google Chrome&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GeckoDriver&lt;/strong&gt; for Firefox&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EdgeDriver&lt;/strong&gt; for Microsoft Edge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make sure the driver version matches your installed browser version. Note that Selenium 4.6 and newer ship with Selenium Manager, which can download a matching driver automatically, so manual driver management is often unnecessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Selenium Example
&lt;/h2&gt;

&lt;p&gt;Here’s a minimal Selenium script using Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Chrome&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;quit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Launches a Chrome browser&lt;/li&gt;
&lt;li&gt;Opens a webpage&lt;/li&gt;
&lt;li&gt;Prints the page title&lt;/li&gt;
&lt;li&gt;Closes the browser session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the time Selenium retrieves the page content, the browser has already executed any JavaScript needed to render the page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extracting Elements from the Page
&lt;/h2&gt;

&lt;p&gt;Once the page loads, you can locate elements using Selenium selectors.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.by&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;By&lt;/span&gt;

&lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSS_SELECTOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Selenium supports several ways to locate elements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;By.CSS_SELECTOR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.XPATH&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.CLASS_NAME&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;By.TAG_NAME&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most developers prefer &lt;strong&gt;CSS selectors&lt;/strong&gt; because they are easier to maintain and usually more readable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Waiting for Dynamic Content
&lt;/h2&gt;

&lt;p&gt;Dynamic pages often load content asynchronously, so the elements you're looking for might not appear immediately.&lt;/p&gt;

&lt;p&gt;Instead of using fixed delays with &lt;code&gt;time.sleep()&lt;/code&gt;, Selenium provides explicit waits.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support.ui&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebDriverWait&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;expected_conditions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;EC&lt;/span&gt;

&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebDriverWait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;until&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;EC&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;presence_of_all_elements_located&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLASS_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product-card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Selenium to wait until the elements appear before continuing.&lt;/p&gt;

&lt;p&gt;Explicit waits make automation scripts significantly more reliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Infinite Scroll Pages
&lt;/h2&gt;

&lt;p&gt;Many websites load additional content when the user scrolls down the page.&lt;/p&gt;

&lt;p&gt;You can simulate this behavior with Selenium by executing JavaScript.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.scrollTo(0, document.body.scrollHeight);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're collecting multiple batches of content, you can repeat this action in a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.scrollTo(0, document.body.scrollHeight);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each scroll triggers the website to load more entries.&lt;/p&gt;
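&lt;p&gt;A fixed iteration count either over-scrolls or stops too early. A more robust pattern, sketched below, keeps scrolling until the page height stops growing; the function name and parameters are illustrative:&lt;/p&gt;

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the site time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we have reached the end
        last_height = new_height
```

&lt;p&gt;The &lt;code&gt;max_rounds&lt;/code&gt; cap keeps the loop from running forever on pages that load content indefinitely.&lt;/p&gt;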

&lt;h2&gt;
  
  
  Running Selenium in Headless Mode
&lt;/h2&gt;

&lt;p&gt;When running automation on servers or cloud environments, you typically don't want a visible browser window.&lt;/p&gt;

&lt;p&gt;Selenium supports &lt;strong&gt;headless mode&lt;/strong&gt;, which runs the browser without a graphical interface.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.chrome.options&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Options&lt;/span&gt;

&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--headless&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Chrome&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Headless mode reduces resource usage and makes automation easier to deploy in backend systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Avoiding IP Blocks When Scaling
&lt;/h2&gt;

&lt;p&gt;When collecting large amounts of data, repeatedly accessing a website from the same IP address can trigger rate limits or temporary blocks.&lt;/p&gt;

&lt;p&gt;To avoid this, many developers add proxy infrastructure to their automation stack. Developers often integrate providers of &lt;a href="https://www.squidproxies.com/?utm_source=devto&amp;amp;utm_campaign=scrape+websites+selenium" rel="noopener noreferrer"&gt;high-quality residential proxies&lt;/a&gt; like Squid Proxies when running workflows that require stable IP rotation and consistent connections.&lt;/p&gt;

&lt;p&gt;Using proxies alongside Selenium can significantly improve reliability when running larger automation tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Selenium Is the Right Tool
&lt;/h2&gt;

&lt;p&gt;Selenium works best when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pages rely heavily on JavaScript&lt;/li&gt;
&lt;li&gt;Content loads after user interactions&lt;/li&gt;
&lt;li&gt;Infinite scrolling is used&lt;/li&gt;
&lt;li&gt;Data appears only after the page renders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For static websites, lightweight HTTP libraries are usually faster. But for modern dynamic applications, Selenium is often the simplest and most reliable solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Dynamic websites are now the standard across much of the web. Because so many platforms rely on JavaScript to render content, traditional request-based methods often fail to retrieve the data you need.&lt;/p&gt;

&lt;p&gt;Selenium solves this problem by automating a real browser environment, allowing developers to render JavaScript-heavy pages and interact with them just like a user would.&lt;/p&gt;

&lt;p&gt;When combined with proxy infrastructure and thoughtful automation design, Selenium becomes a powerful tool for building reliable data collection pipelines and automation workflows.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>javascript</category>
      <category>tutorial</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
