I built a web scraping companion tool to instantly make any scraper scalable and unblockable

paramaw — Mon, 20 Sep 2021 18:05:29 +0000

Over years of web scraping for many clients, and billions of pages scraped at DataHen, I realized that we kept doing the same things over and over again to deal with scalability, blocking, and the other general problems that web scrapers typically face.

So I built Till, a companion tool that integrates with any scraper in 5 minutes, without many code changes.

It works as a man-in-the-middle proxy that your scraper connects to.

All you need to do is connect to Till via the proxy protocol, and Till handles things such as:

User agent generation and randomization
Proxy IP randomization
Cookie management
HTTP Caching
HTTP request interception
Sticky Sessions
Request Logging

When you use Till, you don't need to build much of the repetitive logic required to scale and unblock scrapers. You can simply focus on the main scraping tasks themselves.
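
To give a rough idea of what "connect via the proxy protocol" means in practice, here is a minimal Python sketch using only the standard library. The proxy address `http://localhost:2933` and the helper names `build_till_opener` and `fetch` are assumptions for illustration; use whatever host and port your Till instance actually listens on. Because Till sits in the middle of the connection, HTTPS requests will generally also require your scraper to trust the proxy's CA certificate, which is not shown here.

```python
import urllib.request

# Assumption: Till is running locally and listening on this address.
# Change it to match your own setup.
TILL_PROXY = "http://localhost:2933"


def build_till_opener(proxy_url: str = TILL_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


def fetch(url: str, opener: urllib.request.OpenerDirector) -> bytes:
    """Fetch a URL through the given opener and return the raw response body."""
    with opener.open(url, timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    opener = build_till_opener()
    # The scraper code itself stays the same; only the proxy setting is new.
    body = fetch("http://example.com/", opener)
    print(body[:200])
```

The only scraper-side change is the proxy setting. The user agent, proxy IP, cookie, and caching work listed above would then happen inside Till rather than in your own code.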

Let me know if you have any feedback or comments.
Here is the GitHub link. Please give it a star if you find it useful.
And here is the product link.

Thanks

Forem: paramaw
