<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Antoine Ross</title>
    <description>The latest articles on Forem by Antoine Ross (@antoine_ross_93d7d37905fd).</description>
    <link>https://forem.com/antoine_ross_93d7d37905fd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3465919%2Fd2e1bf10-16aa-4872-af0f-3999d0e7d20a.png</url>
      <title>Forem: Antoine Ross</title>
      <link>https://forem.com/antoine_ross_93d7d37905fd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/antoine_ross_93d7d37905fd"/>
    <language>en</language>
    <item>
      <title>Supacrawler: a lightweight, ultra-fast web scraping API</title>
      <dc:creator>Antoine Ross</dc:creator>
      <pubDate>Sun, 14 Sep 2025 19:48:49 +0000</pubDate>
      <link>https://forem.com/antoine_ross_93d7d37905fd/supacrawler-lightweight-and-ultra-fast-web-scraping-api-2den</link>
      <guid>https://forem.com/antoine_ross_93d7d37905fd/supacrawler-lightweight-and-ultra-fast-web-scraping-api-2den</guid>
      <description>&lt;p&gt;Supacrawler is an open-source web scraping API engine written in Go. Out of the box it comes with three endpoints: Scrape, Crawl, and Screenshots. &lt;/p&gt;

&lt;p&gt;It's a light wrapper around Playwright, with Dockerfiles for both local development and production. It's also ultra-fast thanks to Go's concurrency primitives and channels. I've written up the benchmarks in the documentation: &lt;a href="https://docs.supacrawler.com/comparisons/selenium" rel="noopener noreferrer"&gt;Supacrawler benchmarks&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Going through the endpoints, we have the following:&lt;/p&gt;

&lt;p&gt;Scrape: This endpoint lets you scrape the web using headless browsers and receive the output automatically cleaned up as Markdown. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvlwxf7xadlwfhn525nm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvlwxf7xadlwfhn525nm.png" alt="Scrape Dashboard" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;
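&lt;p&gt;As a rough sketch, a scrape call against a self-hosted instance might look like the following. The &lt;code&gt;/v1/scrape&lt;/code&gt; path, the port, and the parameter names here are assumptions for illustration, not the documented API, so check the docs for the real request shape.&lt;/p&gt;

```python
# Hypothetical sketch of building a scrape request against a
# self-hosted Supacrawler instance. The endpoint path, port, and
# payload field names are assumptions, not the documented API.
import json
import urllib.request


def build_scrape_request(base_url: str, target_url: str) -> urllib.request.Request:
    # Ask for the page back as cleaned Markdown (assumed option name).
    payload = json.dumps({"url": target_url, "format": "markdown"}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/scrape",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


req = build_scrape_request("http://localhost:8081", "https://example.com")
print(req.full_url)  # → http://localhost:8081/v1/scrape
```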

&lt;p&gt;Crawl: This endpoint lets you systematically crawl an entire website with a headless browser and receive it back in both Markdown and HTML formats. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mm1pwhw3z9tpa9lt9lt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mm1pwhw3z9tpa9lt9lt.png" alt="Crawl Dashboard" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Screenshots: This endpoint renders JavaScript-heavy pages and captures full-page or mobile screenshots, all through a single API call. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnds8y7da0x9bt71hz7wv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnds8y7da0x9bt71hz7wv.png" alt="Screenshots Dashboard" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;
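&lt;p&gt;A screenshot call could be sketched the same way. Again, the &lt;code&gt;/v1/screenshot&lt;/code&gt; path and the option names (&lt;code&gt;full_page&lt;/code&gt;, &lt;code&gt;device&lt;/code&gt;) are assumptions for illustration, not the documented API.&lt;/p&gt;

```python
# Hypothetical sketch of requesting a full-page screenshot from a
# self-hosted instance. The path and option names are assumptions,
# not the documented API.
import json
import urllib.request


def build_screenshot_request(
    base_url: str,
    target_url: str,
    full_page: bool = True,
    device: str = "desktop",
) -> urllib.request.Request:
    payload = json.dumps({
        "url": target_url,
        "full_page": full_page,  # capture the entire scroll height
        "device": device,        # e.g. "desktop" or "mobile" viewport
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/screenshot",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


req = build_screenshot_request("http://localhost:8081", "https://example.com")
```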

&lt;p&gt;Watch (app exclusive): This endpoint monitors a website's contents for changes. You can schedule a job on a cron expression that sends you an email notification whenever anything changes. Works like a charm!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pxilnluwr3w4ji2o5xv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pxilnluwr3w4ji2o5xv.png" alt="Watch changes email notification" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best part about Supacrawler is that it works out of the box with just a few lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -O https://raw.githubusercontent.com/supacrawler/supacrawler/main/docker-compose.yml
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
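&lt;p&gt;After &lt;code&gt;docker compose up&lt;/code&gt;, you can sanity-check that the service is listening before firing real requests. The port and path below are assumptions; check the compose file for the actual values.&lt;/p&gt;

```python
# Minimal liveness check for a locally running service. The port
# (8081) and path are assumptions; read docker-compose.yml for the
# real ones.
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a non-5xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


print(is_up("http://localhost:8081/"))
```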



&lt;p&gt;I'm always keen to hear how people use tools like this. Let me know if you find it useful or if you have any questions!&lt;/p&gt;

&lt;p&gt;If you're interested in seeing more you can visit the following:&lt;br&gt;
&lt;a href="https://supacrawler.com" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/supacrawler/supacrawler" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>programming</category>
      <category>python</category>
    </item>
  </channel>
</rss>
