<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Goscrapy</title>
    <description>The latest articles on Forem by Goscrapy (@goscrapy).</description>
    <link>https://forem.com/goscrapy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3880027%2F3e0c8ca1-b308-47ae-bffe-429c8d83b776.png</url>
      <title>Forem: Goscrapy</title>
      <link>https://forem.com/goscrapy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/goscrapy"/>
    <language>en</language>
    <item>
      <title>Efficiently Extracting Business Data from Google Maps</title>
      <dc:creator>Goscrapy</dc:creator>
      <pubDate>Fri, 17 Apr 2026 16:25:14 +0000</pubDate>
      <link>https://forem.com/goscrapy/efficiently-extracting-business-data-from-google-maps-1epj</link>
      <guid>https://forem.com/goscrapy/efficiently-extracting-business-data-from-google-maps-1epj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyni5z3q7t0destx185w.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyni5z3q7t0destx185w.gif" alt="Google maps scraping" width="826" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hi everyone,&lt;/p&gt;

&lt;p&gt;Extracting reliable business data from Google Maps is a common requirement for market research and local lead generation. However, the technical implementation is tricky, not because of anti-bot measures but because of how Google structures its responses.&lt;/p&gt;

&lt;p&gt;Instead of clean HTML or standard JSON, Google serves data in massive, obfuscated arrays that change based on your geographic center. While many developers default to heavy headless browsers to solve this, there is a much more efficient, "browserless" approach that relies on understanding the underlying API logic. &lt;/p&gt;

&lt;p&gt;In this guide, we'll walk through how to solve these challenges step by step using &lt;strong&gt;&lt;a href="https://github.com/tech-engine/goscrapy" rel="noopener noreferrer"&gt;GoScrapy&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let's get started.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Tip:&lt;/strong&gt; For this you will need to set up GoScrapy. If you haven't checked out our &lt;a href="https://dev.to/goscrapy/scrapy-but-in-go-building-high-performance-scrapers-without-the-boilerplate-49k3"&gt;last article&lt;/a&gt; on how to do it, you should; it's very simple and will get you up and running in minutes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 1: Defining the Data Blueprint
&lt;/h2&gt;

&lt;p&gt;The first step is to define a model for the scraped data. Google Maps provides a wealth of information, but without a clear schema, your extraction logic will quickly become unmanageable. In &lt;strong&gt;&lt;a href="https://github.com/tech-engine/goscrapy" rel="noopener noreferrer"&gt;GoScrapy&lt;/a&gt;&lt;/strong&gt;, data is modeled using a &lt;strong&gt;record&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Benefit:&lt;/strong&gt; &lt;br&gt;
Defining a typed structure upfront ensures that your data collection is consistent and that your downstream processing (like CSV or Database exports) remains stable even when dealing with varied business listings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// google_maps_scraper/record.go&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Record&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Query&lt;/span&gt;              &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"query" csv:"query"`&lt;/span&gt;
    &lt;span class="n"&gt;QueryLoc&lt;/span&gt;           &lt;span class="n"&gt;location&lt;/span&gt;         &lt;span class="s"&gt;`json:"-" csv:"-"`&lt;/span&gt;
    &lt;span class="n"&gt;Status&lt;/span&gt;             &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"status" csv:"status"`&lt;/span&gt;
    &lt;span class="n"&gt;OpenHours&lt;/span&gt;          &lt;span class="n"&gt;OpeningHours&lt;/span&gt;     &lt;span class="s"&gt;`json:"opening_hours" csv:"-"`&lt;/span&gt;
    &lt;span class="n"&gt;OpenHoursStr&lt;/span&gt;       &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"opening_hours_str" csv:"opening_hours_str"`&lt;/span&gt;
    &lt;span class="n"&gt;Phone&lt;/span&gt;              &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"phone" csv:"phone"`&lt;/span&gt;
    &lt;span class="n"&gt;WebResultsUrl&lt;/span&gt;      &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"webresults_url" csv:"webresults_url"`&lt;/span&gt;
    &lt;span class="n"&gt;ShortDescription&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"short_description" csv:"short_description"`&lt;/span&gt;
    &lt;span class="n"&gt;Description&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"description" csv:"description"`&lt;/span&gt;
    &lt;span class="n"&gt;TimeZone&lt;/span&gt;           &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"timezone" csv:"timezone"`&lt;/span&gt;
    &lt;span class="n"&gt;Categories&lt;/span&gt;         &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;         &lt;span class="s"&gt;`json:"category" csv:"category"`&lt;/span&gt;
    &lt;span class="n"&gt;Title&lt;/span&gt;              &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"title" csv:"title"`&lt;/span&gt;
    &lt;span class="n"&gt;Website&lt;/span&gt;            &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"website" csv:"website"`&lt;/span&gt;
    &lt;span class="n"&gt;Review&lt;/span&gt;             &lt;span class="kt"&gt;float64&lt;/span&gt;          &lt;span class="s"&gt;`json:"review" csv:"review"`&lt;/span&gt;
    &lt;span class="n"&gt;ReviewDistribution&lt;/span&gt; &lt;span class="n"&gt;string2Uint64Map&lt;/span&gt; &lt;span class="s"&gt;`json:"review_distribution" csv:"review_distribution"`&lt;/span&gt;
    &lt;span class="n"&gt;ReviewsUrl&lt;/span&gt;         &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"reviews_url" csv:"reviews_url"`&lt;/span&gt;
    &lt;span class="n"&gt;ReviewsCount&lt;/span&gt;       &lt;span class="kt"&gt;uint64&lt;/span&gt;           &lt;span class="s"&gt;`json:"reviews_count" csv:"reviews_count"`&lt;/span&gt;
    &lt;span class="n"&gt;Street&lt;/span&gt;             &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"street" csv:"street"`&lt;/span&gt;
    &lt;span class="n"&gt;City&lt;/span&gt;               &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"city" csv:"city"`&lt;/span&gt;
    &lt;span class="n"&gt;ZipCode&lt;/span&gt;            &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"zipcode" csv:"zipcode"`&lt;/span&gt;
    &lt;span class="n"&gt;State&lt;/span&gt;              &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"state" csv:"state"`&lt;/span&gt;
    &lt;span class="n"&gt;StateCode&lt;/span&gt;          &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"state_code" csv:"state_code"`&lt;/span&gt;
    &lt;span class="n"&gt;Country&lt;/span&gt;            &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"country" csv:"country"`&lt;/span&gt;
    &lt;span class="n"&gt;CountryCode&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"country_code" csv:"country_code"`&lt;/span&gt;
    &lt;span class="n"&gt;Details&lt;/span&gt;            &lt;span class="n"&gt;Details&lt;/span&gt;          &lt;span class="s"&gt;`json:"details" csv:"details"`&lt;/span&gt;
    &lt;span class="n"&gt;ReservationUrls&lt;/span&gt;    &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;         &lt;span class="s"&gt;`json:"reservation_urls" csv:"reservation_urls"`&lt;/span&gt;
    &lt;span class="n"&gt;OrderUrls&lt;/span&gt;          &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;         &lt;span class="s"&gt;`json:"order_urls" csv:"order_urls"`&lt;/span&gt;
    &lt;span class="n"&gt;OrderPlatforms&lt;/span&gt;     &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;         &lt;span class="s"&gt;`json:"order_platforms" csv:"order_platforms"`&lt;/span&gt;
    &lt;span class="n"&gt;GoogleReserveUrl&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;           &lt;span class="s"&gt;`json:"google_reserve_url" csv:"google_reserve_url"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
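&lt;p&gt;The struct above leans on a few helper types (&lt;code&gt;location&lt;/code&gt;, &lt;code&gt;OpeningHours&lt;/code&gt;, &lt;code&gt;string2Uint64Map&lt;/code&gt;) that live elsewhere in the example. Their exact definitions are in the repo; here is a minimal sketch of what they could look like, with the layouts inferred purely from the field names above (the nested &lt;code&gt;Details&lt;/code&gt; type is omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// google_maps_scraper/record.go (illustrative sketch, not the repo's exact code)

// location holds the resolved coordinates for a query (layout assumed).
type location struct {
    Lat float64
    Lng float64
}

// string2Uint64Map maps a label (e.g. a star rating) to a count,
// as suggested by the ReviewDistribution field.
type string2Uint64Map map[string]uint64

// OpeningHours keyed by weekday, with hour ranges as strings (assumed).
type OpeningHours map[string][]string
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;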






&lt;h3&gt;
  
  
  Step 2: Overcoming the "Spatial Anchor" Requirement
&lt;/h3&gt;

&lt;p&gt;Most developers start by searching for a keyword, only to find the results are inconsistent. This is because the Search API requires a spatial "anchor" (Latitude and Longitude) to provide high-density results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Technique:&lt;/strong&gt;&lt;br&gt;
You can resolve any text query (e.g., "Gyms in Austin, TX") by first hitting a &lt;strong&gt;geocoding&lt;/strong&gt; endpoint. Once you have the coordinates, you can center your search exactly where the density is highest, ensuring you don't miss any data points. As you can see below, for jobs where we don't have lat/lng in advance, we grab the location for the query using the &lt;strong&gt;parseGeocoding&lt;/strong&gt; step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// google_maps_scraper/spider.go&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Spider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;StartRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Resolve coordinates to ensure a high-density search area&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;prepareRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;generateGeocodingUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseGeocoding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// If we have coordinates, the search results will be far more accurate&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;prepareRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;generateSearchUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseMapListing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Spider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;parseGeocoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IResponseReader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;getJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;extractGeocoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Infof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Location found for %s: %f, %f"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c"&gt;// check util.go for details on prepareRequest&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;prepareRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;generateSearchUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://www.google.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseMapListing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 3: Handling Large Datasets via Cursor Offsets
&lt;/h3&gt;

&lt;p&gt;Google Maps doesn't use simple "Next Page" links or page numbers. Instead, it relies on a cursor-based offset system embedded within a &lt;code&gt;pb&lt;/code&gt; query parameter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Technique:&lt;/strong&gt;&lt;br&gt;
To crawl thousands of results, you need to track this offset state across your requests. Using GoScrapy's &lt;code&gt;Meta&lt;/code&gt; context, we pass this "tracking job" between asynchronous workers without the risk of data races or state corruption that global variables would introduce.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// google_maps_scraper/spider.go&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Spider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;parseMapListing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IResponseReader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;getJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusCode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c"&gt;// Here we basically try to parse the data surgically&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;extractMapResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Infof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Found %d records for [%s] (At cursor %d)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// update cursor for the next page&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetCursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

        &lt;span class="c"&gt;// this is where we export the record to be saved by the csv/json pipeline, whichever we have configured.&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Yield&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;prepareRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;generateSearchUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://www.google.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseMapListing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
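&lt;p&gt;The "tracking job" in the snippet above is just a small struct in &lt;code&gt;job.go&lt;/code&gt; that carries the query, the resolved coordinates, and the current cursor, and travels with each request via &lt;code&gt;Meta&lt;/code&gt;. Below is an illustrative sketch of that state; the field and method names are inferred from the calls in &lt;code&gt;spider.go&lt;/code&gt; (&lt;code&gt;job.query&lt;/code&gt;, &lt;code&gt;job.loc&lt;/code&gt;, &lt;code&gt;job.cursor&lt;/code&gt;, &lt;code&gt;setLocation&lt;/code&gt;, &lt;code&gt;SetCursor&lt;/code&gt;), so check the repo for the real definition:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// google_maps_scraper/job.go (illustrative sketch only)

// Job is the per-query state that is passed along with each request,
// so no global variables are needed between callbacks.
type Job struct {
    query  string
    loc    *location // resolved lat/lng; nil until geocoding succeeds
    cursor int       // offset into the result set, advanced by 20 per page
}

func (j *Job) setLocation(lat, lng float64) {
    j.loc = &amp;amp;location{Lat: lat, Lng: lng}
}

func (j *Job) SetCursor(c int) {
    j.cursor = c
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;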






&lt;h3&gt;
  
  
  Step 4: "Surgical" Extraction from Deeply Nested Arrays
&lt;/h3&gt;

&lt;p&gt;Google's internal JSON is not served in standard "key-value" pairs. It is a deeply nested array structure designed for internal engine processing, not external consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Benefit:&lt;/strong&gt;&lt;br&gt;
Instead of trying to map the entire massive response into memory, you can use &lt;strong&gt;GJSON&lt;/strong&gt; to perform "surgical" extractions, reaching directly into specific array indices (like index &lt;code&gt;11&lt;/code&gt; for the title) and ignoring the rest of the payload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// google_maps_scraper/utils.go&lt;/span&gt;
&lt;span class="c"&gt;// check repo examples for full code.&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;extractMapResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Record&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// After removing the non JSON prefix, we dive directly into the results array&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gjson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"0.1.#.14"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c"&gt;// This approach allows for extracting deep values with zero-allocation speed&lt;/span&gt;
   &lt;span class="c"&gt;// Refer to full code in repo attached. Pasting it here would create clutter&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
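&lt;p&gt;To make the "surgical" idea concrete, here is a small illustrative sketch of pulling one field out of every result with GJSON paths. The &lt;code&gt;0.1.#.14&lt;/code&gt; path and the index-&lt;code&gt;11&lt;/code&gt;-for-title mapping come from the steps above; the helper name is made up for this example, and the full field mapping lives in the repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// illustrative sketch; requires github.com/tidwall/gjson
import "github.com/tidwall/gjson"

// titlesFromListing walks the results array and reads a single known index
// per result instead of decoding the whole obfuscated payload.
func titlesFromListing(data []byte) []string {
    var titles []string
    gjson.GetBytes(data, "0.1.#.14").ForEach(func(_, item gjson.Result) bool {
        // index 11 of each result holds the business title
        titles = append(titles, item.Get("11").String())
        return true // continue with the next result
    })
    return titles
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;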






&lt;h3&gt;
  
  
  Conclusion: Performance in Production
&lt;/h3&gt;

&lt;p&gt;By moving away from browser engines and toward a raw HTTP approach, you can achieve significantly higher throughput with minimal resources. &lt;/p&gt;

&lt;p&gt;A GoScrapy-driven scraper compiles into a &lt;strong&gt;standalone 13MB binary&lt;/strong&gt;, making it incredibly efficient to deploy in cloud environments compared to multi-hundred-megabyte headless browser images. This efficiency translates directly into lower infrastructure costs and faster data turnaround.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you’re looking to scale your data extraction projects, you can find the full source code for this example and more in the &lt;a href="https://github.com/tech-engine/goscrapy/tree/main/_examples/google_maps_scraper" rel="noopener noreferrer"&gt;Google Maps scraper example&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Note: This scraper was created using &lt;strong&gt;GoScrapy&lt;/strong&gt; for educational purposes only, to showcase the capabilities of the framework. I am not liable for any misuse of this scraper.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>tutorial</category>
      <category>webscraping</category>
      <category>googlemaps</category>
    </item>
    <item>
      <title>Scrapy 🕷️, but in Go: Building High-Performance Scrapers without the Boilerplate</title>
      <dc:creator>Goscrapy</dc:creator>
      <pubDate>Wed, 15 Apr 2026 16:28:56 +0000</pubDate>
      <link>https://forem.com/goscrapy/scrapy-but-in-go-building-high-performance-scrapers-without-the-boilerplate-49k3</link>
      <guid>https://forem.com/goscrapy/scrapy-but-in-go-building-high-performance-scrapers-without-the-boilerplate-49k3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf9gdmzxhcoou8gcrt4g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf9gdmzxhcoou8gcrt4g.gif" alt="GoScrapy: Harnessing Go's perfomance for blazingly fast web scraping, inspired by Python's Scrapy framework." width="880" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hi everyone 👋&lt;/p&gt;

&lt;p&gt;Web scraping can start out pretty basic. You just loop through some pages, grab the HTML, pull out what you need, and store it somewhere. But when you try to scale it up, like dealing with a ton of requests or figuring out retries and cookies for different sites, it turns into a real hassle quickly.&lt;/p&gt;

&lt;p&gt;I remember using &lt;strong&gt;Scrapy&lt;/strong&gt; in Python, and it handled all that stuff without me having to think too hard about the details. It felt structured, you know. When I switched over to Go, there were good tools for the basics, but I missed that kind of ready-made setup. So that's why I ended up making &lt;strong&gt;GoScrapy&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is GoScrapy? (A quick intro)
&lt;/h2&gt;

&lt;p&gt;GoScrapy is basically a &lt;strong&gt;framework that tries to mimic the experience of Scrapy, but in Go&lt;/strong&gt;. It is not just for pulling HTML; it manages the whole data extraction process from start to finish. And it uses Go's built-in concurrency, so you &lt;strong&gt;get fast performance without messing around with goroutines yourself&lt;/strong&gt; all the time. I think that part is what makes it stand out, especially if you are coming from other languages.&lt;/p&gt;

&lt;h2&gt;
  
  
  To get going with it
&lt;/h2&gt;

&lt;p&gt;There is a CLI tool that feels similar to Scrapy's. You install it with &lt;code&gt;go install&lt;/code&gt;, and it needs Go 1.22 or later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/tech-engine/goscrapy/cmd/...@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then verify the installation with &lt;code&gt;gos -v&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Starting a project is simple too&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gos startproject my_scraper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It sets up everything for you, like the module and files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;gos startproject books_to_scrape

🚀 GoScrapy generating project files. Please wait!

📦 Initializing Go module: books_to_scrape...
go: creating new go.mod: module books_to_scrape
go: to add module requirements and sums:
        go mod tidy
✔️  books_to_scrape\base.go
✔️  books_to_scrape\constants.go
✔️  books_to_scrape\errors.go
✔️  books_to_scrape\job.go
✔️  main.go
✔️  books_to_scrape\record.go
✔️  books_to_scrape\settings.go
✔️  books_to_scrape\spider.go

📦 Do you want to resolve dependencies now (go mod tidy)? [Y/n]: y
📦 Resolving dependencies...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows it scaffolding the project with all the necessary files, so you don't have to create them manually. It even asks if you want to run &lt;code&gt;go mod tidy&lt;/code&gt; right then. Pretty handy, I guess, though sometimes it takes a second to finish.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;

&lt;p&gt;Once you have the project, the structure looks like this for something like books_to_scrape&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;books_to_scrape/
├── main.go               # The starting point
└── books_to_scrape/
    ├── base.go           # Initializing the engine, middlewares etc
    ├── constants.go      # Shared stuff
    ├── errors.go         # Custom errors, if you need any
    ├── job.go            # Parameters for your spider
    ├── record.go         # The structure of data you scrape
    ├── settings.go       # Configs like pipelines
    └── spider.go         # Where the actual extraction happens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;main.go&lt;/code&gt; sets up the context, instantiates the spider, starts the request, and waits for it to finish. It keeps things straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// main.go&lt;/span&gt;

&lt;span class="c"&gt;// this template is autogenerated as well, like all the files&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
    &lt;span class="n"&gt;spider&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;books_to_scrape&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="c"&gt;// start the first request&lt;/span&gt;
    &lt;span class="n"&gt;spider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StartRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

    &lt;span class="c"&gt;// wait for the spider to finish. true means the spider will&lt;/span&gt;
    &lt;span class="c"&gt;// exit as soon as it's done. But if you are running goscrapy&lt;/span&gt;
    &lt;span class="c"&gt;// as a server, you can set it to false (the default), and it will keep running, accepting more jobs via the StartRequest() method.&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;spider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="p"&gt;}&lt;/span&gt; 
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Writing Spiders
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;spider.go&lt;/code&gt; is where you write how to handle requests and parse responses. There are two main things you use all the time: &lt;code&gt;s.Request&lt;/code&gt; and &lt;code&gt;s.Parse&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;s.Request(ctx)&lt;/code&gt; is what you call each time for a new fetch. It gives you a request object where you can chain things like &lt;code&gt;Url&lt;/code&gt;, &lt;code&gt;Meta&lt;/code&gt;, or &lt;code&gt;Headers&lt;/code&gt;. Each request has to be its own object; don't try to reuse one for different pages or requests.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;s.Parse(req, callback)&lt;/code&gt; is what actually sends the request off to the engine. You also tell it which function should handle the response once it comes back. Usually, you start with a &lt;code&gt;StartRequest&lt;/code&gt; method that kicks things off.&lt;/p&gt;

&lt;p&gt;For example, with &lt;code&gt;books.toscrape.com&lt;/code&gt;, the parse function grabs the product links, builds a full URL for each one, and parses them with &lt;code&gt;parseProduct&lt;/code&gt;. Then it checks for a next page and, if there is one, follows it back into &lt;code&gt;parse&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// spider.go&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Spider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IResponseReader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// grab product links&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;productUrl&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"article.product_pod h3 a"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"href"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s/%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;productUrl&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseProduct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// check for next page&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"li.next a"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"href"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s/%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Spider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;parseProduct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IResponseReader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"article.product_page"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// yield a Record&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Yield&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".product_main h1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;Price&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".price_color"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;Stock&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".availability"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
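&lt;p&gt;For reference, the &lt;code&gt;Record&lt;/code&gt; the spider yields above is just a plain struct in &lt;code&gt;record.go&lt;/code&gt;. Here is a minimal sketch with the three fields used in the yield; the json/csv tags follow the convention seen in other GoScrapy examples and are an assumption here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// books_to_scrape/record.go (minimal sketch; the generated file may carry extra methods)

type Record struct {
    Title string `json:"title" csv:"title"`
    Price string `json:"price" csv:"price"`
    Stock string `json:"stock" csv:"stock"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;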



&lt;p&gt;It handles pagination kind of naturally, but you have to be careful with the base URL.&lt;/p&gt;
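&lt;p&gt;One simple way to be careful with the base URL is to let the standard library do the joining instead of the &lt;code&gt;fmt.Sprintf&lt;/code&gt; concatenation above. A small sketch, independent of GoScrapy (needs Go 1.19+ for &lt;code&gt;url.JoinPath&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "fmt"
    "log"
    "net/url"
)

func main() {
    // JoinPath normalizes slashes, so a trailing "/" on the base and a
    // relative href don't produce a doubled or missing separator.
    full, err := url.JoinPath("https://books.toscrape.com/", "catalogue/page-2.html")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(full) // https://books.toscrape.com/catalogue/page-2.html
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;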

&lt;h2&gt;
  
  
  Middlewares and Pipelines
&lt;/h2&gt;

&lt;p&gt;The framework has middlewares and pipelines like Scrapy. Middlewares for things like retry with backoff or dupe filter are set in &lt;code&gt;settings.go&lt;/code&gt; as a slice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
&lt;span class="c"&gt;// settings.go&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;MIDDLEWARES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;middlewaremanager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Middleware&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;middlewares&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Retry&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
    &lt;span class="n"&gt;middlewares&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DupeFilter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pipelines take the yielded records and export them, say to CSV. Every yielded record goes straight through the configured pipelines. It seems efficient for output, though I am not totally sure how it scales with huge datasets yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;export2CSV&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipelines&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Export2CSV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;pipelines&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Export2CSVOpts&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="n"&gt;Filename&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"output.csv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  TUI Dashboard
&lt;/h2&gt;

&lt;p&gt;There is also a TUI dashboard for watching progress in the terminal. You can add it in &lt;code&gt;base.go&lt;/code&gt; by creating a dashboard with &lt;code&gt;tui.New(app.Logger())&lt;/code&gt;, attaching it to a telemetry hub, and starting it in a goroutine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;//base.go&lt;/span&gt;

&lt;span class="c"&gt;// Tweak base.go to wire up the TUI&lt;/span&gt;
&lt;span class="n"&gt;dashboard&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tui&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c"&gt;// create a telemetry hub&lt;/span&gt;
&lt;span class="n"&gt;hub&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewTelemetryHub&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c"&gt;// add the dashboard as an observer&lt;/span&gt;
&lt;span class="n"&gt;hub&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddObserver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// set the telemetry hub to the app&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTelemetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hub&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StartWithTUI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It shows stats visually, which is nice if you are fond of that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anyway, that's the gist of it
&lt;/h2&gt;

&lt;p&gt;GoScrapy is still in early versions, v0.x, and under development. If you want to check it out or help, feel free.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tech-engine/goscrapy" rel="noopener noreferrer"&gt;github.com/tech-engine/goscrapy&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord&lt;/strong&gt;: &lt;a href="https://discord.gg/FPvxETjYPH" rel="noopener noreferrer"&gt;Join our community&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I think it could be useful for Go folks doing scraping, but it might need more tweaks.&lt;/p&gt;

&lt;p&gt;Thank you for reading this far 💚&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>go</category>
      <category>goscrapy</category>
    </item>
  </channel>
</rss>
