<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Adam Tauber</title>
    <description>The latest articles on Forem by Adam Tauber (@asciimoo).</description>
    <link>https://forem.com/asciimoo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F41076%2Fe2223504-77f9-4b0f-bb2f-8d224f38c44f.jpeg</url>
      <title>Forem: Adam Tauber</title>
      <link>https://forem.com/asciimoo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/asciimoo"/>
    <language>en</language>
    <item>
      <title>How I Cut My Google Search Dependence in Half</title>
      <dc:creator>Adam Tauber</dc:creator>
      <pubDate>Thu, 02 Apr 2026 15:05:43 +0000</pubDate>
      <link>https://forem.com/hister/how-i-cut-my-google-search-dependence-in-half-4mi1</link>
      <guid>https://forem.com/hister/how-i-cut-my-google-search-dependence-in-half-4mi1</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built &lt;a href="https://github.com/asciimoo/hister" rel="noopener noreferrer"&gt;Hister&lt;/a&gt;, a self-hosted web history search tool that indexes visited pages locally. In just 1.5 months, I reduced my reliance on Google Search by 50%.&lt;/p&gt;




&lt;h2&gt;The Problem: Online Search Isn't What It Used to Be&lt;/h2&gt;

&lt;p&gt;Like many developers and knowledge workers, I found myself constantly reaching for Google Search throughout my workday. It had become such an ingrained habit that I barely noticed how often I was context-switching away from my actual work to perform searches. But over time, something had changed about the experience. The search results that once felt reliable and helpful were increasingly problematic in several ways.&lt;/p&gt;

&lt;h3&gt;Too Many Advertisements&lt;/h3&gt;

&lt;p&gt;What used to be a clean list of relevant links now requires scrolling past multiple sponsored results, shopping suggestions, and promoted content just to reach the organic results. Often, the actual information I'm looking for doesn't appear until halfway down the page, after I've mentally filtered out all the commercial noise.&lt;/p&gt;

&lt;h3&gt;Manipulative SEO Tactics&lt;/h3&gt;

&lt;p&gt;Organic results themselves have been manipulated by SEO tactics rather than truly reflecting the most relevant and helpful content. Websites optimized for search engines rather than humans dominate the rankings, while genuinely useful resources from smaller sites or personal blogs get buried on page two or three. The signal-to-noise ratio has degraded significantly.&lt;/p&gt;

&lt;h3&gt;AI Suggestions&lt;/h3&gt;

&lt;p&gt;Google has recently added AI-generated summaries at the top of many search results. While sometimes helpful, these summaries often miss crucial nuance, provide oversimplified or occasionally incorrect information, and add yet another layer between me and the actual source material I'm trying to find. For technical queries where precision matters, these AI answers can be misleading or incomplete.&lt;/p&gt;

&lt;h3&gt;Lack of Privacy&lt;/h3&gt;

&lt;p&gt;Google tracks every query I make, building a detailed profile of my interests, work patterns, and information needs. This data is used for ad targeting and who knows what else. The convenience of search comes at the cost of giving away intimate details about my work and life.&lt;/p&gt;

&lt;h2&gt;The Insight&lt;/h2&gt;

&lt;p&gt;But the realization that pushed me to build a solution was that I was often searching for pages &lt;strong&gt;I'd already visited&lt;/strong&gt;. That documentation page I read last week but forgot to bookmark. That GitHub issue I commented on yesterday but couldn't remember the exact project name. Those internal wiki pages with crucial information about our infrastructure. I was using Google as a personal memory aid, outsourcing my recall to an external service that was tracking my every query. And for content behind authentication (internal tools, documentation, private repositories) Google couldn't help at all, since it can't index pages it can't access.&lt;/p&gt;

&lt;h3&gt;Two Types of Search&lt;/h3&gt;

&lt;p&gt;Thinking about how to replace Google led me to a crucial realization about the nature of search itself. When we type queries into a search box, we're actually doing one of two fundamentally different things, even though the interface is identical:&lt;/p&gt;

&lt;h4&gt;Discovery Search: Finding New Information&lt;/h4&gt;

&lt;p&gt;Discovery search is what we typically think of when we imagine "searching the internet". It's about finding information we've never encountered before. This is true exploration: we're venturing into unknown territory, discovering new resources, learning about topics we're unfamiliar with, and finding answers to questions we've never asked before. For this type of search, we genuinely need the vast index of the internet that services like Google provide. We need to cast a wide net and see what the collective knowledge of the web has to offer.&lt;/p&gt;

&lt;h4&gt;Recall Search: Refinding Known Information&lt;/h4&gt;

&lt;p&gt;But then there's the other type of search, what I call "recall search". This is when we're trying to find information we've already encountered. We're not discovering something new; we're trying to remember where we saw something. Examples include searches like "That authentication bug I fixed last month..." when you remember solving a problem but can't recall the exact solution, or "The Bleve docs page about result highlighters..." when you know you've read the documentation before but can't remember the specific URL or section title. Another common example: "That Stack Overflow answer about async/await..." when you remember reading a particularly clear explanation but didn't save the link.&lt;/p&gt;

&lt;p&gt;A significant portion of my daily searches — probably more than half — were recall searches, not discovery searches.&lt;/p&gt;

&lt;p&gt;The revelation that changed everything was this: I was constantly using Google to search my own browsing history, to refind pages I'd already visited and information I'd already read. But Google's interface treats both types of search identically, with no special optimization for helping you refind your own content. Worse, for pages behind authentication or on private networks, Google can't help at all, because it can't index content it can't access.&lt;/p&gt;

&lt;p&gt;This insight suggested a solution: what if I had a dedicated tool optimized specifically for recall search, for refinding my own browsing history, and fell back to Google only for true discovery search?&lt;/p&gt;

&lt;p&gt;The potential benefits were enormous:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faster results&lt;/li&gt;
&lt;li&gt;better privacy&lt;/li&gt;
&lt;li&gt;access to authenticated content&lt;/li&gt;
&lt;li&gt;results tailored specifically to my interests and work&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Solution: Index Everything Locally&lt;/h2&gt;

&lt;p&gt;The solution seemed obvious once I'd articulated the problem: what if I could search my entire browsing history - including the full page content, not just URLs and titles - locally and privately? This would give me a personal search engine optimized specifically for recall search, while still allowing me to fall back to Google for discovery search when needed.&lt;/p&gt;

&lt;p&gt;I started looking for existing solutions. Surely someone had built this before? Browser history exists, but it only stores URLs and page titles, making it nearly useless for finding pages based on their content. Some note-taking apps like Evernote or Notion offer web clippers, but they require manual action for each page you want to save. Personal knowledge management tools like &lt;a href="https://github.com/asciimoo/omnom" rel="noopener noreferrer"&gt;Omnom&lt;/a&gt; exist, but they're focused on curated notes rather than comprehensive browsing history and require conscious decisions about what to save.&lt;/p&gt;

&lt;p&gt;None of the existing tools I found met all my requirements. I needed something that combined the comprehensive automatic capture of browser history, the full-text search capabilities of a search engine, the performance of local software, and the privacy of self-hosted solutions. Since nothing existed that checked all these boxes, I decided to build it myself.&lt;/p&gt;

&lt;h3&gt;What I Needed&lt;/h3&gt;

&lt;p&gt;The requirements for my ideal solution were clear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fast lookup&lt;/strong&gt; If searching my local index took longer than just Googling, I'd never use it. I needed instant, sub-second response times and keyboard shortcuts that make it faster to search locally than to context-switch to Google.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic indexing&lt;/strong&gt; I didn't want to manually save pages or make conscious decisions about what to index. It needed to capture pages as I browse with zero manual work on my part. The tool should disappear into the background and just work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication aware indexing&lt;/strong&gt; So much of the content I reference daily is behind authentication: internal wikis, private documents, authenticated API documentation, internal dashboards. Any solution that couldn't handle authenticated content would miss a huge portion of my actual browsing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full-text search&lt;/strong&gt; This meant searching the actual page content, not just URLs and titles. Browser history is useless when you remember reading something about "microservice authentication patterns" but can't remember which blog or doc site it was on. I needed to be able to search the words within the pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Powerful query capabilities&lt;/strong&gt; Boolean operators (AND, OR, NOT), field-specific searches (search only URLs, or only titles), and wildcard matching would make it possible to narrow down results quickly.&lt;/p&gt;
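&lt;p&gt;To make the idea concrete, here is a toy, standard-library-only Go sketch of field-restricted matching with AND semantics over indexed pages. It is purely illustrative: the &lt;code&gt;page&lt;/code&gt; struct and &lt;code&gt;matches&lt;/code&gt; function are hypothetical, not Hister's actual implementation, which uses a real full-text index.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// page is a simplified indexed document: in a real history index each
// visited page would carry its URL, title, and extracted body text.
type page struct {
	URL, Title, Body string
}

// matches reports whether a page satisfies every term of a query (AND
// semantics). Terms prefixed with "url:" or "title:" are restricted to
// that field, mimicking field-specific searches; bare terms match any field.
func matches(p page, query string) bool {
	for _, term := range strings.Fields(strings.ToLower(query)) {
		var haystack string
		switch {
		case strings.HasPrefix(term, "url:"):
			haystack, term = strings.ToLower(p.URL), strings.TrimPrefix(term, "url:")
		case strings.HasPrefix(term, "title:"):
			haystack, term = strings.ToLower(p.Title), strings.TrimPrefix(term, "title:")
		default:
			haystack = strings.ToLower(p.URL + " " + p.Title + " " + p.Body)
		}
		if !strings.Contains(haystack, term) {
			return false
		}
	}
	return true
}

func main() {
	history := []page{
		{"https://blevesearch.com/docs/Query-String-Query/", "Query String Query", "boolean operators and field scoping"},
		{"https://example.com/auth", "Microservice authentication patterns", "JWT, mTLS, sessions"},
	}
	for _, p := range history {
		if matches(p, "title:authentication patterns") {
			fmt.Println(p.URL)
		}
	}
}
```

&lt;p&gt;A production-grade index would also handle tokenization, stemming, and ranking, but the field-prefix idea carries over directly.&lt;/p&gt;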

&lt;p&gt;&lt;strong&gt;Zero cognitive overhead&lt;/strong&gt; The tool needed to work seamlessly in my workflow. It should integrate naturally with how I already browse and search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transparent fallback to online search engines&lt;/strong&gt; If I searched locally and didn't find what I wanted, I should be able to immediately fall back to Google with the same query, making adoption gradual rather than requiring a complete workflow change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning capabilities&lt;/strong&gt; Let me customize the experience over time. I wanted to be able to blacklist irrelevant sites I never want to see again, prioritize important sources, and create keyword aliases for common searches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline preview of saved content&lt;/strong&gt; Being able to read indexed pages even if the original site goes down or the page is deleted; a nice bonus that would occasionally save me from link rot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Import existing history&lt;/strong&gt; I wanted to start with years of browsing data already indexed, rather than building up an index from scratch over months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free software&lt;/strong&gt; Self-hosted, with no recurring costs or vendor lock-in. My browsing history is my personal data; it should not be owned by any company.&lt;/p&gt;

&lt;p&gt;No existing tool checked all these boxes. So I decided to build &lt;a href="https://github.com/asciimoo/hister" rel="noopener noreferrer"&gt;Hister&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Introducing Hister&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/asciimoo/hister" rel="noopener noreferrer"&gt;Hister&lt;/a&gt; is a self-hosted web history management tool that treats your browsing history as a personal search engine.&lt;/p&gt;

&lt;h2&gt;The Results: 50% Reduction in 1.5 Months&lt;/h2&gt;

&lt;p&gt;After using Hister for six weeks, I analyzed my search patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;~50% of my Google searches now answered locally&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Found content Google couldn't&lt;/strong&gt; (authenticated pages, deleted content)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero privacy concerns&lt;/strong&gt; No tracking, no profiling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better results&lt;/strong&gt; for my specific needs (because it's MY history)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The more I use it, the better it gets. My local index is now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More relevant than Google for my common queries&lt;/li&gt;
&lt;li&gt;As fast as opening a new browser tab&lt;/li&gt;
&lt;li&gt;Comprehensive across authenticated services&lt;/li&gt;
&lt;li&gt;A personal knowledge base of everything I've read&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Unexpected Benefits&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Rediscovery:&lt;/strong&gt; I'm finding valuable content I'd forgotten about. That article I bookmarked 2 years ago but never revisited? Now it shows up in relevant searches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning patterns:&lt;/strong&gt; Seeing what I search for reveals my knowledge gaps and interests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Offline access:&lt;/strong&gt; When documentation sites go down or pages get deleted, I still have the content.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;We've accepted that search means "go to Google" for so long that we've forgotten there are alternatives. But for a huge portion of my daily searches, probably more than half, I don't need the entire internet. I need MY internet: the pages I've read, the docs I've opened, the internal tools I use daily.&lt;/p&gt;

&lt;p&gt;Hister isn't trying to replace Google for discovery. It's trying to replace Google for recall. And in that domain, it's already better than Google could ever be, because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It knows about authenticated pages Google will never see&lt;/li&gt;
&lt;li&gt;It searches YOUR history, not the entire web&lt;/li&gt;
&lt;li&gt;It's instant, private, and ad-free&lt;/li&gt;
&lt;li&gt;It gets better the more you use it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After 1.5 months, I've cut my Google dependence in half. I expect this number will increase as my index grows.&lt;/p&gt;

&lt;p&gt;If you're a developer, researcher, or knowledge worker who constantly re-searches for information you've already found, give Hister a try. It might just change how you find information on the internet.&lt;/p&gt;

&lt;h3&gt;Before Hister:&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open Google&lt;/li&gt;
&lt;li&gt;Search: "bleve query"&lt;/li&gt;
&lt;li&gt;Click first result (probably wrong)&lt;/li&gt;
&lt;li&gt;Click second result (looks familiar…)&lt;/li&gt;
&lt;li&gt;Realize I've been here before&lt;/li&gt;
&lt;li&gt;Finally find the specific page I wanted&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Time: ~1-2 minutes, 5-10 clicks&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;With Hister:&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open Hister&lt;/li&gt;
&lt;li&gt;Type: "bleve query", press enter&lt;/li&gt;
&lt;li&gt;The first result is the EXACT page I visited last month&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Time: ~5 seconds, a few keystrokes&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Take Back Your Search&lt;/h2&gt;

&lt;p&gt;To get started with Hister, check out the following links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/asciimoo/hister/releases" rel="noopener noreferrer"&gt;Download Hister&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://addons.mozilla.org/en-US/firefox/addon/hister/" rel="noopener noreferrer"&gt;Download Firefox Extension&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://chromewebstore.google.com/detail/hister/cciilamhchpmbdnniabclekddabkifhb" rel="noopener noreferrer"&gt;Download Chrome Extension&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;Future Development&lt;/h3&gt;

&lt;p&gt;I'm actively developing Hister with these goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improve usability&lt;/li&gt;
&lt;li&gt;Add automatic indexing capabilities based on the index and opened results&lt;/li&gt;
&lt;li&gt;Find a secure and privacy-respecting way to connect local Hister instances to a distributed search engine&lt;/li&gt;
&lt;li&gt;Export search results&lt;/li&gt;
&lt;li&gt;Advanced analytics and search insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hister is open source (AGPLv3) and welcomes contributions!&lt;/p&gt;

&lt;h3&gt;Ways to Contribute&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🐛 Report bugs and suggest features on &lt;a href="https://github.com/asciimoo/hister/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 Submit pull requests (check out &lt;a href="https://github.com/asciimoo/hister/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22" rel="noopener noreferrer"&gt;good first issues&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;📖 Improve documentation&lt;/li&gt;
&lt;li&gt;🎨 Design better UI/UX&lt;/li&gt;
&lt;li&gt;🌍 Translate to other languages&lt;/li&gt;
&lt;li&gt;⭐ Star the repo and spread the word!&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Have questions or feedback? Open an issue on &lt;a href="https://github.com/asciimoo/hister" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or reach out to &lt;a href="https://github.com/asciimoo" rel="noopener noreferrer"&gt;@asciimoo&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>searchengine</category>
      <category>indexer</category>
      <category>search</category>
    </item>
    <item>
      <title>How to Scrape Instagram Profiles</title>
      <dc:creator>Adam Tauber</dc:creator>
      <pubDate>Mon, 13 Nov 2017 00:00:00 +0000</pubDate>
      <link>https://forem.com/asciimoo/how-to-scrape-instagram-profiles-4gm</link>
      <guid>https://forem.com/asciimoo/how-to-scrape-instagram-profiles-4gm</guid>
      <description>

&lt;p&gt;Scraping can be tedious work, especially if the target site isn't just a standard static HTML page. Plenty of modern sites have JavaScript-only UIs where extracting content is not always trivial. Instagram is one of these websites, so I would like to show you how to write a scraper relatively quickly to get images from Instagram. I'm using &lt;a href="http://go-colly.org/"&gt;Colly&lt;/a&gt;, a scraping framework for Go. The full working example can be found &lt;a href="http://go-colly.org/docs/examples/instagram/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Information gathering&lt;/h2&gt;

&lt;p&gt;First, if we view the source code of a profile page (e.g. &lt;a href="https://instagram.com/instagram"&gt;https://instagram.com/instagram&lt;/a&gt;), we can see a bunch of JavaScript code inside the &lt;code&gt;body&lt;/code&gt; tag instead of static HTML tags. Let's take a closer look at it. We can see that the first &lt;code&gt;script&lt;/code&gt; is just a variable declaration where a huge JSON is assigned to a single variable (&lt;code&gt;window._sharedData&lt;/code&gt;). This JSON can be easily extracted from the &lt;code&gt;script&lt;/code&gt; tag by finding the first &lt;code&gt;{&lt;/code&gt; character and getting the whole content after it:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;jsonData&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;scriptContent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scriptContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;"{"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scriptContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="x"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Note that because it is a JavaScript variable declaration, it has a trailing semicolon which we have to cut off to get valid JSON. That's why the example above ends with &lt;code&gt;len(scriptContent)-1&lt;/code&gt;.&lt;/p&gt;
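&lt;p&gt;For illustration, the same extraction can be written as a small self-contained helper that also tolerates surrounding whitespace; this is a sketch of the technique, not the exact code from the example (the &lt;code&gt;extractJSON&lt;/code&gt; name is mine):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// extractJSON pulls the JSON object out of a script of the form
// `window._sharedData = {...};` by slicing from the first "{" and
// trimming the trailing semicolon (plus any trailing whitespace).
func extractJSON(scriptContent string) string {
	start := strings.Index(scriptContent, "{")
	if start < 0 {
		return "" // no JSON object found
	}
	return strings.TrimSuffix(strings.TrimSpace(scriptContent[start:]), ";")
}

func main() {
	script := `window._sharedData = {"entry_data": {"ProfilePage": []}};`
	fmt.Println(extractJSON(script))
}
```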

&lt;p&gt;The formatted view of the extracted JSON reveals all the information we are looking for. The JSON contains information about a user's images and some metadata of the profile (e.g. the profile ID is &lt;code&gt;25025320&lt;/code&gt;). There is an interesting part of the metadata called &lt;code&gt;page_info&lt;/code&gt;:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"page_info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="s2"&gt;"has_next_page"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="s2"&gt;"end_cursor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AQBiQhGRC6c6f-YOxdU0ApaAvotN4zI601ymkAtQ8SutdWz2n-bKFCkv51PMAoi9im3tNDTFLyhV969z8a6JnAkQMzHbYVwNI4Ke7jbk99nvFA"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The value of &lt;code&gt;end_cursor&lt;/code&gt; is probably the URL attribute used to fetch the next page when &lt;code&gt;has_next_page&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Tip: format JSON with the handy &lt;a href="https://github.com/stedolan/jq"&gt;jq&lt;/a&gt; command-line tool.&lt;/p&gt;

&lt;h3&gt;Paging&lt;/h3&gt;

&lt;p&gt;The next page of the user profile is retrieved by an AJAX call, so we have to use the browser's Network Inspector to find out what is required to fetch it. Network Inspector shows a long and cryptic URL which has two GET parameters &lt;code&gt;query_id&lt;/code&gt; and &lt;code&gt;variables&lt;/code&gt;:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.instagram.com/graphql/query/?query_id=17888483320059182&amp;amp;variables=%7B%22id%22%3A%2225025320%22%2C%22first%22%3A12%2C%22after%22%3A%22AQBiQhGRC6c6f-YOxdU0ApaAvotN4zI601ymkAtQ8SutdWz2n-bKFCkv51PMAoi9im3tNDTFLyhV969z8a6JnAkQMzHbYVwNI4Ke7jbk99nvFA%22%7D
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;It seems like Instagram uses a &lt;a href="https://en.wikipedia.org/wiki/GraphQL"&gt;GraphQL&lt;/a&gt; API and the value of the &lt;code&gt;variables&lt;/code&gt; GET parameter is a URL-encoded value. We can decode it with a single line of Python code:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python -c 'import urlparse;print(urlparse.parse_qs("variables=%7B%22id%22%3A%2225025320%22%2C%22first%22%3A12%2C%22after%22%3A%22AQBiQhGRC6c6f-YOxdU0ApaAvotN4zI601ymkAtQ8SutdWz2n-bKFCkv51PMAoi9im3tNDTFLyhV969z8a6JnAkQMzHbYVwNI4Ke7jbk99nvFA%22%7D")["variables"][0])'
{"id":"25025320","first":12,"after":"AQBiQhGRC6c6f-YOxdU0ApaAvotN4zI601ymkAtQ8SutdWz2n-bKFCkv51PMAoi9im3tNDTFLyhV969z8a6JnAkQMzHbYVwNI4Ke7jbk99nvFA"}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;As you can see, it is a JSON object: the value of the &lt;code&gt;after&lt;/code&gt; attribute is the same as the value of &lt;code&gt;end_cursor&lt;/code&gt;, and &lt;code&gt;id&lt;/code&gt; is the ID of the profile.&lt;/p&gt;

&lt;p&gt;The only unknown piece of the next page URL is the &lt;code&gt;query_id&lt;/code&gt; GET parameter. The HTML source code does not contain it, nor do the cookies or response headers. After a little bit of digging, it can be found in a static JS file included in the main page, and it seems to be a constant value.&lt;/p&gt;

&lt;p&gt;The format of the response is also JSON, but the structure is different from what we found on the main page. This JSON contains the same information as the previous one; however, we cannot use the same method to extract the data because of the structural differences.&lt;/p&gt;

&lt;h2&gt;Building the scraper&lt;/h2&gt;

&lt;p&gt;The information gathering phase clearly shows that we need four building blocks to be able to fetch all images found on an Instagram profile. Let's do it using Colly.&lt;/p&gt;

&lt;h3&gt;Extract and parse JSON from the main page&lt;/h3&gt;

&lt;p&gt;To extract content from the HTML we need a new &lt;code&gt;Collector&lt;/code&gt; with an HTML callback that extracts the JSON data from the &lt;code&gt;script&lt;/code&gt; element. Specifying this callback and when it must be called is done with the &lt;code&gt;OnHTML&lt;/code&gt; function of the &lt;code&gt;Collector&lt;/code&gt;.&lt;br&gt;
The JSON can then be converted to a native Go structure using &lt;code&gt;json.Unmarshal&lt;/code&gt; from the standard library.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;colly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewCollector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="x"&gt;

&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OnHTML&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"body &amp;gt; script:first-of-type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;colly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTMLElement&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="c"&gt;// find JSON string&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;jsonData&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;"{"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="x"&gt;

    &lt;/span&gt;&lt;span class="c"&gt;// parse JSON&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
       &lt;/span&gt;&lt;span class="n"&gt;EntryData&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
           &lt;/span&gt;&lt;span class="n"&gt;ProfilePage&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
               &lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
                   &lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="x"&gt;    &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"id"`&lt;/span&gt;&lt;span class="x"&gt;
                   &lt;/span&gt;&lt;span class="n"&gt;Media&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
                       &lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
                           &lt;/span&gt;&lt;span class="n"&gt;ImageURL&lt;/span&gt;&lt;span class="x"&gt;     &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"display_src"`&lt;/span&gt;&lt;span class="x"&gt;
                           &lt;/span&gt;&lt;span class="n"&gt;ThumbnailURL&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"thumbnail_src"`&lt;/span&gt;&lt;span class="x"&gt;
                           &lt;/span&gt;&lt;span class="n"&gt;IsVideo&lt;/span&gt;&lt;span class="x"&gt;      &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="x"&gt;   &lt;/span&gt;&lt;span class="s"&gt;`json:"is_video"`&lt;/span&gt;&lt;span class="x"&gt;
                           &lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="x"&gt;         &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="x"&gt;    &lt;/span&gt;&lt;span class="s"&gt;`json:"date"`&lt;/span&gt;&lt;span class="x"&gt;
                           &lt;/span&gt;&lt;span class="n"&gt;Dimensions&lt;/span&gt;&lt;span class="x"&gt;   &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
                               &lt;/span&gt;&lt;span class="n"&gt;Width&lt;/span&gt;&lt;span class="x"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"width"`&lt;/span&gt;&lt;span class="x"&gt;
                               &lt;/span&gt;&lt;span class="n"&gt;Height&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"height"`&lt;/span&gt;&lt;span class="x"&gt;
                           &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
                       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
                       &lt;/span&gt;&lt;span class="n"&gt;PageInfo&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;pageInfo&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"page_info"`&lt;/span&gt;&lt;span class="x"&gt;
                   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"media"`&lt;/span&gt;&lt;span class="x"&gt;
               &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"user"`&lt;/span&gt;&lt;span class="x"&gt;
           &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"ProfilePage"`&lt;/span&gt;&lt;span class="x"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:"entry_data"`&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}{}&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unmarshal&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonData&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;

    &lt;/span&gt;&lt;span class="c"&gt;// enumerate images&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EntryData&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ProfilePage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;actualUserId&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Media&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="c"&gt;// skip videos&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsVideo&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
            &lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ImageURL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="c"&gt;// ...&lt;/span&gt;&lt;span class="x"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;Create and visit next page URLs&lt;/h3&gt;

&lt;p&gt;The next page URL has a fixed format, so it can be declared as a format string that accepts the two changing parameters: &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;after&lt;/code&gt;.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;nextPageURLTemplate&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;`https://www.instagram.com/graphql/query/?query_id=17888483320059182&amp;amp;variables={"id":"%s","first":12,"after":"%s"}`&lt;/span&gt;&lt;span class="x"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;Parse next page JSONs&lt;/h3&gt;

&lt;p&gt;This is much the same as parsing the main page's JSON, except that these responses use slightly different attribute names (e.g. the image URL is &lt;code&gt;display_url&lt;/code&gt; instead of &lt;code&gt;display_src&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;Download and save images extracted from JSONs&lt;/h3&gt;

&lt;p&gt;After requesting images from Instagram with the &lt;code&gt;Visit&lt;/code&gt; function, the responses can be handled in &lt;code&gt;OnResponse&lt;/code&gt;. It takes a callback that is invoked once a response has arrived. To select only the responses that contain images, filter on the &lt;code&gt;Content-Type&lt;/code&gt; HTTP header; if it indicates an image, save the response.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OnResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;colly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Headers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="s"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputDir&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="x"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FileName&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="x"&gt;
        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
    &lt;/span&gt;&lt;span class="c"&gt;// handle further response types...&lt;/span&gt;&lt;span class="x"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="x"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;Epilogue&lt;/h2&gt;

&lt;p&gt;Scraping JS-only sites isn't always trivial, but it can often be done without headless browsers or client-side code execution, which keeps performance high. This example scraper downloads roughly 1,000 images per minute on a single thread over a regular home Internet connection.&lt;/p&gt;

&lt;p&gt;It can be tweaked further to handle videos and to extract meta information.&lt;/p&gt;


</description>
      <category>scraping</category>
      <category>tutorial</category>
      <category>go</category>
      <category>instagram</category>
    </item>
  </channel>
</rss>
