<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AlwaysPrimeDev</title>
    <description>The latest articles on Forem by AlwaysPrimeDev (@alwaysprimedev).</description>
    <link>https://forem.com/alwaysprimedev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F625880%2F19bc46ba-3aaf-4534-b39e-6e96164e8a8e.png</url>
      <title>Forem: AlwaysPrimeDev</title>
      <link>https://forem.com/alwaysprimedev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alwaysprimedev"/>
    <language>en</language>
    <item>
      <title>I Built a LinkedIn Profile Scraper on Apify for Public Profiles, Company Enrichment, and Lead Research</title>
      <dc:creator>AlwaysPrimeDev</dc:creator>
      <pubDate>Sun, 29 Mar 2026 11:54:32 +0000</pubDate>
      <link>https://forem.com/alwaysprimedev/i-built-a-linkedin-profile-scraper-on-apify-for-public-profiles-company-enrichment-and-lead-1la5</link>
      <guid>https://forem.com/alwaysprimedev/i-built-a-linkedin-profile-scraper-on-apify-for-public-profiles-company-enrichment-and-lead-1la5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgpwyol6cgsnqaphgf8o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgpwyol6cgsnqaphgf8o.png" alt=" " width="800" height="475"&gt;&lt;/a&gt;&lt;br&gt;
Public LinkedIn data is still one of the most useful inputs for lead generation, recruiting, founder sourcing, and&lt;br&gt;
market research.&lt;/p&gt;

&lt;p&gt;The problem is that many LinkedIn scraping workflows have too much friction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they depend on cookies&lt;/li&gt;
&lt;li&gt;they break the moment setup is slightly wrong&lt;/li&gt;
&lt;li&gt;they return shallow profile data&lt;/li&gt;
&lt;li&gt;they make you wait until the whole run finishes before you can use anything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something simpler.&lt;/p&gt;

&lt;p&gt;So I built a LinkedIn Profile Scraper on Apify that works with public profile URLs, does not require LinkedIn cookies,&lt;br&gt;
and returns structured profile data plus company enrichment and best-effort contact discovery from public company&lt;br&gt;
websites.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the actor does
&lt;/h2&gt;

&lt;p&gt;You pass in one or more public LinkedIn profile URLs like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"profileUrls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://www.linkedin.com/in/williamhgates"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"https://www.linkedin.com/in/satyanadella"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each public profile, the actor can return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;full name, headline, summary, location&lt;/li&gt;
&lt;li&gt;followers and connections&lt;/li&gt;
&lt;li&gt;current role and company&lt;/li&gt;
&lt;li&gt;work experience and education&lt;/li&gt;
&lt;li&gt;recent posts and articles&lt;/li&gt;
&lt;li&gt;company LinkedIn URL, website, industry, and size&lt;/li&gt;
&lt;li&gt;best-effort email candidates discovered from public company pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhad3jnype2b9r1zqc1vd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhad3jnype2b9r1zqc1vd.png" alt=" " width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That makes it useful not only for scraping profiles, but for building enriched lead or research datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built it this way
&lt;/h2&gt;

&lt;p&gt;The main design goal was low-friction enrichment.&lt;/p&gt;

&lt;p&gt;Instead of asking users to manage session cookies, I focused on publicly accessible profile pages. Then I extended the&lt;br&gt;
output beyond the profile itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;company details are enriched from public LinkedIn company pages&lt;/li&gt;
&lt;li&gt;email candidates are discovered from public company website pages like /contact, /about, and /team&lt;/li&gt;
&lt;li&gt;successful profiles are streamed into the Apify dataset as soon as they finish&lt;/li&gt;
&lt;li&gt;failed items are kept out of the main result dataset so the output stays clean&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters more than people think. If you are enriching hundreds of profiles, you usually do not want to&lt;br&gt;
wait for the entire batch before the first usable results appear.&lt;/p&gt;
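&lt;p&gt;The email discovery step can be sketched as a simple pattern scan over fetched public pages. This is an illustrative version only, assuming a regex-based extractor filtered to the company's own domain; the real actor may use different heuristics:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// emailPattern is deliberately simple; real-world email matching
// needs more care, but this is enough for candidate discovery.
var emailPattern = regexp.MustCompile(`[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}`)

// extractEmailCandidates pulls email-looking strings out of a public
// page's HTML and keeps only those on the company's own domain.
func extractEmailCandidates(html, companyDomain string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range emailPattern.FindAllString(html, -1) {
		m = strings.ToLower(m)
		if !strings.HasSuffix(m, "@"+companyDomain) || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return out
}

func main() {
	page := `<p>Reach us at Sales@example.com or support@example.com.
	         Our partner: hello@other.org</p>`
	fmt.Println(extractEmailCandidates(page, "example.com"))
	// prints [sales@example.com support@example.com]
}
```

&lt;p&gt;Filtering to the company domain is what keeps these "best-effort candidates" rather than noise scraped from unrelated links.&lt;/p&gt;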

&lt;h2&gt;
  
  
  Technical notes
&lt;/h2&gt;

&lt;p&gt;The actor is written in Go and uses concurrent workers, retry handling, request timeouts, and proxy support. On the&lt;br&gt;
parsing side, it combines HTML selectors with JSON-LD extraction to get more reliable structured data from public&lt;br&gt;
pages.&lt;/p&gt;

&lt;p&gt;On the Apify side, I wanted the actor to feel like a production tool, not just a script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minimal input&lt;/li&gt;
&lt;li&gt;progressive dataset output&lt;/li&gt;
&lt;li&gt;export to JSON, CSV, or Excel&lt;/li&gt;
&lt;li&gt;easy connection to webhooks, Make, Zapier, n8n, Airtable, Google Sheets, or a CRM&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Good use cases
&lt;/h2&gt;

&lt;p&gt;This actor is a good fit if you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recruiter snapshots of public profiles&lt;/li&gt;
&lt;li&gt;enriched prospect data for outbound&lt;/li&gt;
&lt;li&gt;founder and operator sourcing&lt;/li&gt;
&lt;li&gt;company and talent mapping&lt;/li&gt;
&lt;li&gt;quick public-profile research pipelines inside Apify&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Compliance note
&lt;/h2&gt;

&lt;p&gt;This actor is intended for publicly visible LinkedIn data only. It is not meant to bypass authentication walls or&lt;br&gt;
access private profile data. As always, make sure your usage complies with applicable rules, laws, and internal&lt;br&gt;
policies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;I did not want to build “just another scraper.” I wanted an Apify actor that turns a public LinkedIn profile URL into&lt;br&gt;
usable structured research data with as little setup as possible.&lt;/p&gt;

&lt;p&gt;If that matches your workflow, you can try the actor here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/alwaysprimedev/linkedin-profile-scraper" rel="noopener noreferrer"&gt;LinkedIn Profile Scraper on Apify&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If there is interest, I can share more implementation details in a follow-up post about the parsing and enrichment&lt;br&gt;
pipeline behind it.&lt;/p&gt;

</description>
      <category>apify</category>
      <category>webscraping</category>
      <category>go</category>
      <category>automation</category>
    </item>
    <item>
      <title>How I Built an Instagram Profile Scraper in Go and Shipped It to Apify</title>
      <dc:creator>AlwaysPrimeDev</dc:creator>
      <pubDate>Wed, 18 Mar 2026 21:19:05 +0000</pubDate>
      <link>https://forem.com/alwaysprimedev/how-i-built-an-instagram-profile-scraper-in-go-and-shipped-it-to-apify-35d1</link>
      <guid>https://forem.com/alwaysprimedev/how-i-built-an-instagram-profile-scraper-in-go-and-shipped-it-to-apify-35d1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq87qyesgpqu40xjj615.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq87qyesgpqu40xjj615.png" alt=" " width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I recently built a small Instagram profile scraper in Go, packaged it as an Apify Actor, and published it so other people can use it without maintaining the infrastructure themselves.&lt;/p&gt;

&lt;p&gt;The goal was simple: fetch public Instagram profile data by username and return clean, automation-friendly JSON. I did&lt;br&gt;
not want browser automation, heavy dependencies, or deeply nested output that becomes painful to use in datasets, exports, or pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;A lot of scraping projects work, but they are hard to operationalize.&lt;/p&gt;

&lt;p&gt;They rely on full browser stacks, break on minor changes, or return raw payloads that still need another transformation layer before they become useful. For profile lookups, I wanted something much lighter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input: one or more Instagram usernames&lt;/li&gt;
&lt;li&gt;output: structured profile data&lt;/li&gt;
&lt;li&gt;deployment: packaged for Apify&lt;/li&gt;
&lt;li&gt;operations: proxy-ready and resilient to partial failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The approach
&lt;/h2&gt;

&lt;p&gt;I built the Actor in pure Go with no external dependencies beyond the standard library.&lt;/p&gt;

&lt;p&gt;Instead of browser automation, the scraper makes a direct request to Instagram’s web profile endpoint and sends the headers that Instagram expects for that request. That keeps the runtime small and fast, which is a good fit for an Apify Actor.&lt;/p&gt;

&lt;p&gt;The Actor accepts either a legacy &lt;code&gt;username&lt;/code&gt; field or a &lt;code&gt;usernames&lt;/code&gt; array, normalizes the input, strips &lt;code&gt;@&lt;/code&gt;, and removes duplicates. That makes it easier to use both manually and from automations.&lt;/p&gt;
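&lt;p&gt;That normalization step can be sketched like this. Lowercasing is my assumption here, added because Instagram usernames are case-insensitive; the rest mirrors the behavior described above:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUsernames trims whitespace, strips a leading "@",
// lowercases (an assumption), and drops duplicates in order.
func normalizeUsernames(raw []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, u := range raw {
		u = strings.ToLower(strings.TrimPrefix(strings.TrimSpace(u), "@"))
		if u == "" || seen[u] {
			continue
		}
		seen[u] = true
		out = append(out, u)
	}
	return out
}

func main() {
	in := []string{"@NASA", "nasa", " natgeo ", ""}
	fmt.Println(normalizeUsernames(in)) // prints [nasa natgeo]
}
```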

&lt;h2&gt;
  
  
  What the scraper returns
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8x8jjvisxe13j5bzwmo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj8x8jjvisxe13j5bzwmo.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Actor extracts and normalizes the most useful profile fields, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;username and internal Instagram ID&lt;/li&gt;
&lt;li&gt;full name and biography&lt;/li&gt;
&lt;li&gt;follower, following, and post counts&lt;/li&gt;
&lt;li&gt;profile picture URLs&lt;/li&gt;
&lt;li&gt;private, verified, business, and professional flags&lt;/li&gt;
&lt;li&gt;related profiles&lt;/li&gt;
&lt;li&gt;latest posts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;latestPosts&lt;/code&gt; section is where I spent more time than expected. I did not want to return only a shortcode and a caption. I wanted each post to be immediately useful, so I included things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;caption text&lt;/li&gt;
&lt;li&gt;hashtags and mentions parsed from the caption&lt;/li&gt;
&lt;li&gt;likes and comments count&lt;/li&gt;
&lt;li&gt;dimensions&lt;/li&gt;
&lt;li&gt;image URLs&lt;/li&gt;
&lt;li&gt;tagged users&lt;/li&gt;
&lt;li&gt;child posts for carousel content&lt;/li&gt;
&lt;li&gt;normalized timestamps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That way, the Actor output is already useful for lead generation, competitor monitoring, influencer research, and internal dashboards.&lt;/p&gt;
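&lt;p&gt;The hashtag and mention parsing can be sketched with two small regular expressions. Instagram's actual rules for valid tag characters are stricter, so treat these patterns as approximations:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
)

// Simple caption patterns: hashtags allow Unicode letters and
// digits, mentions allow the ASCII username character set.
var (
	hashtagRe = regexp.MustCompile(`#([\p{L}\p{N}_]+)`)
	mentionRe = regexp.MustCompile(`@([a-zA-Z0-9._]+)`)
)

// parseCaption extracts hashtags and mentions from a post caption.
func parseCaption(caption string) (hashtags, mentions []string) {
	for _, m := range hashtagRe.FindAllStringSubmatch(caption, -1) {
		hashtags = append(hashtags, m[1])
	}
	for _, m := range mentionRe.FindAllStringSubmatch(caption, -1) {
		mentions = append(mentions, m[1])
	}
	return hashtags, mentions
}

func main() {
	h, m := parseCaption("Launch day with @acme_labs! #golang #scraping")
	fmt.Println(h, m) // prints [golang scraping] [acme_labs]
}
```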

&lt;h2&gt;
  
  
  Making it practical for Apify
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fps0ssmkmnmhqjyoi94zy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fps0ssmkmnmhqjyoi94zy.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building the scraper itself was only half the task. The other half was productizing it.&lt;/p&gt;

&lt;p&gt;I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an Apify input schema for usernames&lt;/li&gt;
&lt;li&gt;a dataset schema for cleaner output browsing&lt;/li&gt;
&lt;li&gt;a Docker build so the Actor can run consistently&lt;/li&gt;
&lt;li&gt;dataset push logic so each profile is saved directly to the Apify dataset&lt;/li&gt;
&lt;li&gt;proxy support for more reliable requests at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One implementation detail I care about is failure handling. If one username is invalid or unavailable, the whole run should not fail. The Actor skips missing profiles and continues processing the rest. It only fails the run on actual technical errors such as network or dataset write failures.&lt;/p&gt;

&lt;p&gt;That matters in production much more than it seems during local development.&lt;/p&gt;
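&lt;p&gt;The skip-versus-fail policy can be sketched with a sentinel error. The names here are hypothetical, but the shape matches the behavior described above: per-profile problems are skipped, technical errors abort the run:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// errProfileUnavailable marks a per-username problem (missing,
// private, or renamed profile) that should be skipped, not fatal.
var errProfileUnavailable = errors.New("profile unavailable")

// processAll skips unavailable profiles but aborts on technical
// errors such as network or dataset write failures.
func processAll(usernames []string, fetch func(string) (string, error)) ([]string, error) {
	var results []string
	for _, u := range usernames {
		out, err := fetch(u)
		if errors.Is(err, errProfileUnavailable) {
			fmt.Printf("skipping %s: %v\n", u, err)
			continue
		}
		if err != nil {
			return results, fmt.Errorf("technical failure on %s: %w", u, err)
		}
		results = append(results, out)
	}
	return results, nil
}

func main() {
	fetch := func(u string) (string, error) {
		if u == "ghost" {
			return "", errProfileUnavailable
		}
		return "data:" + u, nil
	}
	res, err := processAll([]string{"nasa", "ghost", "natgeo"}, fetch)
	fmt.Println(res, err) // prints [data:nasa data:natgeo] <nil>
}
```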

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;A few lessons stood out while building this:&lt;/p&gt;

&lt;p&gt;First, scraping is only part of the value. Data shape matters just as much. A flat, predictable output is more valuable than a huge raw JSON blob.&lt;/p&gt;

&lt;p&gt;Second, operational details matter early. Timeouts, proxy support, and partial-failure handling are not “later” concerns if you want to publish a usable product.&lt;/p&gt;

&lt;p&gt;Third, packaging changes how you think. Once I decided to publish the scraper on Apify, I had to think less like a developer running a script and more like someone maintaining a small API product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final result
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdvqhl2rilwbcimcne8y9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdvqhl2rilwbcimcne8y9.png" alt=" " width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result is a lightweight Instagram Profile Scraper Actor in Go that can fetch one or many public profiles and return structured output ready for datasets and automations.&lt;/p&gt;

&lt;p&gt;If you want to try it without building your own pipeline, you can check it out here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/alwaysprimedev/instagram-profile-scraper" rel="noopener noreferrer"&gt;Instagram Profile Scraper on Apify&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are building scraping tools yourself, my main advice is this: optimize for usable output, not just successful requests. That is usually what makes the difference between a side script and a product.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>go</category>
      <category>showdev</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
