<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: agenthustler</title>
    <description>The latest articles on Forem by agenthustler (@agenthustler).</description>
    <link>https://forem.com/agenthustler</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3810515%2F33856722-1a98-4563-ba8b-622b5fddcf7e.png</url>
      <title>Forem: agenthustler</title>
      <link>https://forem.com/agenthustler</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/agenthustler"/>
    <language>en</language>
    <item>
      <title>How to Build a Remote Job Alert System (No API Key Required)</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:00:09 +0000</pubDate>
      <link>https://forem.com/agenthustler/how-to-build-a-remote-job-alert-system-no-api-key-required-5f5e</link>
      <guid>https://forem.com/agenthustler/how-to-build-a-remote-job-alert-system-no-api-key-required-5f5e</guid>
      <description>&lt;h2&gt;
  
  
  The Problem with Job Board Notifications
&lt;/h2&gt;

&lt;p&gt;Most job boards have email alerts, but they're noisy and limited. You can't filter by salary range, tech stack, or specific keywords in the description. You can't combine alerts from multiple boards into one feed. And you definitely can't pipe the results into your own tools.&lt;/p&gt;

&lt;p&gt;Let's fix that. In this tutorial, we'll build a remote job alert system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pulls fresh listings from remote job boards every few hours&lt;/li&gt;
&lt;li&gt;Filters by your criteria (keywords, salary, location)&lt;/li&gt;
&lt;li&gt;Sends you a clean email digest&lt;/li&gt;
&lt;li&gt;Runs on autopilot with zero API keys to manage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data source&lt;/strong&gt;: &lt;a href="https://apify.com/cryptosignals/weworkremotely-scraper" rel="noopener noreferrer"&gt;WeWorkRemotely Scraper&lt;/a&gt; on Apify (handles the data collection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduling&lt;/strong&gt;: Apify's built-in scheduler (or cron if self-hosting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filtering + alerts&lt;/strong&gt;: A simple Python script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email&lt;/strong&gt;: SMTP (Gmail, SendGrid, or any provider)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Set Up Automated Data Collection
&lt;/h2&gt;

&lt;p&gt;Create a free Apify account and find the WeWorkRemotely Scraper in the store. Configure it with your search parameters and set it to run on a schedule (every 6 hours works well for job listings).&lt;/p&gt;

&lt;p&gt;Each run produces a dataset of JSON objects like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Senior Python Developer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"company"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Acme Corp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://weworkremotely.com/listings/acme-senior-python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Programming"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"salary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$120k - $160k"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"We're looking for a senior Python developer..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Filter and Alert with Python
&lt;/h2&gt;

&lt;p&gt;Here's a complete script that fetches the latest results, filters them, and sends an email:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;smtplib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;email.mime.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MIMEText&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="c1"&gt;# Config
&lt;/span&gt;&lt;span class="n"&gt;APIfY_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_apify_token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;DATASET_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_dataset_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# From the scheduled run
&lt;/span&gt;&lt;span class="n"&gt;EMAIL_FROM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alerts@yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;EMAIL_TO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;you@yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;SMTP_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;smtp.gmail.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;SMTP_PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;587&lt;/span&gt;
&lt;span class="n"&gt;SMTP_USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;SMTP_PASS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_app_password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Keywords to match (case-insensitive)
&lt;/span&gt;&lt;span class="n"&gt;KEYWORDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fastapi&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data engineer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;MIN_SALARY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;  &lt;span class="c1"&gt;# Optional: filter by minimum salary
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_jobs&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Pull latest job listings from Apify dataset.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://api.apify.com/v2/datasets/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DATASET_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/items&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;APIFY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;matches_criteria&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if a job matches our filter criteria.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;kw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;KEYWORDS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Format matching jobs into a readable email body.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; matching remote jobs:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;** at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Salary: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Not listed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Link: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send the digest via SMTP.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MIMEText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Subject&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subject&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;From&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EMAIL_FROM&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;To&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EMAIL_TO&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;smtplib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SMTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SMTP_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SMTP_PORT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;starttls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SMTP_USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SMTP_PASS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_jobs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;matching&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;matches_criteria&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;subject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; new remote jobs matching your criteria&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Sent digest with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; jobs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No matching jobs found&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Run It on a Schedule
&lt;/h2&gt;

&lt;p&gt;You have a few options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Apify webhook&lt;/strong&gt; — Set up a webhook on your scheduled actor run that hits your script endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron job&lt;/strong&gt; — Run the Python script every 6 hours on any server or even a Raspberry Pi&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions&lt;/strong&gt; — Free scheduled workflows that can run this script&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For GitHub Actions, create &lt;code&gt;.github/workflows/job-alerts.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Job Alerts&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*/6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;check&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.12'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install requests&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python job_alerts.py&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;APIFY_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.APIFY_TOKEN }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
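&lt;p&gt;One caveat: the workflow above passes &lt;code&gt;APIFY_TOKEN&lt;/code&gt; as an environment variable, but the script as written hardcodes its config as constants. A small change lets the script read from the environment instead, so your token and SMTP password live in GitHub secrets rather than in the repo. This is a minimal sketch; the variable names are illustrative and should match whatever you put in the workflow's &lt;code&gt;env:&lt;/code&gt; block.&lt;/p&gt;

```python
import os

# Read config from the environment instead of hardcoding it.
# In GitHub Actions, each value comes from a repository secret
# exposed in the workflow's `env:` block; locally, export them
# in your shell before running the script.
APIFY_TOKEN = os.environ.get('APIFY_TOKEN', '')
DATASET_ID = os.environ.get('DATASET_ID', '')
EMAIL_FROM = os.environ.get('EMAIL_FROM', 'alerts@yourdomain.com')
EMAIL_TO = os.environ.get('EMAIL_TO', 'you@yourdomain.com')
SMTP_PASS = os.environ.get('SMTP_PASS', '')

if not APIFY_TOKEN:
    print('APIFY_TOKEN is not set; the Apify API call will fail')
```

&lt;p&gt;Remember to add the extra values (&lt;code&gt;DATASET_ID&lt;/code&gt;, &lt;code&gt;SMTP_PASS&lt;/code&gt;, and so on) to the workflow's &lt;code&gt;env:&lt;/code&gt; block the same way &lt;code&gt;APIFY_TOKEN&lt;/code&gt; is wired up.&lt;/p&gt;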



&lt;h2&gt;
  
  
  Extending It
&lt;/h2&gt;

&lt;p&gt;Once the basic system works, you can add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple sources&lt;/strong&gt; — Add RemoteOK, Indeed, or other boards to the same pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplication&lt;/strong&gt; — Track seen job URLs in a simple JSON file or SQLite database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack/Discord alerts&lt;/strong&gt; — Replace the email function with a webhook POST&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Salary parsing&lt;/strong&gt; — Extract numeric ranges and filter more precisely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt; — Push results to a Google Sheet for tracking over time&lt;/li&gt;
&lt;/ul&gt;
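&lt;p&gt;The deduplication idea is the one I'd add first, since a 6-hour schedule will re-fetch mostly the same listings. Here's a sketch using a JSON file of seen URLs; the filename and helper names are my own, not part of the Apify tooling:&lt;/p&gt;

```python
import json
from pathlib import Path

SEEN_FILE = Path('seen_jobs.json')  # illustrative filename

def load_seen():
    """Return the set of job URLs we've already alerted on."""
    if SEEN_FILE.exists():
        return set(json.loads(SEEN_FILE.read_text()))
    return set()

def filter_new(jobs, seen):
    """Keep only jobs whose URL hasn't appeared in a previous digest."""
    return [j for j in jobs if j['url'] not in seen]

def save_seen(seen, jobs):
    """Persist the updated URL set after the digest is sent."""
    seen.update(j['url'] for j in jobs)
    SEEN_FILE.write_text(json.dumps(sorted(seen)))
```

&lt;p&gt;In &lt;code&gt;main()&lt;/code&gt;, you'd call &lt;code&gt;filter_new&lt;/code&gt; on the matching jobs before formatting the digest, and &lt;code&gt;save_seen&lt;/code&gt; after the email goes out. If you run this in GitHub Actions, commit the JSON file back to the repo (or use a cache) so state survives between runs.&lt;/p&gt;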

&lt;h2&gt;
  
  
  Why This Beats Built-In Alerts
&lt;/h2&gt;

&lt;p&gt;Job board email alerts give you everything that matches a single keyword. This system lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combine multiple boards into one feed&lt;/li&gt;
&lt;li&gt;Apply complex filters (salary + keywords + category)&lt;/li&gt;
&lt;li&gt;Control the format and delivery channel&lt;/li&gt;
&lt;li&gt;Keep a historical record of listings&lt;/li&gt;
&lt;li&gt;Build on top of it (analytics, auto-apply, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole setup takes about 20 minutes, runs for free (within Apify's free tier and GitHub Actions limits), and you'll never miss a relevant remote job posting again.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your current job search automation setup? I'd love to hear what tools people are using — drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>webdev</category>
    </item>
    <item>
      <title>GitHub Hireable Flag vs LinkedIn Open To Work: A Data Comparison</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Sun, 19 Apr 2026 08:00:08 +0000</pubDate>
      <link>https://forem.com/agenthustler/github-hireable-flag-vs-linkedin-open-to-work-a-data-comparison-25m3</link>
      <guid>https://forem.com/agenthustler/github-hireable-flag-vs-linkedin-open-to-work-a-data-comparison-25m3</guid>
      <description>&lt;h2&gt;
  
  
  Two Signals, One Question: Who's Open to New Opportunities?
&lt;/h2&gt;

&lt;p&gt;If you're trying to find developers open to new roles, there are two well-known digital signals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GitHub's &lt;code&gt;hireable&lt;/code&gt; flag&lt;/strong&gt; — A boolean field in every GitHub user profile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn's Open To Work banner&lt;/strong&gt; — The green photo frame and recruiter-visible status&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both indicate availability. But they attract very different people, and the data quality differs dramatically. Let's compare them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GitHub Hireable Flag
&lt;/h2&gt;

&lt;p&gt;Any GitHub user can set &lt;code&gt;hireable: true&lt;/code&gt; in their profile settings. It's a simple boolean — no recruiter-facing banner, no employer notifications. Most developers don't even know it exists.&lt;/p&gt;

&lt;p&gt;That obscurity is actually an advantage for recruiters. The developers who do set it tend to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Technically active&lt;/strong&gt; — They're already on GitHub, which means they ship code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-selecting&lt;/strong&gt; — They deliberately sought out this setting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less bombarded&lt;/strong&gt; — Far fewer recruiters mine GitHub compared to LinkedIn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can access this data via the GitHub API (&lt;code&gt;GET /users/{username}&lt;/code&gt; returns a &lt;code&gt;hireable&lt;/code&gt; field), or use tools like the &lt;a href="https://apify.com/cryptosignals/developer-candidates-scraper" rel="noopener noreferrer"&gt;Developer Candidates Scraper&lt;/a&gt; to search at scale with filters for language, location, and activity level.&lt;/p&gt;
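&lt;p&gt;A minimal check against the public REST API looks like this. The endpoint and the &lt;code&gt;hireable&lt;/code&gt; field are documented by GitHub; the helper names are mine. Note that &lt;code&gt;hireable&lt;/code&gt; comes back as &lt;code&gt;null&lt;/code&gt; when the user has never touched the setting:&lt;/p&gt;

```python
import json
import urllib.request

def parse_hireable(profile_json):
    """Return True only when the profile's `hireable` field is explicitly true.

    GitHub returns JSON `null` for users who never set the flag, so
    anything other than `true` counts as not hireable.
    """
    return json.loads(profile_json).get('hireable') is True

def fetch_profile(username):
    """Fetch a public GitHub profile (unauthenticated: 60 requests/hour)."""
    url = f'https://api.github.com/users/{username}'
    req = urllib.request.Request(url, headers={'Accept': 'application/vnd.github+json'})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Example: parse_hireable(fetch_profile('octocat'))
```

&lt;p&gt;For anything beyond spot checks, authenticate with a personal access token to lift the rate limit, or lean on the scraper above to batch the searches.&lt;/p&gt;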

&lt;h2&gt;
  
  
  LinkedIn Open To Work
&lt;/h2&gt;

&lt;p&gt;LinkedIn's signal is far more visible. Users can choose to show it publicly (green banner) or only to recruiters. LinkedIn reports that profiles with the Open To Work frame get &lt;strong&gt;40% more InMails&lt;/strong&gt; from recruiters.&lt;/p&gt;

&lt;p&gt;That visibility is a double-edged sword:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Higher volume&lt;/strong&gt; — More candidates, but also more passive job seekers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better structured data&lt;/strong&gt; — LinkedIn profiles have standardized fields for experience, skills, and education&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More noise&lt;/strong&gt; — The signal is so well-known that some people enable it casually&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gatekept&lt;/strong&gt; — Accessing this data at scale requires LinkedIn Recruiter ($$$) or scraping (against ToS)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Head-to-Head Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;GitHub Hireable&lt;/th&gt;
&lt;th&gt;LinkedIn OTW&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Signal strength&lt;/td&gt;
&lt;td&gt;High (obscure = intentional)&lt;/td&gt;
&lt;td&gt;Medium (common = casual)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data richness&lt;/td&gt;
&lt;td&gt;Code, repos, contributions&lt;/td&gt;
&lt;td&gt;Experience, skills, endorsements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response rates&lt;/td&gt;
&lt;td&gt;Higher (less contacted)&lt;/td&gt;
&lt;td&gt;Lower (inbox fatigue)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost to access&lt;/td&gt;
&lt;td&gt;Free (API) or low-cost tools&lt;/td&gt;
&lt;td&gt;LinkedIn Recruiter ($8k+/yr)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Technical roles, startups&lt;/td&gt;
&lt;td&gt;All roles, enterprise hiring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coverage&lt;/td&gt;
&lt;td&gt;~5M developers (est.)&lt;/td&gt;
&lt;td&gt;~200M professionals&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When to Use Each
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use GitHub hireable when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're hiring for technical roles where code quality matters&lt;/li&gt;
&lt;li&gt;You want developers who are actively building, not just listing skills&lt;/li&gt;
&lt;li&gt;You're a startup that can't afford LinkedIn Recruiter&lt;/li&gt;
&lt;li&gt;You want to evaluate candidates by their actual work before reaching out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use LinkedIn OTW when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're hiring for non-technical or mixed roles&lt;/li&gt;
&lt;li&gt;You need structured data (years of experience, education, certifications)&lt;/li&gt;
&lt;li&gt;Volume matters more than precision&lt;/li&gt;
&lt;li&gt;You already have LinkedIn Recruiter licenses&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Practical Approach: Combine Both
&lt;/h2&gt;

&lt;p&gt;The strongest signal comes from combining both sources. A developer who has &lt;code&gt;hireable: true&lt;/code&gt; on GitHub AND Open To Work on LinkedIn is actively looking. Someone with just the GitHub flag might be passively open. Someone with just the LinkedIn banner might not be technical enough for your role.&lt;/p&gt;

&lt;p&gt;Here's a simple way to cross-reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_strong_candidates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;github_candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linkedin_candidates&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find candidates signaling on both platforms.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Match by email or name (GitHub profiles often include both)
&lt;/span&gt;    &lt;span class="n"&gt;github_emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;github_candidates&lt;/span&gt; 
                     &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="n"&gt;strong_matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;linkedin_candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;github_emails&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;gc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;github_emails&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
            &lt;span class="n"&gt;strong_matches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;profile_url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;linkedin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;profile_url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;top_languages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;languages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;signal_strength&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;strong&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Both platforms
&lt;/span&gt;            &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;strong_matches&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Cost Difference
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting for smaller companies. LinkedIn Recruiter Lite starts around $170/month. The full Recruiter product runs $8,000+/year.&lt;/p&gt;

&lt;p&gt;GitHub's API is free for basic profile data. Tools like the &lt;a href="https://apify.com/cryptosignals/developer-candidates-scraper" rel="noopener noreferrer"&gt;Developer Candidates Scraper&lt;/a&gt; let you search by language, location, and hireable status at a fraction of the cost.&lt;/p&gt;
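&lt;p&gt;To illustrate, here's a minimal sketch of reading that flag through GitHub's public REST API (the helper names and User-Agent string are my own; unauthenticated requests are limited to roughly 60 per hour, so pass a token for any real volume):&lt;/p&gt;

```python
# Check the public `hireable` flag on a GitHub profile.
# GitHub returns "hireable": null when the user never set the flag.
import json
from urllib.request import Request, urlopen

def hireable_flag(profile: dict) -> bool:
    """True only when the user explicitly enabled 'Available for hire'."""
    return profile.get("hireable") is True

def fetch_profile(username: str) -> dict:
    req = Request(
        f"https://api.github.com/users/{username}",
        headers={"Accept": "application/vnd.github+json",
                 "User-Agent": "hireable-check/0.1"},
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Usage: hireable_flag(fetch_profile("someuser"))
```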

&lt;p&gt;For a 10-person startup hiring 2-3 engineers per year, the GitHub-first approach can save thousands while surfacing candidates that LinkedIn misses entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaway
&lt;/h2&gt;

&lt;p&gt;Neither signal is perfect alone. GitHub gives you technical depth and high-intent signals at low cost. LinkedIn gives you breadth and structured professional data at premium prices. The best recruiters use both — and the ones who mine GitHub effectively have a real edge, because so few do it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you tried recruiting via GitHub profiles? What was your experience with response rates? Let me know in the comments.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Skip the Build
&lt;/h2&gt;

&lt;p&gt;You don't have to reinvent this. We maintain a production-grade scraper as an Apify actor — proxies, anti-bot, retries, and schema all handled. You can run it on a pay-per-result basis and get clean JSON without writing a single line of scraping code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/cryptosignals/linkedin-profile-scraper?fpr=yw6md3" rel="noopener noreferrer"&gt;LinkedIn Profile Scraper on Apify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>productivity</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How to Build a Job Market Intelligence Dashboard with Public Data</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Sat, 18 Apr 2026 08:00:08 +0000</pubDate>
      <link>https://forem.com/agenthustler/how-to-build-a-job-market-intelligence-dashboard-with-public-data-4946</link>
      <guid>https://forem.com/agenthustler/how-to-build-a-job-market-intelligence-dashboard-with-public-data-4946</guid>
      <description>&lt;h2&gt;
  
  
  Why Job Market Data Matters for Builders
&lt;/h2&gt;

&lt;p&gt;If you're building an HR tech product, running a staffing agency, or just trying to understand hiring trends in your industry, you need structured job market data. The problem? Most job boards don't offer public APIs, and the ones that do are expensive or heavily rate-limited.&lt;/p&gt;

&lt;p&gt;The good news: job postings are public data. With the right tools, you can pull structured listings from multiple boards and build a real-time intelligence dashboard — no API keys required.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here's what we're building:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data collection&lt;/strong&gt; — Pull jobs from Indeed, WeWorkRemotely, and other boards on a schedule&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalization&lt;/strong&gt; — Map different schemas into a unified format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; — Push to a database or spreadsheet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualization&lt;/strong&gt; — Simple dashboard showing trends&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 1: Collecting Data from Multiple Sources
&lt;/h2&gt;

&lt;p&gt;The easiest approach is to use pre-built Apify actors that output structured JSON. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/cryptosignals/indeed-jobs-scraper" rel="noopener noreferrer"&gt;Indeed Jobs Scraper&lt;/a&gt; — returns title, company, salary, location, and description&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/cryptosignals/weworkremotely-scraper" rel="noopener noreferrer"&gt;WeWorkRemotely Scraper&lt;/a&gt; — returns remote-specific listings with category tags&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can trigger these via the Apify API or on a cron schedule. Here's a quick Python example to pull data from any Apify actor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
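&lt;p&gt;If you'd rather wire the call up yourself, here's a minimal sketch using Apify's HTTP API; the actor ID, token, and input fields below are placeholders, and each actor's README documents its real input schema:&lt;/p&gt;

```python
# Run an Apify actor synchronously and fetch its dataset items in one call.
# The actor ID, token, and input body here are placeholders, not real values.
import json
from urllib.request import Request, urlopen

APIFY_BASE = "https://api.apify.com/v2/acts"

def actor_endpoint(actor_id: str, token: str) -> str:
    """Endpoint that runs an actor and returns its results directly."""
    return f"{APIFY_BASE}/{actor_id}/run-sync-get-dataset-items?token={token}"

def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    req = Request(actor_endpoint(actor_id, token),
                  data=json.dumps(run_input).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=300) as resp:  # scraper runs can take minutes
        return json.load(resp)

# Usage: run_actor("user~indeed-jobs-scraper", "YOUR_TOKEN", {"query": "python"})
```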



&lt;h2&gt;
  
  
  Step 2: Normalize the Data
&lt;/h2&gt;

&lt;p&gt;Each source returns slightly different fields. Create a simple mapping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Map different job board schemas to a common format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;indeed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Not specified&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;indeed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;posted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postedAt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weworkremotely&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Remote&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weworkremotely&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;posted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Spotting Trends
&lt;/h2&gt;

&lt;p&gt;Once you have a week or two of data, the insights get interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Salary trends by role&lt;/strong&gt; — Are Python developer salaries rising or falling in your target market?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demand signals&lt;/strong&gt; — Which job titles are appearing more frequently?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote ratio&lt;/strong&gt; — What percentage of new listings are remote vs. on-site?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Company hiring velocity&lt;/strong&gt; — Which companies are posting the most?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can use pandas for quick analysis, or push the data to a Google Sheet and use its built-in charting.&lt;/p&gt;
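&lt;p&gt;As a starting point, two of these metrics need nothing beyond the standard library (field names follow the normalized schema from Step 2):&lt;/p&gt;

```python
# Quick market metrics over normalized job rows: top titles and remote ratio.
from collections import Counter

def market_snapshot(jobs: list) -> dict:
    titles = Counter(job["title"] for job in jobs)
    remote = sum(1 for job in jobs if job["location"].lower() == "remote")
    return {
        "top_titles": titles.most_common(3),
        "remote_ratio": remote / len(jobs) if jobs else 0.0,
    }

sample = [
    {"title": "Python Developer", "location": "Remote"},
    {"title": "Python Developer", "location": "Berlin"},
    {"title": "Data Engineer", "location": "Remote"},
]
print(market_snapshot(sample)["top_titles"][0])  # ('Python Developer', 2)
```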

&lt;h2&gt;
  
  
  Step 4: Making It Actionable
&lt;/h2&gt;

&lt;p&gt;The real value comes when you schedule daily collection runs and track changes over time. Some ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For recruiters&lt;/strong&gt;: Get alerts when a target company posts a new role&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For job seekers&lt;/strong&gt;: Track salary ranges for your target title across multiple boards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For HR tech builders&lt;/strong&gt;: Feed this data into your product as a competitive intelligence layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For investors&lt;/strong&gt;: Monitor hiring velocity as a signal for company growth&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The fastest path to a working dashboard:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up for a free Apify account&lt;/li&gt;
&lt;li&gt;Run the &lt;a href="https://apify.com/cryptosignals/indeed-jobs-scraper" rel="noopener noreferrer"&gt;Indeed Jobs Scraper&lt;/a&gt; and &lt;a href="https://apify.com/cryptosignals/weworkremotely-scraper" rel="noopener noreferrer"&gt;WWR Scraper&lt;/a&gt; with your target keywords&lt;/li&gt;
&lt;li&gt;Export the JSON results and load them into pandas or a spreadsheet&lt;/li&gt;
&lt;li&gt;Schedule daily runs to build a time series&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole setup takes about 30 minutes, and you'll have a job market intelligence feed that most HR analytics companies charge hundreds per month for.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What job market data are you tracking? Drop a comment if you've built something similar — I'd love to compare approaches.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Reddit Data API 2026: After the Pricing Change, Here's What Developers Actually Use</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:42:25 +0000</pubDate>
      <link>https://forem.com/agenthustler/reddit-data-api-2026-after-the-pricing-change-heres-what-developers-actually-use-4o2h</link>
      <guid>https://forem.com/agenthustler/reddit-data-api-2026-after-the-pricing-change-heres-what-developers-actually-use-4o2h</guid>
      <description>&lt;h1&gt;
  
  
  Reddit Data API 2026: After the Pricing Change, Here's What Developers Actually Use
&lt;/h1&gt;

&lt;p&gt;When Reddit changed its API pricing in mid-2023, a lot of indie tools died overnight. Apollo, RIF, and dozens of smaller analytics projects shut down because the new commercial pricing made any meaningful Reddit data pipeline economically impossible for small operators.&lt;/p&gt;

&lt;p&gt;Nearly three years later, the dust has settled. Here's what developers building on Reddit data in 2026 actually use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Official API, Briefly
&lt;/h2&gt;

&lt;p&gt;Reddit still offers a free tier for personal use (100 queries/minute, OAuth-gated), but it's narrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Non-commercial only.&lt;/li&gt;
&lt;li&gt;Rate limited hard.&lt;/li&gt;
&lt;li&gt;Any serious volume pushes you into enterprise pricing (reportedly tens of thousands per year).&lt;/li&gt;
&lt;li&gt;Terms of service explicitly forbid AI/ML training.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a weekend project, fine. For anything you'd put on a dashboard or ship to a client, the official API is not the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The JSON Endpoint Approach
&lt;/h2&gt;

&lt;p&gt;Here's the thing many devs forget: appending a trailing &lt;code&gt;.json&lt;/code&gt; to almost any public Reddit URL returns structured data, no auth required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;https://www.reddit.com/r/programming/.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;https://www.reddit.com/r/programming/comments/abc&lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="err"&gt;/.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;https://www.reddit.com/user/someone/.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get posts, comments, scores, timestamps, flair — everything a logged-out browser sees. Rate limits exist (around 60 requests per minute per IP) and you need a real User-Agent, but it works and it's been working for years.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Production Looks Like
&lt;/h2&gt;

&lt;p&gt;For anything beyond a toy project, teams typically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Rotate IPs.&lt;/strong&gt; The per-IP limit is the real ceiling. Residential proxies solve it.&lt;br&gt;
&lt;strong&gt;2. Back off on 429s.&lt;/strong&gt; Reddit is consistent about returning them; honor the signal.&lt;br&gt;
&lt;strong&gt;3. Parallelize across subreddits, not within one.&lt;/strong&gt; Helps stay under per-subreddit throttling.&lt;br&gt;
&lt;strong&gt;4. Cache aggressively.&lt;/strong&gt; Most Reddit data doesn't change after the first 24 hours.&lt;/p&gt;
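&lt;p&gt;The baseline fetch behind points 1 and 2 can be sketched like this (the User-Agent string and retry count are illustrative, not Reddit-mandated values):&lt;/p&gt;

```python
# Fetch a subreddit's public .json listing politely: identify yourself and
# back off exponentially when Reddit answers 429 (Too Many Requests).
import json
import time
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def listing_url(subreddit: str) -> str:
    return f"https://www.reddit.com/r/{subreddit}/.json"

def fetch_listing(subreddit: str, retries: int = 3) -> dict:
    headers = {"User-Agent": "research-bot/0.1 (contact: you@example.com)"}
    for attempt in range(retries):
        try:
            with urlopen(Request(listing_url(subreddit), headers=headers),
                         timeout=10) as resp:
                return json.load(resp)
        except HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(2 ** attempt)  # honor the rate-limit signal
    raise RuntimeError("still rate limited after retries")
```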

&lt;h2&gt;
  
  
  Or: Rent the Infrastructure
&lt;/h2&gt;

&lt;p&gt;If you don't want to run proxy pools and handle layout changes, marketplace scrapers handle the boring parts. I maintain a &lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;Reddit Scraper&lt;/a&gt; that returns posts, comments, and user metadata on a pay-per-result basis — no subscription, no monthly floor.&lt;/p&gt;

&lt;p&gt;Works for: trend monitoring, sentiment analysis, niche community research, brand mentions, competitive intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Reddit's pricing change didn't kill Reddit data — it killed &lt;em&gt;free commercial access&lt;/em&gt; to Reddit data. The JSON endpoints are still there. The public pages are still there. What changed is that production-grade access now involves either running your own infra or paying someone who already has.&lt;/p&gt;

&lt;p&gt;For most developers in 2026, the math works out: pay-per-result scraping is cheaper than both the enterprise tier and the engineering hours to DIY. Check out &lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;apify.com/cryptosignals&lt;/a&gt; if you want to skip straight to results.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skip the Build
&lt;/h2&gt;

&lt;p&gt;You don't have to reinvent this. We maintain a production-grade scraper as an Apify actor — proxies, anti-bot, retries, and schema all handled. You can run it on a pay-per-result basis and get clean JSON without writing a single line of scraping code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/cryptosignals/reddit-scraper?fpr=yw6md3" rel="noopener noreferrer"&gt;Reddit Scraper on Apify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>api</category>
      <category>python</category>
      <category>startup</category>
    </item>
    <item>
      <title>Instagram Data API in 2026: What Developers Use When the Official API Won't Work</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:42:24 +0000</pubDate>
      <link>https://forem.com/agenthustler/instagram-data-api-in-2026-what-developers-use-when-the-official-api-wont-work-4e0</link>
      <guid>https://forem.com/agenthustler/instagram-data-api-in-2026-what-developers-use-when-the-official-api-wont-work-4e0</guid>
      <description>&lt;h1&gt;
  
  
  Instagram Data API in 2026: What Developers Use When the Official API Won't Work
&lt;/h1&gt;

&lt;p&gt;If you've ever tried to pull Instagram data for a side project, a dashboard, or a client, you already know the punchline: the official Instagram Graph API is designed for people managing their own business accounts, not for anyone doing brand monitoring, influencer research, or competitive analysis.&lt;/p&gt;

&lt;p&gt;Here's what the Graph API actually lets you do in 2026, and what devs reach for when it falls short.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Graph API Gives You
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Content on accounts &lt;strong&gt;you own or manage&lt;/strong&gt; (via Facebook Business Manager).&lt;/li&gt;
&lt;li&gt;Hashtag search — but capped at 30 unique hashtags per rolling 7-day window per user.&lt;/li&gt;
&lt;li&gt;Basic insights on your own posts: impressions, reach, engagement.&lt;/li&gt;
&lt;li&gt;Mentions of your business account in comments/captions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What It Does NOT Give You
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Reading arbitrary public profiles (your competitor, an influencer, a brand).&lt;/li&gt;
&lt;li&gt;Follower lists on accounts you don't own.&lt;/li&gt;
&lt;li&gt;Post-level data on content you didn't publish.&lt;/li&gt;
&lt;li&gt;Historical analysis across a niche.&lt;/li&gt;
&lt;li&gt;Anything involving Stories or Reels on third-party accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 90% of real-world use cases — influencer vetting, brand sentiment tracking, market research — the Graph API is a dead end before you even submit an app review.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Developers Actually Use
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Public-data scrapers.&lt;/strong&gt; Profile pages, post counts, follower counts, bios, and public post metadata are all visible to any logged-out browser. Scrapers hit that same surface and return structured JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. HTML parsing with rotation.&lt;/strong&gt; The hard part isn't parsing — it's not getting blocked. Production scrapers run behind residential proxies, rotate user-agents, and back off on 429s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Platform-managed actors.&lt;/strong&gt; Instead of building and maintaining all that, most teams I know now rent scrapers on marketplaces like &lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;Apify&lt;/a&gt;, where the anti-bot infra is someone else's problem and you pay per profile returned.&lt;/p&gt;

&lt;p&gt;I maintain an &lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;Instagram Profile Scraper&lt;/a&gt; there — feed it usernames, get back structured profile data (name, bio, follower count, post count, verified status, business category). Pay-per-result means you only pay for profiles that actually resolve, not for failed runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Trade-Off
&lt;/h2&gt;

&lt;p&gt;Scrapers break. Instagram ships layout changes, adds consent walls, rotates anti-bot checks. If you build your own, expect to spend 20% of your time on maintenance. If you rent, you're trading that for per-result cost — usually worth it unless you're pulling millions of rows.&lt;/p&gt;

&lt;p&gt;The Graph API is still the right call if you only need your own account's data. For anything outside that — research, monitoring, discovery — it's scrapers or nothing.&lt;/p&gt;

&lt;p&gt;Check out what's available at &lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;apify.com/cryptosignals&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skip the Build
&lt;/h2&gt;

&lt;p&gt;You don't have to reinvent this. We maintain a production-grade scraper as an Apify actor — proxies, anti-bot, retries, and schema all handled. You can run it on a pay-per-result basis and get clean JSON without writing a single line of scraping code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/cryptosignals/instagram-scraper?fpr=yw6md3" rel="noopener noreferrer"&gt;Instagram Scraper on Apify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>api</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Pay-Per-Result APIs: Why the Industry Is Shifting Away from Monthly Subscriptions in 2026</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Wed, 15 Apr 2026 04:43:35 +0000</pubDate>
      <link>https://forem.com/agenthustler/pay-per-result-apis-why-the-industry-is-shifting-away-from-monthly-subscriptions-in-2026-1hp</link>
      <guid>https://forem.com/agenthustler/pay-per-result-apis-why-the-industry-is-shifting-away-from-monthly-subscriptions-in-2026-1hp</guid>
      <description>&lt;h1&gt;
  
  
  Pay-Per-Result API Pricing Is Changing How Developers Buy Data
&lt;/h1&gt;

&lt;p&gt;If you've ever signed up for a data API, you know the drill: pick a monthly tier, guess your usage, then either overpay for headroom you never touch or blow past quota on day 19. The pricing model is older than the web, and it's still everywhere.&lt;/p&gt;

&lt;p&gt;A quieter shift has been happening on platforms where APIs and scrapers are sold as products: &lt;strong&gt;pay-per-result&lt;/strong&gt; (PPR) billing. You don't pay for compute time, subscription tiers, or request counts. You pay per record you actually receive — a job listing, a profile, a review, a product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Devs Prefer PPR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Cost maps to value.&lt;/strong&gt; If a scraper returns 45,000 LinkedIn job postings, you pay for 45,000 rows. If it returns 42, you pay for 42. No idle months, no overage surprises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Failure is free.&lt;/strong&gt; Scraper broke because the target site shipped a redesign? You get zero results and pay zero. Try doing that with a $99/month subscription.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Budgeting is trivial.&lt;/strong&gt; &lt;code&gt;rows × unit_price&lt;/code&gt; is a spreadsheet cell, not a capacity-planning meeting. Finance teams love it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. No lock-in.&lt;/strong&gt; Tier-based APIs punish you for leaving mid-cycle. PPR has no cycle.&lt;/p&gt;
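&lt;p&gt;The budgeting point really is one multiplication. A quick sanity-check sketch (the unit prices here are made up for illustration, not any actor's real pricing):&lt;/p&gt;

```python
def estimate_cost(rows, unit_price):
    """Pay-per-result cost: you pay only for records actually returned."""
    return round(rows * unit_price, 2)

# 45,000 job postings at an illustrative $0.002/result:
print(estimate_cost(45_000, 0.002))  # 90.0
# A run that only found 42 rows costs almost nothing:
print(estimate_cost(42, 0.002))      # 0.08
```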

&lt;h2&gt;
  
  
  Where You're Seeing This
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; has been running a PPR model across its Actor marketplace, where individual scrapers set per-result prices (often $0.01–$0.10). You run a scraper, you get rows, you pay for rows. Billing is handled by the platform, so you're not chasing invoices.&lt;/p&gt;

&lt;p&gt;I've been shipping actors there for a few months — LinkedIn Jobs, Indeed, IndieHackers, Instagram profile, Reddit — and what's become obvious is that PPR lets tiny niche scrapers exist at all. A dataset nobody needs enough to justify a $49/mo tier can still make sense at $0.02/result for the three people a week who want exactly that data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Downsides, to Be Honest
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Price discovery is harder.&lt;/strong&gt; You don't know what a run will cost until you estimate row counts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spiky workloads can get expensive.&lt;/strong&gt; If you need 2M rows once, a flat-tier API might be cheaper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-result pricing rewards stable, well-defined schemas.&lt;/strong&gt; Scrapers returning blobby HTML don't fit the model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're buying data in 2026, default to asking "what's the per-result price?" before "what's the monthly plan?" The per-unit number tells you more about the actual economics than any marketing page.&lt;/p&gt;

&lt;p&gt;Check out the PPR actors I maintain at &lt;a href="https://apify.com/cryptosignals" rel="noopener noreferrer"&gt;apify.com/cryptosignals&lt;/a&gt; — job market data, social profiles, review datasets. Pay for what you pull.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Powered by &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; — the web scraping platform used in this guide. &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Try it free →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>python</category>
      <category>startup</category>
    </item>
    <item>
      <title>Tracking Amazon Price Drops at Scale — Build a Price Monitor with Python</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Tue, 14 Apr 2026 10:13:40 +0000</pubDate>
      <link>https://forem.com/agenthustler/tracking-amazon-price-drops-at-scale-build-a-price-monitor-with-python-2doe</link>
      <guid>https://forem.com/agenthustler/tracking-amazon-price-drops-at-scale-build-a-price-monitor-with-python-2doe</guid>
      <description>&lt;p&gt;A friend runs a small dropshipping store. Nothing fancy — 200 SKUs, mostly home goods, sourced from Amazon and marked up on their own Shopify. His biggest headache for years was margin drift: Amazon would quietly drop a price, he'd keep selling at the old price, and by the end of the month his "20% margin" products had actually been 3% losers for weeks.&lt;/p&gt;

&lt;p&gt;He asked me for a cheap price monitor. I built it in an afternoon. Here's the whole thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;For each SKU, pull the current Amazon price once every few hours. If it moved more than some threshold, fire an alert. Store the history so we can look at trends and answer questions like "which products are trending down this week?"&lt;/p&gt;

&lt;p&gt;Stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/cryptosignals/amazon-scraper" rel="noopener noreferrer"&gt;Apify's Amazon Scraper&lt;/a&gt; for the actual price pulls ($0.005/result — basically free at this scale).&lt;/li&gt;
&lt;li&gt;SQLite for history.&lt;/li&gt;
&lt;li&gt;A tiny Python script on a $5 VPS for scheduling.&lt;/li&gt;
&lt;li&gt;Slack webhook for alerts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No Redis, no queue, no Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — Watchlist
&lt;/h2&gt;

&lt;p&gt;Start with a CSV. Column 1 is the ASIN, column 2 is the price you're currently selling at.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_watchlist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watchlist.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep it boring. You'll thank yourself later when you're debugging at 2am.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 — Fetch current prices
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
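&lt;p&gt;If you go the hosted-actor route, the call itself is small. A minimal sketch against Apify's synchronous &lt;code&gt;run-sync-get-dataset-items&lt;/code&gt; endpoint (the &lt;code&gt;asins&lt;/code&gt; input field name is an assumption; check the actor's input schema on its Apify page first):&lt;/p&gt;

```python
import os
import requests

APIFY_TOKEN = os.environ.get("APIFY_TOKEN", "")
# API paths address actors as "username~actor-name".
ACTOR = "cryptosignals~amazon-scraper"

def actor_endpoint(actor):
    """Synchronous run-and-return endpoint for a given actor."""
    return f"https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items"

def fetch_prices(asins):
    """Run the hosted actor and return its dataset items as a list of dicts.

    NOTE: the input field name ("asins") is an assumption; verify it
    against the actor's documented input schema before relying on it.
    """
    resp = requests.post(
        actor_endpoint(ACTOR),
        params={"token": APIFY_TOKEN},
        json={"asins": asins},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()
```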



&lt;p&gt;One call, all ASINs. For 200 products the actor finishes in about a minute. For thousands, chunk it — I found batches of 500 to be a sweet spot.&lt;/p&gt;
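&lt;p&gt;The chunking itself is a five-line generator. A sketch (&lt;code&gt;fetch_prices&lt;/code&gt; in the comment stands in for whatever fetch function you use):&lt;/p&gt;

```python
def batched(seq, size=500):
    """Yield fixed-size chunks of a list; the last chunk may be smaller."""
    for i in range(0, len(seq), size):
        yield seq[i : i + size]

# Run the actor once per batch instead of one giant call:
# for batch in batched(all_asins, 500):
#     items.extend(fetch_prices(batch))
```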

&lt;h2&gt;
  
  
  Step 3 — Store the history
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="n"&gt;DB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prices.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
CREATE TABLE IF NOT EXISTS price_history (
    asin TEXT,
    price REAL,
    currency TEXT,
    in_stock INTEGER,
    checked_at TEXT
)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE INDEX IF NOT EXISTS idx_asin_time ON price_history(asin, checked_at)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO price_history VALUES (?, ?, ?, ?, ?)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inStock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Never delete a row. Disk is cheap, history is priceless when you're trying to figure out if a price drop is a fluke or a trend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 — Detect drops
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;significant_drops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold_pct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT asin, price, checked_at FROM price_history
    ORDER BY asin, checked_at DESC
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;latest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;previous&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;asin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;asin&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;asin&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;asin&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;asin&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;

    &lt;span class="n"&gt;alerts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;asin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pct&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;threshold_pct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;alerts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;asin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pct&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;alerts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5% is a reasonable default. Below that, most of the signal is noise — Amazon nudges prices by a few cents constantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5 — Alert
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
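&lt;p&gt;For the Slack route from the stack list, an incoming webhook is all you need. A minimal sketch of the &lt;code&gt;alert()&lt;/code&gt; that &lt;code&gt;run.py&lt;/code&gt; imports, consuming the &lt;code&gt;(asin, prev, cur, pct)&lt;/code&gt; tuples from &lt;code&gt;significant_drops&lt;/code&gt; (the webhook URL comes from your Slack app settings):&lt;/p&gt;

```python
import os
import requests

WEBHOOK = os.environ.get("SLACK_WEBHOOK_URL", "")

def format_alert(asin, prev, cur, pct):
    """One line per drop, matching the tuples from significant_drops()."""
    return f"{asin}: {prev:.2f} -> {cur:.2f} ({pct:+.1f}%)"

def alert(drops):
    """Post all drops as a single Slack message; do nothing if none."""
    if not drops:
        return
    text = "Price drops detected:\n" + "\n".join(
        format_alert(*d) for d in drops
    )
    requests.post(WEBHOOK, json={"text": text}, timeout=10)
```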



&lt;p&gt;Slack, Discord, email, SMS — it doesn't matter. What matters is that the alert lands somewhere you actually look.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6 — Schedule
&lt;/h2&gt;

&lt;p&gt;Cron is fine. Every four hours:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;0 &lt;span class="k"&gt;*&lt;/span&gt;/4 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; /opt/price-monitor &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; python3 run.py &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; run.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;run.py&lt;/code&gt; is the three-line composition of everything above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_watchlist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch_prices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;significant_drops&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alert&lt;/span&gt;

&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_prices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;load_watchlist&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;significant_drops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold_pct&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use cases beyond dropshipping
&lt;/h2&gt;

&lt;p&gt;Once the pipeline is running, other uses appear naturally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Comparison shopping.&lt;/strong&gt; Same ASIN across multiple country marketplaces — find arbitrage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restock alerts.&lt;/strong&gt; Flip the &lt;code&gt;in_stock&lt;/code&gt; flag check instead of price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deal blogs.&lt;/strong&gt; If you run a content site, price drops become content. Low effort, good traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal wishlists.&lt;/strong&gt; The nerdiest one. I track 20 board games and get pinged when anything drops 10%+.&lt;/li&gt;
&lt;/ul&gt;
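&lt;p&gt;The restock variant reuses the same latest-versus-previous trick as &lt;code&gt;significant_drops&lt;/code&gt;, just on the &lt;code&gt;in_stock&lt;/code&gt; column. A sketch (the connection is passed explicitly here rather than using the module-level &lt;code&gt;DB&lt;/code&gt;):&lt;/p&gt;

```python
import sqlite3

def restocked(db):
    """ASINs whose latest check shows in_stock=1 but whose previous
    check showed 0, i.e. items that just came back in stock."""
    rows = db.execute(
        "SELECT asin, in_stock FROM price_history ORDER BY asin, checked_at DESC"
    ).fetchall()
    latest, previous = {}, {}
    for asin, in_stock in rows:
        if asin not in latest:
            latest[asin] = in_stock
        elif asin not in previous:
            previous[asin] = in_stock
    return [a for a, s in latest.items() if s == 1 and previous.get(a) == 0]
```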

&lt;h2&gt;
  
  
  Cost check
&lt;/h2&gt;

&lt;p&gt;At 200 products every 4 hours, that's 1,200 results per day × 30 days = 36,000 results/month. At $0.005/result the bill is $180/month, and the margin drift it caught for my friend paid for the whole thing in the first week. For smaller lists (20–50 products) the same cadence runs $18–$45/month, and checking once a day instead of every four hours cuts that by a factor of six.&lt;/p&gt;

&lt;p&gt;If you want the actor: &lt;a href="https://apify.com/cryptosignals/amazon-scraper" rel="noopener noreferrer"&gt;Amazon Scraper on Apify&lt;/a&gt;. It has a free tier so you can prove out the idea before wiring up cron.&lt;/p&gt;

&lt;p&gt;The boring stack works. Go build.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Powered by &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; — the web scraping platform used in this guide. &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Try it free →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>api</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Build a LinkedIn Talent Pipeline Scraper in 2026 (Without LinkedIn's API)</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Tue, 14 Apr 2026 10:13:39 +0000</pubDate>
      <link>https://forem.com/agenthustler/how-to-build-a-linkedin-talent-pipeline-scraper-in-2026-without-linkedins-api-3ima</link>
      <guid>https://forem.com/agenthustler/how-to-build-a-linkedin-talent-pipeline-scraper-in-2026-without-linkedins-api-3ima</guid>
      <description>&lt;p&gt;I spent the last two months helping a friend's recruiting agency move off a $4,000/month sourcing tool. The pitch was simple: they wanted to pull a few thousand LinkedIn profiles a week based on job titles, enrich them, score them, and feed the top matches into their ATS. LinkedIn's official API, as anyone who has tried it knows, is basically a locked door unless you're a Fortune 500 partner. So we went the scraper route — and it worked better than either of us expected.&lt;/p&gt;

&lt;p&gt;Here's how I built it, what I learned, and the Python code you can steal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not the official API?
&lt;/h2&gt;

&lt;p&gt;LinkedIn's partner API (Talent Solutions) is gated behind sales calls, contracts, and minimums you don't want to see. For a small agency or a solo recruiter, it's not an option. The Sign In With LinkedIn OAuth endpoints only give you basic profile info for the user who logged in — not search, not lookups, not bulk data.&lt;/p&gt;

&lt;p&gt;Everyone I know who scales LinkedIn sourcing does one of two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pays a SaaS that scrapes on their behalf and wraps it in a UI.&lt;/li&gt;
&lt;li&gt;Runs their own scraper.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Option 2 is cheaper, more flexible, and the data is yours to do with as you please.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;I used &lt;a href="https://apify.com/cryptosignals/linkedin-profile-scraper" rel="noopener noreferrer"&gt;Apify's LinkedIn Profile Scraper&lt;/a&gt; as the data layer. It handles the proxy rotation, fingerprinting, and retry logic that you really don't want to maintain yourself. I built the pipeline in plain Python — no framework, just &lt;code&gt;requests&lt;/code&gt;, &lt;code&gt;sqlite3&lt;/code&gt;, and a couple of CSV dumps.&lt;/p&gt;

&lt;p&gt;Pricing was the other reason I went with this actor: $0.005 per result. For 5,000 profiles a week that's $25, which is absolutely nothing compared to the $4k/month the agency was paying before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — Pull the profiles
&lt;/h2&gt;

&lt;p&gt;Let's say we want Senior Python Engineers in Berlin. I keep my seed list in a plain text file: one profile URL per line. You can generate that seed list from a LinkedIn search URL using the same actor, but for this article I'll assume you already have URLs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
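&lt;p&gt;For a small batch, the synchronous endpoint is a single POST. A minimal sketch (the &lt;code&gt;profileUrls&lt;/code&gt; input field name is an assumption; check the actor's input schema on its Apify page):&lt;/p&gt;

```python
import os
import requests

APIFY_TOKEN = os.environ.get("APIFY_TOKEN", "")
ACTOR = "cryptosignals~linkedin-profile-scraper"  # "username~actor-name" form

def load_seed_urls(path="seeds.txt"):
    """One profile URL per line; blank lines are ignored."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def pull_profiles(urls):
    """Blocks until the actor finishes, then returns the dataset items.

    NOTE: the input field name ("profileUrls") is an assumption;
    verify it against the actor's documented input schema.
    """
    resp = requests.post(
        f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items",
        params={"token": APIFY_TOKEN},
        json={"profileUrls": urls},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()
```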



&lt;p&gt;&lt;code&gt;run-sync-get-dataset-items&lt;/code&gt; blocks until the actor finishes and returns the dataset in one shot. For small batches that's the easiest pattern. For 5,000+ profiles, switch to the async &lt;code&gt;runs&lt;/code&gt; endpoint and poll.&lt;/p&gt;
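&lt;p&gt;The async pattern is: start a run, poll its status, then pull the dataset. A sketch using Apify's public &lt;code&gt;runs&lt;/code&gt;, &lt;code&gt;actor-runs&lt;/code&gt;, and &lt;code&gt;datasets&lt;/code&gt; endpoints (the actor's input schema is its own, so &lt;code&gt;run_input&lt;/code&gt; is whatever that actor documents):&lt;/p&gt;

```python
import os
import time
import requests

APIFY = "https://api.apify.com/v2"
TOKEN = os.environ.get("APIFY_TOKEN", "")
TERMINAL = ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT")

def is_terminal(status):
    """True once a run has stopped, successfully or not."""
    return status in TERMINAL

def run_async(actor, run_input, poll_secs=30):
    """Start an actor run, poll until it finishes, download its dataset."""
    r = requests.post(
        f"{APIFY}/acts/{actor}/runs",
        params={"token": TOKEN}, json=run_input, timeout=60,
    )
    r.raise_for_status()
    run = r.json()["data"]
    while not is_terminal(run["status"]):
        time.sleep(poll_secs)
        run = requests.get(
            f"{APIFY}/actor-runs/{run['id']}",
            params={"token": TOKEN}, timeout=60,
        ).json()["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError("run ended with status " + run["status"])
    items = requests.get(
        f"{APIFY}/datasets/{run['defaultDatasetId']}/items",
        params={"token": TOKEN, "format": "json"}, timeout=300,
    )
    items.raise_for_status()
    return items.json()
```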

&lt;h2&gt;
  
  
  Step 2 — Store and deduplicate
&lt;/h2&gt;

&lt;p&gt;Recruiters re-run the same searches every week. You do not want to re-scrape profiles you already have fresh data on, both for cost and politeness.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="n"&gt;DB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;talent.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
CREATE TABLE IF NOT EXISTS profiles (
    url TEXT PRIMARY KEY,
    name TEXT,
    headline TEXT,
    location TEXT,
    skills TEXT,
    last_seen TEXT
)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;needs_refresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_age_days&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT last_seen FROM profiles WHERE url = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_age_days&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    INSERT INTO profiles(url, name, headline, location, skills, last_seen)
    VALUES (?, ?, ?, ?, ?, ?)
    ON CONFLICT(url) DO UPDATE SET
        name=excluded.name,
        headline=excluded.headline,
        location=excluded.location,
        skills=excluded.skills,
        last_seen=excluded.last_seen
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fullName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skills&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])),&lt;/span&gt;
        &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A 14-day TTL works well in practice. People don't update their headline every week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3 — Score candidates
&lt;/h2&gt;

&lt;p&gt;This is where a pipeline stops being a scraper and starts being a tool. The agency cared about three signals: years in role, relevance of past companies, and whether the person was open to contract work (mentioned in the headline or about section).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;headline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;senior&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;headline&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;headline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;headline&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;headline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;exp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;experience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TARGET_COMPANIES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tune &lt;code&gt;TARGET_COMPANIES&lt;/code&gt; to your industry. The agency keeps a list of ~60 companies whose alumni they love to source from; hitting one is a huge signal.&lt;/p&gt;
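&lt;p&gt;One gotcha: the membership check in &lt;code&gt;score()&lt;/code&gt; is exact-match, so casing matters. A hedged sketch (the company names here are placeholders, not the agency's list):&lt;/p&gt;

```python
# Illustrative placeholders: swap in the companies your team sources from.
TARGET_COMPANIES = {"Stripe", "Datadog", "Shopify"}

def is_target_company(name):
    """Case-insensitive membership check, tolerant of missing values."""
    if not name:
        return False
    return name.lower() in {c.lower() for c in TARGET_COMPANIES}
```

&lt;p&gt;A helper like this avoids missing matches when the scraped company name differs from your list only in case.&lt;/p&gt;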

&lt;h2&gt;
  
  
  Step 4 — Ship to the ATS
&lt;/h2&gt;

&lt;p&gt;Once you've ranked profiles, push the top N into wherever your team actually works. For the agency that meant a CSV drop into a shared folder, but a webhook into Greenhouse or Airtable works just as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;export_top&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_candidates.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT url, name, headline, location FROM profiles ORDER BY last_seen DESC LIMIT ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;Two things bit me in the first month:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't scrape the same profile twice in one day.&lt;/strong&gt; Even with proxy rotation, you're wasting money. The dedupe check above exists for a reason.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate-limit yourself, not just the scraper.&lt;/strong&gt; The actor handles its side. You should still cap your runs — I schedule one batch of 500 profiles every few hours rather than 5,000 in one shot. Smoother results, easier debugging.&lt;/li&gt;
&lt;/ul&gt;
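&lt;p&gt;The batching advice is easy to mechanize. A minimal sketch that slices a URL list into fixed-size chunks, one chunk per scheduled run:&lt;/p&gt;

```python
# Sketch: split a large URL list into fixed-size batches so each
# scheduled run (cron, GitHub Actions, etc.) processes one chunk.
def batches(urls, size=500):
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```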

&lt;h2&gt;
  
  
  Total cost
&lt;/h2&gt;

&lt;p&gt;At $0.005/result, the agency's weekly 5,000-profile refresh costs $25. Add a few dollars for compute. Compared to the SaaS subscription they cancelled, the savings cover a nice dinner every week and then some.&lt;/p&gt;

&lt;p&gt;If you want to skip the code and just try the actor, it's here: &lt;a href="https://apify.com/cryptosignals/linkedin-profile-scraper" rel="noopener noreferrer"&gt;LinkedIn Profile Scraper on Apify&lt;/a&gt;. The input schema is documented and the free tier lets you run a few hundred profiles before you put a card down.&lt;/p&gt;

&lt;p&gt;Happy sourcing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Powered by &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; — the web scraping platform used in this guide. &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Try it free →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>api</category>
      <category>automation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I scraped 4,500 IndieHackers products - here's what the MRR data reveals</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Tue, 14 Apr 2026 03:02:33 +0000</pubDate>
      <link>https://forem.com/agenthustler/i-scraped-4500-indiehackers-products-heres-what-the-mrr-data-reveals-18ki</link>
      <guid>https://forem.com/agenthustler/i-scraped-4500-indiehackers-products-heres-what-the-mrr-data-reveals-18ki</guid>
      <description>&lt;p&gt;After spending a weekend building a scraper for IndieHackers' public product pages, I ended up with a dataset of &lt;strong&gt;4,500 products&lt;/strong&gt; — and the MRR numbers tell a very different story than the Twitter highlight reel.&lt;/p&gt;

&lt;p&gt;I went in expecting to see a long tail of hobby projects and a handful of unicorns. What I actually found was a surprisingly healthy middle class of indie SaaS.&lt;/p&gt;

&lt;p&gt;Here's what the data looks like after a few hours with pandas.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;Out of 4,500 products scraped, &lt;strong&gt;1,544 (34%)&lt;/strong&gt; self-report MRR on their profile. The rest either keep it private, haven't updated in forever, or are pre-revenue.&lt;/p&gt;

&lt;p&gt;The MRR distribution for the reporters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bucket&lt;/th&gt;
&lt;th&gt;Products&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;$0 – $100&lt;/td&gt;
&lt;td&gt;476&lt;/td&gt;
&lt;td&gt;31%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$100 – $1K&lt;/td&gt;
&lt;td&gt;447&lt;/td&gt;
&lt;td&gt;29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$1K – $10K&lt;/td&gt;
&lt;td&gt;401&lt;/td&gt;
&lt;td&gt;26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$10K+&lt;/td&gt;
&lt;td&gt;220&lt;/td&gt;
&lt;td&gt;14%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The takeaway that surprised me: &lt;strong&gt;the $1K–$10K bracket is where most "successful" indie products actually live.&lt;/strong&gt; Not the $100K MRR screenshots that go viral. A product doing $3,500/mo is doing better than ~85% of everything publicly listed.&lt;/p&gt;

&lt;p&gt;Median MRR among revenue reporters: &lt;strong&gt;~$750/mo&lt;/strong&gt;. Not glamorous, but it's a real number, and it puts the "just quit your job" advice in perspective.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's growing
&lt;/h2&gt;

&lt;p&gt;Tagging each product by its stated category and doing a rough vertical split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SaaS tools&lt;/strong&gt; — still the biggest slice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer tools&lt;/strong&gt; — overrepresented vs the broader market (no surprise given the audience)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI / automation&lt;/strong&gt; — the fastest-growing category in the listings; dozens of products added in the last 90 days alone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productivity&lt;/strong&gt; — steady but saturated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketing / growth&lt;/strong&gt; — lots of entries, but most sit in the $0–$100 bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI bucket is the one worth watching. A year ago it was a sliver. Now it's pushing ~18% of new listings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who's behind them
&lt;/h2&gt;

&lt;p&gt;Looking at the team size field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solo founders&lt;/strong&gt; dominate the $1K–$10K bracket (62% of products in that range list a team of 1)&lt;/li&gt;
&lt;li&gt;Products with 2+ founders skew toward the $10K+ bracket&lt;/li&gt;
&lt;li&gt;Nobody in the $100K+ club is solo — every one has a team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matches the intuition that revenue-per-founder has a ceiling, but the data is cleaner than I expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Filtering the data in Python
&lt;/h2&gt;

&lt;p&gt;If you want to isolate, say, profitable AI products in the dataset, it takes just a few lines of pandas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indiehackers_products.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ai_winners&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mrr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_winners&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mrr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mrr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there you can pivot on launch date, team size, or pricing model to find the patterns you care about.&lt;/p&gt;
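&lt;p&gt;For instance, a rough team-size pivot looks like this (a sketch; the column names &lt;code&gt;mrr&lt;/code&gt; and &lt;code&gt;team_size&lt;/code&gt; are assumptions, so rename them to match the CSV's actual headers):&lt;/p&gt;

```python
import pandas as pd

# Toy rows standing in for the dataset; column names are assumed.
df = pd.DataFrame({
    "mrr": [3500, 12000, 800, 45000],
    "team_size": [1, 2, 1, 3],
})

# Median MRR per team size: a quick check of the solo-founder ceiling.
summary = df.groupby("team_size")["mrr"].median()
print(summary)
```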

&lt;h2&gt;
  
  
  How I'm using this
&lt;/h2&gt;

&lt;p&gt;A few angles that turned out to be useful:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Competitor analysis&lt;/strong&gt; — pick a vertical, sort by MRR, and you instantly see who the players are and roughly what they're pulling in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Niche validation&lt;/strong&gt; — if a category has 40 products but none above $500/mo, that's a signal (probably a bad one).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Founder outreach&lt;/strong&gt; — if you sell to indie founders, this is basically a lead list with qualification data attached.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing benchmarks&lt;/strong&gt; — cross-referencing MRR bracket with listed price gives you a rough sense of how many paying customers each product has.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The dataset
&lt;/h2&gt;

&lt;p&gt;I cleaned up the full 4,500-row CSV and put it on Payhip if you'd rather skip the scraping step: &lt;strong&gt;&lt;a href="https://payhip.com/b/J5Zjs" rel="noopener noreferrer"&gt;IndieHackers Products Dataset&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Includes product name, URL, tags, MRR (where public), team size, launch date, and short description. Single CSV, no tracking, no subscription.&lt;/p&gt;

&lt;p&gt;If you end up finding something interesting in the data, I'd genuinely like to hear about it — leave a comment and I'll follow up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skip the Build
&lt;/h2&gt;

&lt;p&gt;You don't have to reinvent this. We maintain a production-grade scraper as an Apify actor — proxies, anti-bot, retries, and schema all handled. You can run it on a pay-per-result basis and get clean JSON without writing a single line of scraping code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/cryptosignals/twitter-scraper?fpr=yw6md3" rel="noopener noreferrer"&gt;Twitter/X Scraper on Apify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>startup</category>
      <category>python</category>
      <category>saas</category>
    </item>
    <item>
      <title>Web Scraping Pricing Comparison 2026: Bright Data vs Apify vs DIY</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Sun, 12 Apr 2026 11:05:30 +0000</pubDate>
      <link>https://forem.com/agenthustler/web-scraping-pricing-comparison-2026-bright-data-vs-apify-vs-diy-5cdp</link>
      <guid>https://forem.com/agenthustler/web-scraping-pricing-comparison-2026-bright-data-vs-apify-vs-diy-5cdp</guid>
      <description>&lt;p&gt;Web scraping costs range from $0 (DIY) to $10,000+/month (enterprise proxy networks). The right choice depends on your scale, technical resources, and how much maintenance you're willing to absorb.&lt;/p&gt;

&lt;p&gt;This is a practical pricing comparison of the major web scraping approaches in 2026, with real numbers and clear recommendations for different use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Approaches
&lt;/h2&gt;

&lt;p&gt;Every web scraping project falls into one of these categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Proxy providers&lt;/strong&gt; (Bright Data, Oxylabs) — you write the scraper, they provide infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed platforms&lt;/strong&gt; (Apify) — pre-built scrapers with compute and proxy included&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scraping APIs&lt;/strong&gt; (ScraperAPI, ScrapingBee) — API call returns rendered HTML&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data APIs&lt;/strong&gt; (Proxycurl, RapidAPI vendors) — API call returns structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DIY&lt;/strong&gt; — your own servers, your own proxies, your own maintenance&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Pricing Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Entry Price&lt;/th&gt;
&lt;th&gt;Mid-Scale (100K pages/mo)&lt;/th&gt;
&lt;th&gt;Large Scale (1M pages/mo)&lt;/th&gt;
&lt;th&gt;Includes Proxies&lt;/th&gt;
&lt;th&gt;Anti-Bot Handling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bright Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per GB / per CPM&lt;/td&gt;
&lt;td&gt;$2.80/GB residential&lt;/td&gt;
&lt;td&gt;~$280-500/mo&lt;/td&gt;
&lt;td&gt;~$2,000-4,000/mo&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Web Unlocker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Oxylabs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per GB / per request&lt;/td&gt;
&lt;td&gt;$3.00/GB residential&lt;/td&gt;
&lt;td&gt;~$300-550/mo&lt;/td&gt;
&lt;td&gt;~$2,200-4,500/mo&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Web Unblocker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apify Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per compute unit&lt;/td&gt;
&lt;td&gt;$0.30/CU&lt;/td&gt;
&lt;td&gt;~$25-80/mo&lt;/td&gt;
&lt;td&gt;~$200-600/mo&lt;/td&gt;
&lt;td&gt;Bundled in actors&lt;/td&gt;
&lt;td&gt;Actor-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apify PPE Actors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per result&lt;/td&gt;
&lt;td&gt;$0.005-0.01/result&lt;/td&gt;
&lt;td&gt;~$50-100/mo (10K results)&lt;/td&gt;
&lt;td&gt;~$500-1,000/mo (100K results)&lt;/td&gt;
&lt;td&gt;Included&lt;/td&gt;
&lt;td&gt;Included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ScraperAPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per API call&lt;/td&gt;
&lt;td&gt;$0.001/call (on plan)&lt;/td&gt;
&lt;td&gt;~$100/mo&lt;/td&gt;
&lt;td&gt;~$500/mo&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ScrapingBee&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per credit&lt;/td&gt;
&lt;td&gt;$0.0025/credit&lt;/td&gt;
&lt;td&gt;~$100/mo&lt;/td&gt;
&lt;td&gt;~$400-800/mo&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DIY (VPS + free proxies)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server costs&lt;/td&gt;
&lt;td&gt;$5-20/mo VPS&lt;/td&gt;
&lt;td&gt;~$20-50/mo&lt;/td&gt;
&lt;td&gt;~$100-300/mo + your time&lt;/td&gt;
&lt;td&gt;No (you manage)&lt;/td&gt;
&lt;td&gt;No (you build)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DIY (VPS + paid proxies)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server + proxy&lt;/td&gt;
&lt;td&gt;$50+/mo&lt;/td&gt;
&lt;td&gt;~$150-400/mo&lt;/td&gt;
&lt;td&gt;~$1,000-3,000/mo&lt;/td&gt;
&lt;td&gt;Purchased separately&lt;/td&gt;
&lt;td&gt;You build&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Prices as of April 2026. Actual costs vary by target site complexity and anti-bot measures.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Detailed Breakdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bright Data
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Large-scale operations that need residential/mobile IPs and can write their own scrapers.&lt;/p&gt;

&lt;p&gt;Bright Data is the largest proxy network (72M+ residential IPs). Their pricing model is primarily per-GB for proxy traffic, with Web Unlocker (anti-bot) charged per CPM (cost per thousand requests).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Residential proxies: &lt;strong&gt;$2.80/GB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Datacenter proxies: &lt;strong&gt;$0.60/GB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Web Unlocker: &lt;strong&gt;$2.50/CPM&lt;/strong&gt; (per 1,000 successful requests)&lt;/li&gt;
&lt;li&gt;Minimum commitment on some plans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hidden costs:&lt;/strong&gt; You still need to write and maintain the scraper code, handle parsing, manage retries, and deal with site changes. Developer time is the real cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Oxylabs
&lt;/h3&gt;

&lt;p&gt;Similar to Bright Data in pricing and capability. Residential at &lt;strong&gt;$3.00/GB&lt;/strong&gt;, datacenter at &lt;strong&gt;$0.70/GB&lt;/strong&gt;. Their Web Unblocker competes directly with Bright Data's Web Unlocker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apify Platform
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams that want pre-built scrapers without infrastructure management.&lt;/p&gt;

&lt;p&gt;Apify's pricing is based on compute units (CUs). One CU = 1 GB RAM for 1 hour of compute. At &lt;strong&gt;$0.30/CU&lt;/strong&gt;, the actual cost depends on the actor (scraper) efficiency.&lt;/p&gt;

&lt;p&gt;Free tier includes &lt;strong&gt;$5/mo in platform credits&lt;/strong&gt; — enough for small experiments.&lt;/p&gt;

&lt;p&gt;Many Apify Store actors use &lt;strong&gt;pay-per-event (PPE)&lt;/strong&gt; pricing, where you pay per result instead of per compute unit. This is simpler to predict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn Jobs scraper: &lt;strong&gt;$0.005-0.01 per job listing&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Google Search scraper: &lt;strong&gt;$0.003-0.005 per result&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;E-commerce scrapers: &lt;strong&gt;$0.005-0.02 per product&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PPE pricing includes proxy costs and anti-bot handling — no hidden fees.&lt;/p&gt;
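&lt;p&gt;Those per-result rates make budgeting straightforward. A quick sketch using the ranges quoted above (substitute your actor's listed price):&lt;/p&gt;

```python
# Sketch: monthly pay-per-event cost at a given volume.
def ppe_monthly_cost(results_per_month, price_per_result):
    return results_per_month * price_per_result

# 10,000 results at the low and high ends of the quoted range:
low = ppe_monthly_cost(10_000, 0.005)   # 50.0
high = ppe_monthly_cost(10_000, 0.01)   # 100.0
```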

&lt;h3&gt;
  
  
  ScraperAPI
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers who want a simple API call that returns rendered HTML.&lt;/p&gt;

&lt;p&gt;You send a URL, ScraperAPI returns the HTML after handling proxies, CAPTCHAs, and JavaScript rendering.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hobby: &lt;strong&gt;$29/mo&lt;/strong&gt; (100K API credits)&lt;/li&gt;
&lt;li&gt;Startup: &lt;strong&gt;$99/mo&lt;/strong&gt; (1M API credits)&lt;/li&gt;
&lt;li&gt;Business: &lt;strong&gt;$249/mo&lt;/strong&gt; (3M API credits)&lt;/li&gt;
&lt;li&gt;1 API credit = 1 standard request; JavaScript rendering costs 5-10 credits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; You still need to parse the HTML yourself. Good for simple pages, expensive for JavaScript-heavy sites.&lt;/p&gt;

&lt;h3&gt;
  
  
  ScrapingBee
&lt;/h3&gt;

&lt;p&gt;Similar model to ScraperAPI. Pricing starts at &lt;strong&gt;$49/mo&lt;/strong&gt; for 150K credits. JavaScript rendering costs 5 credits per request. Includes Google Search API at 25 credits per search.&lt;/p&gt;

&lt;h3&gt;
  
  
  DIY
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Technical teams with specific requirements and tolerance for maintenance.&lt;/p&gt;

&lt;p&gt;The upfront cost is low — a $10/mo VPS, Playwright or Puppeteer, and free rotating proxies (if you can find reliable ones). But the real costs are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer time&lt;/strong&gt;: 10-40 hours/month maintaining scrapers against site changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proxy costs&lt;/strong&gt;: Free proxies are unreliable; paid residential proxies cost $2-5/GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-bot solutions&lt;/strong&gt;: reCAPTCHA solving services ($1-3/1000 solves), browser fingerprint management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Error handling, retry logic, job queues, data storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Realistic all-in cost for a maintained DIY scraper at scale: &lt;strong&gt;$500-3,000/month&lt;/strong&gt; including developer time (valued at $50/hr).&lt;/p&gt;
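&lt;p&gt;That estimate is easy to sanity-check. A sketch using the figures from the list above (the hours and rates are this article's ranges, not universal numbers):&lt;/p&gt;

```python
# Sketch: DIY all-in monthly cost from the ranges quoted above.
def diy_monthly_cost(maintenance_hours, hourly_rate=50, vps=10, proxy_spend=0):
    return maintenance_hours * hourly_rate + vps + proxy_spend

low = diy_monthly_cost(10)                     # 510
high = diy_monthly_cost(40, proxy_spend=990)   # 3000
```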

&lt;h2&gt;
  
  
  Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Recommended&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hobby/learning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DIY or Apify free tier&lt;/td&gt;
&lt;td&gt;Learn the fundamentals, free or near-free&lt;/td&gt;
&lt;td&gt;$0-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Startup MVP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apify PPE actors&lt;/td&gt;
&lt;td&gt;Pay per result, no infrastructure, predictable costs&lt;/td&gt;
&lt;td&gt;$10-100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recruiting/HR data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apify PPE (LinkedIn actors)&lt;/td&gt;
&lt;td&gt;Purpose-built, handles LinkedIn anti-bot&lt;/td&gt;
&lt;td&gt;$25-200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SEO/marketing data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ScraperAPI or Apify&lt;/td&gt;
&lt;td&gt;Both handle Google well, choose by volume&lt;/td&gt;
&lt;td&gt;$29-250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;E-commerce monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bright Data or Apify&lt;/td&gt;
&lt;td&gt;Bright Data for custom; Apify for pre-built&lt;/td&gt;
&lt;td&gt;$100-1,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise (1M+ pages)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bright Data + custom scrapers&lt;/td&gt;
&lt;td&gt;Best proxy network at scale, but need dev team&lt;/td&gt;
&lt;td&gt;$2,000-10,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;One-time data collection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apify PPE&lt;/td&gt;
&lt;td&gt;No subscription, pay only for what you use&lt;/td&gt;
&lt;td&gt;$5-50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What Changed in 2026
&lt;/h2&gt;

&lt;p&gt;Several things shifted since 2024-2025:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Proxycurl shut down&lt;/strong&gt; — The popular LinkedIn/company data API closed in 2025. Alternatives: Apify actors, Bright Data datasets, or DIY.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-bot got harder&lt;/strong&gt; — Cloudflare Turnstile, DataDome, and PerimeterX are on more sites. DIY scraping is increasingly expensive to maintain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pay-per-result pricing expanded&lt;/strong&gt; — More platforms offer PPE/pay-per-result models, which shifts risk from the buyer to the provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI assistants drive data demand&lt;/strong&gt; — LLM-powered agents need structured data feeds, creating new demand for scraping infrastructure.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to Calculate Your Actual Cost
&lt;/h2&gt;

&lt;p&gt;Before choosing a provider, estimate your real needs. A rough back-of-envelope sketch (plan prices are taken from the comparison above; the volume and rate assumptions are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Monthly cost under each model for an illustrative workload
pages_per_month = 50_000
js_fraction = 0.3     # share of pages needing JS rendering

# Credit-based API: $49/mo for 150K credits, JS pages cost 5 credits
credits_needed = pages_per_month * (1 - js_fraction) \
               + pages_per_month * js_fraction * 5   # = 110,000 credits, fits the $49 plan

# Pay-per-result actor at $0.005/result
ppe_cost = pages_per_month * 0.005                   # = $250

# DIY: $10 VPS plus ~15 hours of maintenance at $50/hr
diy_cost = 10 + 15 * 50                              # = $760

# At this volume the credit plan wins; re-run the numbers for your own workload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For most developers and startups&lt;/strong&gt;: Start with Apify's free tier or a PPE actor. You get structured data without managing proxies or infrastructure, and costs scale linearly with usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For scale (500K+ pages/month)&lt;/strong&gt;: Bright Data or Oxylabs give you the best proxy infrastructure, but budget for developer time to build and maintain scrapers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For one-off projects&lt;/strong&gt;: Pay-per-result pricing (Apify PPE actors) is almost always cheaper than a monthly subscription to any platform.&lt;/p&gt;

&lt;p&gt;The biggest hidden cost in web scraping isn't the proxy bill — it's the engineering time to keep scrapers working as sites change. Factor that in before choosing DIY over a managed solution.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Prices verified as of April 2026. For the latest pricing, check each provider's website directly.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Powered by &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; — the web scraping platform used in this guide. &lt;a href="https://apify.com?fpr=yw6md3" rel="noopener noreferrer"&gt;Try it free →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>data</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>LinkedIn Jobs API for APAC Recruitment: Singapore, Indonesia, Malaysia Data Guide</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Sun, 12 Apr 2026 11:04:07 +0000</pubDate>
      <link>https://forem.com/agenthustler/linkedin-jobs-api-for-apac-recruitment-singapore-indonesia-malaysia-data-guide-22ha</link>
      <guid>https://forem.com/agenthustler/linkedin-jobs-api-for-apac-recruitment-singapore-indonesia-malaysia-data-guide-22ha</guid>
      <description>&lt;p&gt;If you're building recruitment tools, HR dashboards, or talent analytics for Southeast Asia, you already know: &lt;strong&gt;APAC hiring data is fundamentally different from Western markets&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;LinkedIn is the dominant professional network in Singapore, Malaysia, and Indonesia — but accessing structured job data programmatically requires navigating API restrictions, regional data quirks, and multilingual listings.&lt;/p&gt;

&lt;p&gt;This guide covers how to get clean, structured LinkedIn Jobs data for APAC recruitment, what fields matter, and the current options in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why APAC Recruitment Data Is Different
&lt;/h2&gt;

&lt;p&gt;Three things make APAC job data harder to work with than US/EU data:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Multilingual listings.&lt;/strong&gt; A single Singapore job posting might mix English, Mandarin, and Malay. Indonesian listings blend Bahasa Indonesia with English technical terms. Your pipeline needs to handle this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Regional job board fragmentation.&lt;/strong&gt; LinkedIn dominates white-collar hiring, but JobStreet (Malaysia/Indonesia), MyCareersFuture (Singapore government portal), and Jobsdb (Hong Kong/Thailand) all carry listings that never appear on LinkedIn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Visa and work permit requirements.&lt;/strong&gt; Singapore's Employment Pass, Malaysia's DE Rantau pass, Indonesia's KITAS — these aren't just nice-to-have fields. They determine whether a candidate can actually apply.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;US/EU&lt;/th&gt;
&lt;th&gt;APAC (SG/MY/ID)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary platform&lt;/td&gt;
&lt;td&gt;LinkedIn&lt;/td&gt;
&lt;td&gt;LinkedIn + JobStreet + government portals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Single language&lt;/td&gt;
&lt;td&gt;2-4 languages per listing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Work authorization&lt;/td&gt;
&lt;td&gt;Binary (yes/no)&lt;/td&gt;
&lt;td&gt;Pass-type specific (EP, S Pass, PEP, DP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Salary disclosure&lt;/td&gt;
&lt;td&gt;Common&lt;/td&gt;
&lt;td&gt;Rare (salary ranges often hidden)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hiring timeline&lt;/td&gt;
&lt;td&gt;2-4 weeks&lt;/td&gt;
&lt;td&gt;4-8 weeks (notice periods are longer)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Data Fields for APAC Recruitment
&lt;/h2&gt;

&lt;p&gt;When scraping LinkedIn Jobs for APAC markets, these fields matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;job_location&lt;/code&gt;&lt;/strong&gt; — Filter by city-level: "Singapore", "Kuala Lumpur", "Jakarta", "Bangsar South" (common KL tech hub)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;description&lt;/code&gt; (full text)&lt;/strong&gt; — Parse for visa sponsorship mentions, language requirements, and salary hints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;seniority_level&lt;/code&gt;&lt;/strong&gt; — APAC markets skew toward mid-senior roles on LinkedIn; junior roles are more common on local boards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;company_size&lt;/code&gt;&lt;/strong&gt; — Startups in Singapore (especially fintech) hire differently than MNCs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;posted_date&lt;/code&gt;&lt;/strong&gt; — Critical for tracking hiring velocity and seasonal patterns (Chinese New Year creates a predictable February dip)&lt;/li&gt;
&lt;/ul&gt;
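
&lt;p&gt;The description parsing mentioned above can start as simple keyword flagging. A minimal sketch (the patterns are illustrative, not exhaustive):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re

# Illustrative patterns for APAC-specific signals in a raw job description
VISA_PAT = re.compile(r"(visa sponsorship|employment pass|s pass|work permit|kitas)", re.I)
LANG_PAT = re.compile(r"(mandarin|bahasa|malay|cantonese|thai|vietnamese)", re.I)

def tag_listing(description):
    return {
        "mentions_visa": bool(VISA_PAT.search(description)),
        "language_reqs": sorted({m.lower() for m in LANG_PAT.findall(description)}),
    }

sample = "Fintech role in Singapore. Employment Pass sponsorship available. Mandarin a plus."
print(tag_listing(sample))
# {'mentions_visa': True, 'language_reqs': ['mandarin']}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;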

&lt;h2&gt;
  
  
  How to Filter for Singapore, KL, and Jakarta
&lt;/h2&gt;

&lt;p&gt;Using the &lt;a href="https://apify.com/curious_coder/linkedin-jobs-scraper" rel="noopener noreferrer"&gt;LinkedIn Jobs Scraper&lt;/a&gt; on Apify, you can target APAC locations directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"searchUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.linkedin.com/jobs/search/?location=Singapore&amp;amp;keywords=software+engineer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"maxItems"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multi-city collection, run separate queries per location:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;locations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Singapore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jakarta, Jakarta, Indonesia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ho Chi Minh City, Vietnam&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;loc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;locations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.linkedin.com/jobs/search/?location=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;keywords=fintech&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxItems&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Use full location strings for Malaysia and Indonesia. "Kuala Lumpur" alone sometimes returns results from other countries.&lt;/p&gt;
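
&lt;p&gt;To actually execute these runs, you can wrap the input construction in a helper and hand it to the apify-client package. A sketch (the token is a placeholder; &lt;code&gt;urlencode&lt;/code&gt; also takes care of the full location strings):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from urllib.parse import urlencode

def build_run_input(location, keyword, max_items=200):
    # Full location strings contain spaces and commas; urlencode handles both
    params = urlencode({"location": location, "keywords": keyword})
    return {
        "searchUrl": f"https://www.linkedin.com/jobs/search/?{params}",
        "maxItems": max_items,
    }

# Running it requires the apify-client package and an API token:
# from apify_client import ApifyClient
# client = ApifyClient("YOUR_APIFY_TOKEN")
# run = client.actor("curious_coder/linkedin-jobs-scraper").call(
#     run_input=build_run_input("Singapore", "fintech"))
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["title"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;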

&lt;h2&gt;
  
  
  Practical Example: Tracking Singapore Fintech Hiring
&lt;/h2&gt;

&lt;p&gt;Singapore is the fintech capital of Southeast Asia. Here is how to track hiring trends:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Collect weekly snapshots&lt;/strong&gt; of fintech job listings (keywords: "fintech", "digital bank", "payments", "blockchain")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track volume over time&lt;/strong&gt; — a spike in backend engineer listings often precedes a product launch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor company-level patterns&lt;/strong&gt; — are Grab, Sea Group, and DBS scaling specific teams?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-reference with MAS announcements&lt;/strong&gt; — new regulatory frameworks (like the 2025 Digital Payment Token rules) create predictable hiring waves&lt;/li&gt;
&lt;/ol&gt;
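
&lt;p&gt;Step 2 — tracking volume over time — reduces to bucketing listings by ISO week. A minimal sketch with hypothetical data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter
from datetime import date

# Hypothetical snapshot: each listing carries an ISO posted_date
listings = [
    {"title": "Backend Engineer", "posted_date": "2026-04-06"},
    {"title": "Payments PM", "posted_date": "2026-04-07"},
    {"title": "Blockchain Dev", "posted_date": "2026-04-13"},
]

def week_of(iso_date):
    y, w, _ = date.fromisoformat(iso_date).isocalendar()
    return f"{y}-W{w:02d}"

volume = Counter(week_of(l["posted_date"]) for l in listings)
print(volume)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Plot these weekly counts per keyword and per company to spot the hiring spikes described above.&lt;/p&gt;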

&lt;p&gt;A dataset of 500-1000 Singapore fintech listings per month costs roughly &lt;strong&gt;$2.50-5.00&lt;/strong&gt; using pay-per-event pricing on Apify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison: LinkedIn Jobs Data Sources in 2026
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;APAC Coverage&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Structured Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LinkedIn Official API&lt;/td&gt;
&lt;td&gt;Restricted (partners only)&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Free (if approved)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxycurl&lt;/td&gt;
&lt;td&gt;Shut down (2025)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PhantomBuster&lt;/td&gt;
&lt;td&gt;Active&lt;/td&gt;
&lt;td&gt;Limited geo-targeting&lt;/td&gt;
&lt;td&gt;$69+/mo&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apify LinkedIn Jobs Scraper&lt;/td&gt;
&lt;td&gt;Active&lt;/td&gt;
&lt;td&gt;Full city-level filtering&lt;/td&gt;
&lt;td&gt;$0.005-0.01/result (PPE)&lt;/td&gt;
&lt;td&gt;Yes (JSON)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DIY with Playwright&lt;/td&gt;
&lt;td&gt;Active&lt;/td&gt;
&lt;td&gt;Whatever you build&lt;/td&gt;
&lt;td&gt;Server costs + maintenance&lt;/td&gt;
&lt;td&gt;Whatever you build&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The LinkedIn Official API&lt;/strong&gt; requires a partnership agreement and is not available to most developers or startups. If you have access, use it — the data quality is unmatched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For everyone else&lt;/strong&gt;, a managed scraper with pay-per-result pricing is the most practical option. You get structured JSON output without maintaining proxy infrastructure or handling LinkedIn's anti-bot measures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a free &lt;a href="https://apify.com/" rel="noopener noreferrer"&gt;Apify account&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Find the &lt;a href="https://apify.com/curious_coder/linkedin-jobs-scraper" rel="noopener noreferrer"&gt;LinkedIn Jobs Scraper&lt;/a&gt; in the Store&lt;/li&gt;
&lt;li&gt;Set your target location and keywords&lt;/li&gt;
&lt;li&gt;Run and export to JSON, CSV, or connect directly to your pipeline via API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For APAC-specific recruitment analytics, combine LinkedIn data with local job boards for the most complete picture. LinkedIn captures 60-70% of white-collar listings in Singapore, but only 30-40% in Indonesia where JobStreet dominates.&lt;/p&gt;
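
&lt;p&gt;When you merge LinkedIn with JobStreet or MyCareersFuture data, deduplicate before analysis. A normalized (company, title) key is a reasonable first pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def dedupe(listings):
    # Keep the first occurrence of each (company, title) pair
    seen = set()
    unique = []
    for job in listings:
        key = (job["company"].strip().lower(), job["title"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique

merged = [
    {"source": "linkedin", "company": "Grab", "title": "Data Engineer"},
    {"source": "jobstreet", "company": "grab", "title": "Data Engineer "},
    {"source": "mycareersfuture", "company": "DBS", "title": "Data Engineer"},
]
print(len(dedupe(merged)))  # 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;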




&lt;p&gt;&lt;em&gt;Building recruitment tools for APAC? The structured data approach beats manual scraping every time — especially when you need consistent, repeatable data collection across multiple Southeast Asian markets.&lt;/em&gt;&lt;/p&gt;






</description>
      <category>singapore</category>
      <category>recruitment</category>
      <category>api</category>
      <category>data</category>
    </item>
    <item>
      <title>How to Scrape PropertyGuru Data in Singapore: 2026 Guide</title>
      <dc:creator>agenthustler</dc:creator>
      <pubDate>Sun, 12 Apr 2026 08:17:53 +0000</pubDate>
      <link>https://forem.com/agenthustler/how-to-scrape-propertyguru-data-in-singapore-2026-guide-5cj3</link>
      <guid>https://forem.com/agenthustler/how-to-scrape-propertyguru-data-in-singapore-2026-guide-5cj3</guid>
      <description>&lt;p&gt;Singapore's property market moves fast. Whether you're an investor tracking price trends across districts, a real estate agent doing competitive research, or a developer building market analysis tools — you need structured property data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PropertyGuru&lt;/strong&gt; is Singapore's dominant property portal, with over 200,000 active listings covering HDB flats, condos, landed properties, and commercial spaces. It's the Zillow of Southeast Asia, and it holds a goldmine of data that's frustratingly hard to extract at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Data Is Available on PropertyGuru?
&lt;/h2&gt;

&lt;p&gt;Each PropertyGuru listing contains rich structured data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Price&lt;/strong&gt; — asking price, PSF (per square foot), and historical price changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Property details&lt;/strong&gt; — size (sqft), bedrooms, bathrooms, floor level, tenure (freehold/leasehold/999-year)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location&lt;/strong&gt; — district number, street address, nearest MRT station and distance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent info&lt;/strong&gt; — agent name, agency, contact details, number of active listings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project details&lt;/strong&gt; — developer name, TOP date, total units, facilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Singapore specifically, you'll want to understand the geographic classification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CCR (Core Central Region)&lt;/strong&gt; — Districts 9, 10, 11, Downtown Core. Orchard Road, Marina Bay. Premium pricing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RCR (Rest of Central Region)&lt;/strong&gt; — Districts like Queenstown, Toa Payoh, Geylang. Mid-tier pricing with growth potential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OCR (Outside Central Region)&lt;/strong&gt; — Jurong, Woodlands, Punggol. Mass market, HDB-heavy. This is where 80% of Singaporeans live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URA Planning Areas&lt;/strong&gt; — 55 distinct zones used by the Urban Redevelopment Authority for planning. Critical for understanding development potential and future MRT lines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these regions is essential for any meaningful property analysis in Singapore.&lt;/p&gt;
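
&lt;p&gt;For analysis, you'll usually want to roll district numbers up into these regions. A simplified mapping (the district membership here is illustrative; check the URA definitions for edge cases like Sentosa):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative district-to-region lookup for Singapore market analysis
CCR = {1, 2, 6, 9, 10, 11}
RCR = {3, 4, 5, 7, 8, 12, 13, 14, 15, 20}

def region(district):
    if district in CCR:
        return "CCR"
    if district in RCR:
        return "RCR"
    return "OCR"

print(region(10), region(15), region(19))  # CCR RCR OCR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;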

&lt;h2&gt;
  
  
  Why People Need This Data
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Property investors&lt;/strong&gt; track PSF trends across districts to identify undervalued areas before en-bloc fever hits. When a new MRT line is announced, prices in surrounding districts shift — having historical data lets you model these patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real estate agents&lt;/strong&gt; monitor competitor listings, pricing strategies, and agent market share. If a rival agency is dominating District 15 listings, you want to know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PropTech developers&lt;/strong&gt; build tools like mortgage calculators, investment dashboards, and rental yield estimators. All of these need fresh listing data as input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Researchers and analysts&lt;/strong&gt; study housing affordability, the HDB resale market, and the impact of cooling measures on transaction volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Options (And Why They Fall Short)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Manual export&lt;/strong&gt;: PropertyGuru lets you browse and filter, but there's no bulk export. Copy-pasting 50 listings is tedious. Doing 5,000 is impossible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Existing Apify actors&lt;/strong&gt;: As of early 2026, the PropertyGuru scrapers on Apify Store are &lt;strong&gt;deprecated and returning 0 results&lt;/strong&gt;. The site has changed its structure, and nobody has updated these actors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise solutions&lt;/strong&gt;: Services like Bright Data offer pre-built datasets, but pricing starts at enterprise levels — overkill if you just need listing data for a specific district or property type.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own PropertyGuru Scraper
&lt;/h2&gt;

&lt;p&gt;Here's a basic approach using Python and requests. This is a minimal sketch only: the URL pattern and CSS selectors are placeholders you'll need to verify against the live site.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; research-bot)"}

def fetch_listings(page=1):
    # URL pattern and selectors below are illustrative placeholders
    url = f"https://www.propertyguru.com.sg/property-for-sale?page={page}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for card in soup.select("div.listing-card"):
        yield {
            "title": card.select_one("h3").get_text(strip=True),
            "price": card.select_one(".list-price").get_text(strip=True),
        }

for listing in fetch_listings(page=1):
    print(listing)
time.sleep(2)   # rate limit: wait 2s or more before the next page
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always check PropertyGuru's &lt;code&gt;robots.txt&lt;/code&gt; and terms of service&lt;/li&gt;
&lt;li&gt;Use reasonable rate limiting (2+ seconds between requests)&lt;/li&gt;
&lt;li&gt;Selectors change frequently — you'll need to maintain your scraper&lt;/li&gt;
&lt;li&gt;Consider using Playwright or Selenium if the site relies heavily on JavaScript rendering&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Apify for Custom Tasks
&lt;/h2&gt;

&lt;p&gt;If you don't want to maintain your own scraper infrastructure, you can build a custom Apify actor. Apify handles proxy rotation, scheduling, and storage — you just write the scraping logic.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;complementary location data&lt;/strong&gt;, the &lt;a href="https://apify.com/cryptosignals/google-maps-scraper" rel="noopener noreferrer"&gt;Google Maps Scraper&lt;/a&gt; on Apify can help you enrich property listings with nearby amenities — schools, MRT stations, hawker centres, malls, and clinics. Proximity to amenities is one of the biggest price drivers in Singapore real estate.&lt;/p&gt;

&lt;p&gt;For example, combine property listings with Google Maps data to answer: &lt;em&gt;"Which District 19 condos are within 500m of an MRT station AND have 3+ schools nearby?"&lt;/em&gt;&lt;/p&gt;
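
&lt;p&gt;The 500m question comes down to a great-circle distance between the listing and each amenity. A sketch with hypothetical coordinates near Serangoon MRT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres (Earth radius 6,371 km)
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# Hypothetical coordinates: a District 19 condo and the nearest MRT station
condo = (1.3554, 103.8679)
mrt = (1.3497, 103.8736)
print(round(haversine_m(*condo, *mrt)), "metres")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Filter listings where this distance comes in under 500 metres, then apply the same check against school coordinates from the Maps data.&lt;/p&gt;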

&lt;h2&gt;
  
  
  Practical Tips for Singapore Property Data
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;HDB vs Private&lt;/strong&gt; — These are fundamentally different markets. HDB resale is regulated by HDB with transaction data publicly available at data.gov.sg. Private property (condos, landed) is where scraped data adds the most value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;New launches vs Resale&lt;/strong&gt; — New launch pricing comes from developers and is often only on PropertyGuru temporarily. Resale listings stay longer and have richer data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PSF is king&lt;/strong&gt; — Price per square foot is how Singaporeans compare properties. Always normalize by size.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MRT proximity matters enormously&lt;/strong&gt; — Properties within 500m of an MRT station command a 10-15% premium. The Thomson-East Coast Line (TEL) completions through 2025-2026 are reshaping values in Districts 15 and 18.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check URA caveats&lt;/strong&gt; — Some listings are in areas zoned for future development or have plot ratio restrictions. Cross-reference with URA Master Plan data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
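
&lt;p&gt;The PSF normalization in tip 3 is one line of arithmetic, but applying it consistently is what makes listings comparable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# PSF = asking price / floor area, the standard comparison metric in Singapore
def psf(price_sgd, area_sqft):
    return price_sgd / area_sqft

listings = [
    {"name": "Condo A", "price": 1_800_000, "sqft": 1000},
    {"name": "Condo B", "price": 2_100_000, "sqft": 1400},
]
for l in listings:
    print(l["name"], round(psf(l["price"], l["sqft"])), "PSF")
# Condo A 1800 PSF
# Condo B 1500 PSF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Condo B looks pricier in absolute terms but is the cheaper buy per square foot.&lt;/p&gt;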

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The gap in the market for a reliable, maintained PropertyGuru scraper is real. The existing solutions are broken, and Singapore's property data needs are growing as more investors and PropTech startups enter the market.&lt;/p&gt;

&lt;p&gt;If you build something useful, consider publishing it on the &lt;a href="https://apify.com/store" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt; — there's clear demand from the Singapore market, and the existing actors aren't delivering.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions about scraping property data in Singapore? Drop a comment below.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>singapore</category>
      <category>webdev</category>
      <category>data</category>
      <category>python</category>
    </item>
  </channel>
</rss>
