<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: George Kioko</title>
    <description>The latest articles on Forem by George Kioko (@the_aientrepreneur_7ae85).</description>
    <link>https://forem.com/the_aientrepreneur_7ae85</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3819055%2Fd9abfd38-f5cf-4c9c-bb04-30b1ea57dd40.jpg</url>
      <title>Forem: George Kioko</title>
      <link>https://forem.com/the_aientrepreneur_7ae85</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/the_aientrepreneur_7ae85"/>
    <language>en</language>
    <item>
      <title>How to Build a RAG Pipeline from YouTube Videos (Without an API)</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Thu, 16 Apr 2026 06:43:05 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/how-to-build-a-rag-pipeline-from-youtube-videos-without-an-api-4f73</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/how-to-build-a-rag-pipeline-from-youtube-videos-without-an-api-4f73</guid>
      <description>&lt;p&gt;If you validate emails with regex, you are checking if a string looks like an email. You are not checking if anyone will receive your message.&lt;/p&gt;

&lt;p&gt;I learned this the hard way: a 12% bounce rate on a 5,000-email campaign. Sender reputation tanked. Half the "valid" emails were dead mailboxes that regex happily approved.&lt;/p&gt;

&lt;p&gt;Here is what actually works, and why the difference matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What regex checks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells you the string has an @ symbol, a dot, and some characters around them. That is it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;totally.fake.person@nonexistent-domain-12345.com&lt;/code&gt; passes regex. Nobody will ever receive that email.&lt;/p&gt;
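You can check this yourself with Python's `re` module: the dead domain sails through the same pattern as a real address.

```python
import re

# The same pattern shown above, compiled once.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Both strings pass the format check; only one domain actually exists.
print(bool(EMAIL_RE.match("hello@stripe.com")))                                   # True
print(bool(EMAIL_RE.match("totally.fake.person@nonexistent-domain-12345.com")))   # True
```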

&lt;h2&gt;
  
  
  What SMTP verification checks
&lt;/h2&gt;

&lt;p&gt;SMTP verification actually talks to the mail server. It connects on port 25, introduces itself with EHLO, then asks "would you accept mail for this address?" The server responds with a code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;250 = yes, this mailbox exists&lt;/li&gt;
&lt;li&gt;550 = no, user unknown&lt;/li&gt;
&lt;li&gt;252 = cannot verify the user, but the server will accept mail anyway (typical of catch-all domains)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same protocol your email client uses to send mail. You are just stopping before actually delivering anything.&lt;/p&gt;
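A minimal sketch of that handshake using Python's stdlib `smtplib`, assuming you already know the MX host for the domain (`mx_host` and `helo_domain` here are placeholders, not values from the article):

```python
import smtplib

# Map the RCPT TO reply codes listed above onto a verdict.
def interpret_code(code: int) -> str:
    if code == 250:
        return "valid"        # mailbox exists
    if code == 252:
        return "catch_all"    # server accepts everything; cannot verify
    if 500 <= code < 600:
        return "invalid"      # e.g. 550 user unknown
    return "unknown"

def smtp_check(email: str, mx_host: str, helo_domain: str = "probe.example.com") -> str:
    """Connect to the MX host, ask whether it would accept mail for
    `email`, then quit without delivering anything."""
    with smtplib.SMTP(mx_host, 25, timeout=10) as server:
        server.ehlo(helo_domain)
        server.mail(f"verify@{helo_domain}")   # envelope sender
        code, _ = server.rcpt(email)           # "would you accept this?"
        return interpret_code(code)
```

One caveat: most residential ISPs and many cloud providers block outbound port 25, so this check usually has to run from a host with a clean, unblocked IP.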

&lt;h2&gt;
  
  
  The 5 layers of real validation
&lt;/h2&gt;

&lt;p&gt;I built an email validator that runs 5 checks in sequence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Format check.&lt;/strong&gt; Yes, regex. But just as a first filter to reject obvious garbage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Domain check.&lt;/strong&gt; Does the domain have MX records? If there are no mail servers configured, nobody is receiving email there. &lt;code&gt;dig MX example.com&lt;/code&gt; tells you instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Disposable detection.&lt;/strong&gt; Is this mailinator, guerrillamail, tempmail? I maintain a list of 400+ disposable domains. These addresses work for about 10 minutes then disappear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: Role detection.&lt;/strong&gt; admin@, info@, support@ are role addresses. They usually go to a shared inbox that nobody monitors for cold outreach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 5: SMTP handshake.&lt;/strong&gt; The real check. Connect to the MX server, ask if the mailbox exists. This catches the dead addresses that everything else misses.&lt;/p&gt;
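The first four layers can be sketched in a few lines of Python. This is an illustrative sketch, not the actual validator: the disposable and role lists are tiny samples of the real 400+ domain list, and the MX lookup is passed in as a callable so you can plug in dnspython or any resolver.

```python
import re

FORMAT_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
DISPOSABLE = {"mailinator.com", "guerrillamail.com", "temp-mail.org"}   # sample only
ROLE_LOCALS = {"admin", "info", "support", "sales", "contact"}          # sample only

def validate(email: str, has_mx) -> dict:
    """`has_mx` is any callable domain -> bool, e.g. a DNS MX lookup."""
    result = {"email": email, "valid": False}
    if not FORMAT_RE.match(email):                 # Layer 1: format
        result["reason"] = "Invalid format"
        return result
    local, domain = email.rsplit("@", 1)
    if not has_mx(domain):                         # Layer 2: MX records
        result["reason"] = "No mail servers for domain"
        return result
    if domain.lower() in DISPOSABLE:               # Layer 3: disposable
        result["reason"] = "Disposable email address"
        return result
    if local.lower() in ROLE_LOCALS:               # Layer 4: role address
        result["reason"] = "Role-based address"
        return result
    result["valid"] = True                         # Layer 5 (SMTP) would run here
    return result
```

Running the cheap checks first means most bad addresses are rejected before you pay the cost of an SMTP connection.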

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;Input: &lt;code&gt;hello@stripe.com&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello@stripe.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"format_valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mx_found"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smtp_check"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_disposable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_free"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_role_based"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stripe.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mx_records"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"exchange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aspmx.l.google.com"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Input: &lt;code&gt;test@mailinator.com&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test@mailinator.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"format_valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mx_found"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smtp_check"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_disposable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Disposable email address"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The disposable check caught it before we even bothered with SMTP.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bounce rate difference
&lt;/h2&gt;

&lt;p&gt;Before SMTP validation: 12% bounce rate, emails landing in spam, sender score dropping.&lt;/p&gt;

&lt;p&gt;After: under 2% bounce rate. Same email copy, same sending infrastructure. The only change was filtering out dead addresses before hitting send.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Cost per 1,000 emails&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Regex only&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;~60% (misses dead mailboxes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ZeroBounce&lt;/td&gt;
&lt;td&gt;$1.60&lt;/td&gt;
&lt;td&gt;~95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hunter.io&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;~93%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NeverBounce&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;~96%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;This API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~98% (real SMTP)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The validator is on Apify Store with a free tier: &lt;a href="https://apify.com/george.the.developer/email-validator-api" rel="noopener noreferrer"&gt;Email Validator API&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also available on &lt;a href="https://rapidapi.com/georgethedeveloper3046" rel="noopener noreferrer"&gt;RapidAPI&lt;/a&gt; if you prefer REST.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python quickstart
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;george.the.developer/email-validator-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello@stripe.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;list_items&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Valid: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://george-the-developer--email-validator-api.apify.actor/validate?email=hello@stripe.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_TOKEN"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I build data tools. 57 actors on Apify Store, 869 users. Follow the build log at &lt;a href="https://x.com/ai_in_it" rel="noopener noreferrer"&gt;@ai_in_it on X&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
    </item>
    <item>
      <title>Why Regex Email Validation Is Lying to You (And What Actually Works)</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Wed, 15 Apr 2026 09:34:37 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/why-regex-email-validation-is-lying-to-you-and-what-actually-works-451p</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/why-regex-email-validation-is-lying-to-you-and-what-actually-works-451p</guid>
      <description>&lt;p&gt;If you validate emails with regex, you are checking if a string looks like an email. You are not checking if anyone will receive your message.&lt;/p&gt;

&lt;p&gt;I learned this the hard way: a 12% bounce rate on a 5,000-email campaign. Sender reputation tanked. Half the "valid" emails were dead mailboxes that regex happily approved.&lt;/p&gt;

&lt;p&gt;Here is what actually works, and why the difference matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What regex checks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells you the string has an @ symbol, a dot, and some characters around them. That is it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;totally.fake.person@nonexistent-domain-12345.com&lt;/code&gt; passes regex. Nobody will ever receive that email.&lt;/p&gt;
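You can check this yourself with Python's `re` module: the dead domain sails through the same pattern as a real address.

```python
import re

# The same pattern shown above, compiled once.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Both strings pass the format check; only one domain actually exists.
print(bool(EMAIL_RE.match("hello@stripe.com")))                                   # True
print(bool(EMAIL_RE.match("totally.fake.person@nonexistent-domain-12345.com")))   # True
```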

&lt;h2&gt;
  
  
  What SMTP verification checks
&lt;/h2&gt;

&lt;p&gt;SMTP verification actually talks to the mail server. It connects on port 25, introduces itself with EHLO, then asks "would you accept mail for this address?" The server responds with a code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;250 = yes, this mailbox exists&lt;/li&gt;
&lt;li&gt;550 = no, user unknown&lt;/li&gt;
&lt;li&gt;252 = cannot verify the user, but the server will accept mail anyway (typical of catch-all domains)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same protocol your email client uses to send mail. You are just stopping before actually delivering anything.&lt;/p&gt;
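A minimal sketch of that handshake using Python's stdlib `smtplib`, assuming you already know the MX host for the domain (`mx_host` and `helo_domain` here are placeholders, not values from the article):

```python
import smtplib

# Map the RCPT TO reply codes listed above onto a verdict.
def interpret_code(code: int) -> str:
    if code == 250:
        return "valid"        # mailbox exists
    if code == 252:
        return "catch_all"    # server accepts everything; cannot verify
    if 500 <= code < 600:
        return "invalid"      # e.g. 550 user unknown
    return "unknown"

def smtp_check(email: str, mx_host: str, helo_domain: str = "probe.example.com") -> str:
    """Connect to the MX host, ask whether it would accept mail for
    `email`, then quit without delivering anything."""
    with smtplib.SMTP(mx_host, 25, timeout=10) as server:
        server.ehlo(helo_domain)
        server.mail(f"verify@{helo_domain}")   # envelope sender
        code, _ = server.rcpt(email)           # "would you accept this?"
        return interpret_code(code)
```

One caveat: most residential ISPs and many cloud providers block outbound port 25, so this check usually has to run from a host with a clean, unblocked IP.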

&lt;h2&gt;
  
  
  The 5 layers of real validation
&lt;/h2&gt;

&lt;p&gt;I built an email validator that runs 5 checks in sequence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Format check.&lt;/strong&gt; Yes, regex. But just as a first filter to reject obvious garbage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Domain check.&lt;/strong&gt; Does the domain have MX records? If there are no mail servers configured, nobody is receiving email there. &lt;code&gt;dig MX example.com&lt;/code&gt; tells you instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Disposable detection.&lt;/strong&gt; Is this mailinator, guerrillamail, tempmail? I maintain a list of 400+ disposable domains. These addresses work for about 10 minutes then disappear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: Role detection.&lt;/strong&gt; admin@, info@, support@ are role addresses. They usually go to a shared inbox that nobody monitors for cold outreach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 5: SMTP handshake.&lt;/strong&gt; The real check. Connect to the MX server, ask if the mailbox exists. This catches the dead addresses that everything else misses.&lt;/p&gt;
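The first four layers can be sketched in a few lines of Python. This is an illustrative sketch, not the actual validator: the disposable and role lists are tiny samples of the real 400+ domain list, and the MX lookup is passed in as a callable so you can plug in dnspython or any resolver.

```python
import re

FORMAT_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
DISPOSABLE = {"mailinator.com", "guerrillamail.com", "temp-mail.org"}   # sample only
ROLE_LOCALS = {"admin", "info", "support", "sales", "contact"}          # sample only

def validate(email: str, has_mx) -> dict:
    """`has_mx` is any callable domain -> bool, e.g. a DNS MX lookup."""
    result = {"email": email, "valid": False}
    if not FORMAT_RE.match(email):                 # Layer 1: format
        result["reason"] = "Invalid format"
        return result
    local, domain = email.rsplit("@", 1)
    if not has_mx(domain):                         # Layer 2: MX records
        result["reason"] = "No mail servers for domain"
        return result
    if domain.lower() in DISPOSABLE:               # Layer 3: disposable
        result["reason"] = "Disposable email address"
        return result
    if local.lower() in ROLE_LOCALS:               # Layer 4: role address
        result["reason"] = "Role-based address"
        return result
    result["valid"] = True                         # Layer 5 (SMTP) would run here
    return result
```

Running the cheap checks first means most bad addresses are rejected before you pay the cost of an SMTP connection.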

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;Input: &lt;code&gt;hello@stripe.com&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello@stripe.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"format_valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mx_found"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smtp_check"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_disposable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_free"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_role_based"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stripe.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mx_records"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"exchange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aspmx.l.google.com"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Input: &lt;code&gt;test@mailinator.com&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test@mailinator.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"format_valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mx_found"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"smtp_check"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"is_disposable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Disposable email address"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The disposable check caught it before we even bothered with SMTP.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bounce rate difference
&lt;/h2&gt;

&lt;p&gt;Before SMTP validation: 12% bounce rate, emails landing in spam, sender score dropping.&lt;/p&gt;

&lt;p&gt;After: under 2% bounce rate. Same email copy, same sending infrastructure. The only change was filtering out dead addresses before hitting send.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Cost per 1,000 emails&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Regex only&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;~60% (misses dead mailboxes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ZeroBounce&lt;/td&gt;
&lt;td&gt;$1.60&lt;/td&gt;
&lt;td&gt;~95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hunter.io&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;~93%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NeverBounce&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;~96%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;This API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~98% (real SMTP)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The validator is on Apify Store with a free tier: &lt;a href="https://apify.com/george.the.developer/email-validator-api" rel="noopener noreferrer"&gt;Email Validator API&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also available on &lt;a href="https://rapidapi.com/georgethedeveloper3046" rel="noopener noreferrer"&gt;RapidAPI&lt;/a&gt; if you prefer REST.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python quickstart
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;george.the.developer/email-validator-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello@stripe.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;list_items&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Valid: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://george-the-developer--email-validator-api.apify.actor/validate?email=hello@stripe.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_TOKEN"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I build data tools. 57 actors on Apify Store, 869 users. Follow the build log at &lt;a href="https://x.com/ai_in_it" rel="noopener noreferrer"&gt;@ai_in_it on X&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Built an AI Powered Influencer Finder That Costs Almost Nothing to Run</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Wed, 15 Apr 2026 02:59:48 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/i-built-an-ai-powered-influencer-finder-that-costs-almost-nothing-to-run-47g1</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/i-built-an-ai-powered-influencer-finder-that-costs-almost-nothing-to-run-47g1</guid>
      <description>&lt;p&gt;Most influencer discovery tools charge $200-500/month. I built one that costs me cheap to run and finds real influencer profiles with names, follower counts, bios, and emails across Instagram, TikTok, and YouTube.&lt;/p&gt;

&lt;p&gt;Here's exactly how it works, what broke along the way, and the architecture that finally made it reliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;A brand asked me to find 50 fitness micro-influencers on Instagram with contact info. The options were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upfluence&lt;/strong&gt;: $478/month minimum&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modash&lt;/strong&gt;: $299/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual research&lt;/strong&gt;: 3 hours on Instagram, copy-pasting into a spreadsheet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I figured I could automate this for pennies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture (What Actually Works)
&lt;/h2&gt;

&lt;p&gt;After three failed approaches, here's what stuck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Google SERP search (via Apify GOOGLE_SERP proxy)
  -&amp;gt; Extract social profile URLs from search results
    -&amp;gt; HTTP fetch each profile (via Apify residential proxy + Googlebot UA)
      -&amp;gt; Parse OG meta tags for real names, follower counts, bios
        -&amp;gt; Output structured data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;you don't need to render Instagram pages in a browser.&lt;/strong&gt; Instagram serves complete Open Graph meta tags to Googlebot. A simple HTTP GET with the right User-Agent through a residential proxy returns everything you need.&lt;/p&gt;

&lt;p&gt;For example, fetching &lt;code&gt;https://www.instagram.com/kayla_itsines/&lt;/code&gt; with a Googlebot header returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;og:title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KAYLA&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ITSINES&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(@kayla_itsines).&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Instagram&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;photos&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;videos"&lt;/span&gt;
&lt;span class="na"&gt;og:description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16M&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Followers,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;845&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Following,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;8,977&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Posts"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real name, follower count, post count. No browser. No login. No CAPTCHA.&lt;/p&gt;
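&lt;p&gt;As a sketch of the parsing step, here is how that &lt;code&gt;og:description&lt;/code&gt; string can be turned into numbers. The format is the one shown above; the regex and the K/M expansion are my assumptions, not a contract Instagram guarantees:&lt;/p&gt;

```javascript
// Parse an Instagram og:description like "16M Followers, 845 Following, 8,977 Posts".
// Returns null when the string does not match the expected shape.
function parseOgDescription(desc) {
  const m = desc.match(
    /([\d.,]+)([KM]?) Followers, ([\d.,]+)([KM]?) Following, ([\d.,]+)([KM]?) Posts/
  );
  if (!m) return null;

  // Expand "1.2K" / "16M" style suffixes into plain integers.
  const toNumber = (num, suffix) => {
    const n = parseFloat(num.replace(/,/g, ''));
    if (suffix === 'K') return Math.round(n * 1e3);
    if (suffix === 'M') return Math.round(n * 1e6);
    return Math.round(n);
  };

  return {
    followers: toNumber(m[1], m[2]),
    following: toNumber(m[3], m[4]),
    posts: toNumber(m[5], m[6]),
  };
}

console.log(parseOgDescription('16M Followers, 845 Following, 8,977 Posts'));
// → { followers: 16000000, following: 845, posts: 8977 }
```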

&lt;h2&gt;
  
  
  What Broke (And How I Fixed It)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: Puppeteer + Apify Proxy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Used PuppeteerCrawler to search Google and visit profiles. Google CAPTCHA'd me. Instagram detected headless Chrome. Got 0 results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: crawl4ai on VPS (direct IP)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deployed crawl4ai (real Chromium) on a cheap Contabo VPS. Worked for normal sites but Google and Instagram both blocked the datacenter IP. 0 results again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: crawl4ai + Apify proxy pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The fix: route crawl4ai's browser traffic through Apify's proxy pool.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google searches go through &lt;code&gt;GOOGLE_SERP&lt;/code&gt; proxy group (designed for Google)&lt;/li&gt;
&lt;li&gt;Instagram profile fetches go through &lt;code&gt;RESIDENTIAL&lt;/code&gt; proxy group (residential IPs)&lt;/li&gt;
&lt;li&gt;Use a lightweight HTTP fetch endpoint (no browser needed for profile pages)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what finally worked consistently.&lt;/p&gt;
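&lt;p&gt;For reference, Apify's documented proxy URL scheme is &lt;code&gt;http://groups-{GROUP}:{PASSWORD}@proxy.apify.com:8000&lt;/code&gt;. A minimal sketch of the routing rule above (the group-selection logic is my assumption, mirroring the split described in the list):&lt;/p&gt;

```javascript
// Pick an Apify proxy group per target: GOOGLE_SERP for Google searches,
// RESIDENTIAL for social profile fetches. Password comes from your Apify account.
function proxyUrlFor(targetUrl, password) {
  const group = targetUrl.includes('google.com/search') ? 'GOOGLE_SERP' : 'RESIDENTIAL';
  return `http://groups-${group}:${password}@proxy.apify.com:8000`;
}

console.log(proxyUrlFor('https://www.google.com/search?q=fitness+influencer', 'pw'));
// → http://groups-GOOGLE_SERP:pw@proxy.apify.com:8000
console.log(proxyUrlFor('https://www.instagram.com/kayla_itsines/', 'pw'));
// → http://groups-RESIDENTIAL:pw@proxy.apify.com:8000
```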

&lt;h2&gt;
  
  
  The Gemma 4 Enhancement
&lt;/h2&gt;

&lt;p&gt;The VPS also runs Google's Gemma 4 (2B parameter model) via Ollama. When the regex-based profile extraction from SERP results misses something, Gemma acts as an intelligent fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Given these Google search results, extract all Instagram profile URLs, 
usernames, display names, and follower counts. Return JSON."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;think: false&lt;/code&gt; (disabling chain-of-thought reasoning), Gemma responds in 3-5 seconds instead of 60. For simple classification tasks, the thinking overhead isn't worth it.&lt;/p&gt;
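&lt;p&gt;A sketch of what that non-streaming Ollama &lt;code&gt;/api/generate&lt;/code&gt; request might look like. The model tag is a placeholder (use whatever &lt;code&gt;ollama list&lt;/code&gt; shows on your VPS), and the endpoint is Ollama's default local port:&lt;/p&gt;

```javascript
// Build a non-streaming Ollama generate request with thinking disabled.
// "gemma" here is a hypothetical model tag, not a guaranteed name.
function buildExtractionRequest(serpText) {
  return {
    model: 'gemma',
    prompt:
      'Given these Google search results, extract all Instagram profile URLs, ' +
      'usernames, display names, and follower counts. Return JSON.\n\n' + serpText,
    stream: false,
    think: false, // skip chain-of-thought; simple extraction does not need it
  };
}

// Usage sketch (requires a running Ollama server):
// const res = await fetch('http://localhost:11434/api/generate', {
//   method: 'POST',
//   body: JSON.stringify(buildExtractionRequest(serpResults)),
// });
```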

&lt;h2&gt;
  
  
  Real Results
&lt;/h2&gt;

&lt;p&gt;Running "beauty" niche on Instagram, 5 results requested:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Username&lt;/th&gt;
&lt;th&gt;Real Name&lt;/th&gt;
&lt;th&gt;Followers&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;@mikaylajmakeup&lt;/td&gt;
&lt;td&gt;Mikayla Jane Nogueira&lt;/td&gt;
&lt;td&gt;3M&lt;/td&gt;
&lt;td&gt;og_meta_enriched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@ericataylor2347&lt;/td&gt;
&lt;td&gt;Erica Taylor&lt;/td&gt;
&lt;td&gt;2M&lt;/td&gt;
&lt;td&gt;og_meta_enriched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@darcybylauren&lt;/td&gt;
&lt;td&gt;lauren janelle&lt;/td&gt;
&lt;td&gt;189K&lt;/td&gt;
&lt;td&gt;og_meta_enriched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@amandaensing&lt;/td&gt;
&lt;td&gt;Amanda Ensing&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;og_meta_enriched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@jamiegenevieve&lt;/td&gt;
&lt;td&gt;Jamie Genevieve&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;og_meta_enriched&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All real names (not just handles), all real follower counts, all in about 2 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Contabo VPS (6 vCPU, 12GB RAM)&lt;/td&gt;
&lt;td&gt;Under $15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apify Creator Plan&lt;/td&gt;
&lt;td&gt;$1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apify proxy usage&lt;/td&gt;
&lt;td&gt;~$2-5 per 1000 searches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$11-14/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Compare that to $200-500/month for commercial influencer tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;The full source is on GitHub: &lt;a href="https://github.com/the-ai-entrepreneur-ai-hub/influencer-marketing-intel" rel="noopener noreferrer"&gt;influencer-marketing-intel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or try it directly on Apify (no code needed): &lt;a href="https://apify.com/george.the.developer/influencer-marketing-intel" rel="noopener noreferrer"&gt;Influencer Marketing Intelligence&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"niche"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"beauty"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"platforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"instagram"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tiktok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"youtube"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"maxResults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"followerRange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"micro_10k_100k"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output: structured JSON with username, displayName, estimatedFollowers, bio, contactEmails, nicheTags, profileUrl for each influencer found.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with the OG meta approach from day one.&lt;/strong&gt; I wasted weeks trying to make Puppeteer work on Instagram. The Googlebot UA trick was the breakthrough.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't fight anti-bot systems, route around them.&lt;/strong&gt; Residential proxies cost pennies and save hours of debugging.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local LLMs for extraction are underrated.&lt;/strong&gt; Gemma 4 on a VPS replaces brittle regex patterns. When Instagram changes their HTML structure, Gemma adapts. Regex doesn't.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;I build scraping tools: 57 actors on Apify Store, 869 users. If you have a data problem that needs automating, I probably already built the tool.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the build log: &lt;a href="https://x.com/ai_in_it" rel="noopener noreferrer"&gt;@ai_in_it on X&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>2 Users Pay Me More Than 353 Users: The Pricing Lesson That Changed Everything</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Tue, 14 Apr 2026 07:38:09 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/2-users-pay-me-more-than-353-users-the-pricing-lesson-that-changed-everything-35pn</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/2-users-pay-me-more-than-353-users-the-pricing-lesson-that-changed-everything-35pn</guid>
      <description>&lt;p&gt;I have 48 actors running on Apify. Same platform, same developer, same tech stack. Two of those actors tell completely different stories about how software makes money.&lt;/p&gt;

&lt;p&gt;My LinkedIn Employee Scraper has 353 users. It runs thousands of times per month. It charges $0.005 per profile scraped. Total monthly revenue from all those users and all those runs? About $9.&lt;/p&gt;

&lt;p&gt;My Google Maps Lead Intel actor has 2 users. Two. They run it about 22 times per month between them, paying roughly $25 per run. Monthly revenue? Around $540.&lt;/p&gt;

&lt;p&gt;That is a 60x difference in revenue per user. Same platform. Same developer. Same billing system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Google Maps Worth $25 a Run
&lt;/h2&gt;

&lt;p&gt;The LinkedIn scraper returns raw data. Names, titles, company info. It does one thing and does it well, but developers treat it like a commodity. They plug it into their own pipelines and expect it to cost almost nothing. At $0.005 per profile, it basically does.&lt;/p&gt;

&lt;p&gt;Google Maps Lead Intel returns something different. For every business it finds, you get validated email addresses, a lead score based on 12 online presence signals, Google Ads detection, website tech stack analysis, social media profiles, and review sentiment. It is not scraping. It is intelligence.&lt;/p&gt;

&lt;p&gt;The two users paying $25 per run are lead generation agencies. One services appointment setting clients across 15 metro areas. The other runs local SEO audits. For both of them, a single $25 run replaces 3 to 4 hours of manual research that would cost $200+ if done by a VA.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Buyer Problem
&lt;/h2&gt;

&lt;p&gt;Here is what I missed for months: the LinkedIn scraper attracts developers. Developers are price sensitive. They can build their own scraper given enough time, so they benchmark your tool against their hourly rate. If your scraper costs more than 20 minutes of their time to build, they will build it themselves.&lt;/p&gt;

&lt;p&gt;The Google Maps actor attracts agencies. Agency buyers think in terms of client value, not engineering time. If their client pays $1,500/month for lead gen services and your tool costs $25 per market, that is a rounding error in their margin. They do not negotiate. They do not churn. They run it more as they sign more clients.&lt;/p&gt;

&lt;p&gt;Same platform. Totally different buyer psychology.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Changed
&lt;/h2&gt;

&lt;p&gt;The technical shift was not dramatic. I stopped returning raw JSON blobs and started returning enriched, scored, validated output. Specifically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Raw Google Maps results became leads with quality scores&lt;/li&gt;
&lt;li&gt;Guessed emails became validated emails with deliverability checks&lt;/li&gt;
&lt;li&gt;Basic business info became competitive intelligence with ad spend signals&lt;/li&gt;
&lt;li&gt;Flat data became actionable reports that agencies could forward to clients&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pricing shift followed naturally. When your output saves someone 4 hours of work and costs them $25, you are not competing on data volume. You are competing on time saved and decision quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers I Wish I Knew Earlier
&lt;/h2&gt;

&lt;p&gt;353 users at $0.005/run = roughly $9/month. Those users submit support tickets, request features, and compare you to 6 other LinkedIn scrapers in the Apify Store.&lt;/p&gt;

&lt;p&gt;2 users at $25/run = roughly $540/month. Those users send you "thank you" messages and ask if you can build them something custom.&lt;/p&gt;

&lt;p&gt;If I could go back and rebuild my portfolio from scratch, I would build fewer tools and make each one solve a complete problem for a specific buyer. Not "scrape this website" but "find me qualified leads in this market with contact info I can trust."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Stop counting users. Start counting revenue per user. Build for the buyer who measures your tool against the cost of the alternative, not against the cost of building it themselves. Package intelligence, not data.&lt;/p&gt;

&lt;p&gt;The developer who needs 10,000 LinkedIn profiles will always shop on price. The agency owner who needs 200 qualified leads by Friday will pay whatever gets it done.&lt;/p&gt;

&lt;p&gt;I know which buyer I am building for now.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built in Nairobi. 48 actors in production. Questions? Drop them below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>saas</category>
      <category>api</category>
      <category>startup</category>
    </item>
    <item>
      <title>Why Local Lead Gen Agencies Are Paying $39 Per Google Maps Search (And Getting 10x ROI)</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Tue, 14 Apr 2026 01:10:47 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/why-local-lead-gen-agencies-are-paying-39-per-google-maps-search-and-getting-10x-roi-3g9a</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/why-local-lead-gen-agencies-are-paying-39-per-google-maps-search-and-getting-10x-roi-3g9a</guid>
      <description>&lt;p&gt;If you run a local lead gen agency, you already know the grind. Client says "find me 200 plumbers in Phoenix." You open Google Maps, start scrolling, copying names into a spreadsheet, checking websites, hunting for emails. Four hours later you have maybe 80 results and half the emails bounce.&lt;/p&gt;

&lt;p&gt;That's the old way. Here's why agencies are switching to a single API call instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Manual Research Problem
&lt;/h2&gt;

&lt;p&gt;A typical local business research session looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search Google Maps for the category and city&lt;/li&gt;
&lt;li&gt;Click each result, grab the name, phone, address&lt;/li&gt;
&lt;li&gt;Visit the website to find an email&lt;/li&gt;
&lt;li&gt;Check if the business is actually active&lt;/li&gt;
&lt;li&gt;Score the lead based on reviews, website quality, ad presence&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For one city and one category, that's 3 to 4 hours of work. If your client wants 10 cities, you're looking at a full work week just on data collection. Most agencies hire VAs at $15 to $25/hour for this, which means $200+ per market.&lt;/p&gt;

&lt;h2&gt;
  
  
  What One API Call Returns
&lt;/h2&gt;

&lt;p&gt;I built Google Maps Lead Intel specifically for agencies who need this data fast. You pass in a search query like "dentist Phoenix AZ" and it returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business name, address, phone, website&lt;/li&gt;
&lt;li&gt;Email addresses (validated, not guessed)&lt;/li&gt;
&lt;li&gt;Google rating and review count&lt;/li&gt;
&lt;li&gt;Whether they run Google Ads&lt;/li&gt;
&lt;li&gt;Website tech stack and CMS&lt;/li&gt;
&lt;li&gt;Social media profiles&lt;/li&gt;
&lt;li&gt;A lead score based on online presence signals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One call. 90 seconds. $39 per search area.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Math That Makes Agencies Switch
&lt;/h2&gt;

&lt;p&gt;Here's the comparison that keeps coming up in conversations with users:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Cost per market&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Data quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Manual VA research&lt;/td&gt;
&lt;td&gt;$200+&lt;/td&gt;
&lt;td&gt;4 hours&lt;/td&gt;
&lt;td&gt;Inconsistent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Maps Lead Intel&lt;/td&gt;
&lt;td&gt;$39&lt;/td&gt;
&lt;td&gt;90 seconds&lt;/td&gt;
&lt;td&gt;Validated emails, scored leads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's an 80% cost reduction and you get better data. The validated emails alone save you from bounce rate problems that kill sender reputation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Actually Uses This
&lt;/h2&gt;

&lt;p&gt;Two agencies using Google Maps Lead Intel right now generate about $540/month between them on the platform. One runs it for appointment setting clients across 15 metro areas. The other uses it for local SEO audits where they need to show clients their competitive landscape.&lt;/p&gt;

&lt;p&gt;The pattern is the same: client pays $500 to $2,000/month for lead gen services, agency spends $39 per market on data, and the rest is margin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why $39 Is Actually Cheap
&lt;/h2&gt;

&lt;p&gt;Think about what you're replacing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No VA training or management&lt;/li&gt;
&lt;li&gt;No spreadsheet cleanup&lt;/li&gt;
&lt;li&gt;No email verification tools (already validated)&lt;/li&gt;
&lt;li&gt;No waiting 4 hours per market&lt;/li&gt;
&lt;li&gt;No inconsistent data formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A VA doing this work costs $200+ per market. Even a junior employee at $20/hour spends 4 hours minimum, which is $80 before overhead. At $39 you're getting enriched, validated, scored data in 90 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Use Cases From Current Users
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Appointment setting agencies&lt;/strong&gt; run it for every new client market. Dentists, chiropractors, HVAC, plumbers. They load the results directly into their CRM and start outreach the same day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local SEO companies&lt;/strong&gt; use it to build competitive analyses. "Here are the 50 businesses ranking for your keyword, here's their review count, here's who runs ads." That report alone justifies a $1,000 monthly retainer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commercial real estate&lt;/strong&gt; teams pull business data for tenant prospecting. They need to know who operates in specific areas, what their online presence looks like, and how to reach them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The actor runs on Apify. You can test it from the Apify Store page with any search query. Results come back as structured JSON that drops straight into any CRM, spreadsheet, or automation tool.&lt;/p&gt;

&lt;p&gt;If you're spending more than 2 hours per week on manual Google Maps research, this pays for itself on the first run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;: &lt;a href="https://apify.com/georgethedeveloper/google-maps-lead-intel" rel="noopener noreferrer"&gt;Google Maps Lead Intel on Apify Store&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built in Nairobi. 48 actors in production, 869 users, $211 revenue last month. Questions? Drop them below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>marketing</category>
      <category>saas</category>
      <category>api</category>
      <category>business</category>
    </item>
    <item>
      <title>From 0 to 1,092 Visitors and 419 Actor Starts: What I Learned Building 53 APIs</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Mon, 13 Apr 2026 10:46:16 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/from-0-to-1092-visitors-and-419-actor-starts-what-i-learned-building-53-apis-45ng</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/from-0-to-1092-visitors-and-419-actor-starts-what-i-learned-building-53-apis-45ng</guid>
      <description>&lt;p&gt;Last month, 1,092 people visited my Apify Store page. 281 of them clicked through to an actor's input page. 419 hit the Start button and actually ran something. Those numbers are small by SaaS standards. But for a solo developer in Nairobi who did not know JavaScript 4 months ago, they represent something real.&lt;/p&gt;

&lt;p&gt;Here is what I learned watching that funnel take shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Funnel Nobody Tells You About
&lt;/h2&gt;

&lt;p&gt;Apify gives you analytics on every actor. The flow looks like this:&lt;/p&gt;

&lt;p&gt;1,092 page views. 281 input page views. 419 actor starts.&lt;/p&gt;

&lt;p&gt;That last number being higher than input page views confused me at first. It happens because returning users skip the description page and go straight to the input form. Repeat usage matters more than first impressions. Some users run a single actor 200+ times. One person ran my WHOIS lookup 262 times in a month.&lt;/p&gt;

&lt;p&gt;The lesson? Retention is baked into the product, not the marketing. If the tool works well on the first run, they come back without being asked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Traffic Actually Comes From
&lt;/h2&gt;

&lt;p&gt;I expected Twitter and dev.to articles to drive most of my growth. I was wrong.&lt;/p&gt;

&lt;p&gt;69% of my traffic comes from Apify Store search. People type "google scholar scraper" or "email validator" into the Apify marketplace and find my tools. This is basically app store SEO. The actors with the best README files, clear titles, and specific keywords rank higher.&lt;/p&gt;

&lt;p&gt;12% comes from Google. Those same README files index on Google, so people searching for niche data problems land on my actors directly. Two of my actors rank on the first page for their target keywords.&lt;/p&gt;

&lt;p&gt;8% comes from the Apify Console, meaning existing users discover new actors while browsing their dashboard.&lt;/p&gt;

&lt;p&gt;The remaining 11% is social media, articles, and direct links. All that tweeting and article writing moves the needle, but not as much as writing a good README.&lt;/p&gt;

&lt;h2&gt;
  
  
  The China and Singapore Surprise
&lt;/h2&gt;

&lt;p&gt;39% of my traffic comes from China and Singapore. I built every actor assuming my users would be in the US and Europe. Completely wrong.&lt;/p&gt;

&lt;p&gt;It makes sense in hindsight. Developers and data teams in Asia need the same tools. English language documentation works globally. And the Apify Store does not have geographic barriers.&lt;/p&gt;

&lt;p&gt;This changed how I think about naming and descriptions. I stopped using US specific references and started writing for a global audience. Small shift, big impact on discoverability.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Scrapers to Solutions
&lt;/h2&gt;

&lt;p&gt;My early actors had names like "LinkedIn Employee Scraper" and "YouTube Transcript Extractor." They worked fine but competed with dozens of free alternatives.&lt;/p&gt;

&lt;p&gt;The actors that grew fastest were the ones I repositioned as solutions. "Google Maps Lead Intel" instead of "Google Maps Scraper." "Entity OSINT Analyzer" instead of "Entity Search." "Competitor Intelligence" instead of "LinkedIn Comparison Tool."&lt;/p&gt;

&lt;p&gt;Same code underneath. Different framing. The solution-named actors attract buyers. The scraper-named actors attract tire-kickers who want everything free.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Power User Pattern
&lt;/h2&gt;

&lt;p&gt;Most users try an actor once. Maybe twice. Then they leave. But a small group of users runs the same actor hundreds of times. These are the users who pay for everything.&lt;/p&gt;

&lt;p&gt;My WHOIS lookup has one user with 262 runs. Google Scholar has someone at 230. AI Content Detector at 132. These power users are building automated pipelines that call my actors on a schedule or in bulk.&lt;/p&gt;

&lt;p&gt;The business model only works because of them. Optimizing for power users (faster response times, better error handling, higher rate limits) matters more than converting first time visitors.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;I wasted time on Reddit (got banned from r/webscraping for posting too aggressively), cold DMs (zero conversions), and Discord communities (untrackable results).&lt;/p&gt;

&lt;p&gt;If I started over today, I would spend 80% of my time writing great README files with clear keywords and 20% writing technical articles on dev.to and Hashnode. Everything else is noise.&lt;/p&gt;

&lt;p&gt;53 actors. 869 users. 1,092 page views. Built solo from Nairobi.&lt;/p&gt;

&lt;p&gt;The tools that grow are the ones that solve a specific problem well and show up when someone searches for that problem. That is the entire playbook.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I build data APIs and scrapers on the Apify Store. If you need structured data from any website, check out my portfolio: &lt;a href="https://apify.com/george.the.developer" rel="noopener noreferrer"&gt;george.the.developer on Apify&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>api</category>
      <category>saas</category>
    </item>
    <item>
      <title>Two APIs I Built This Week That Cost Nothing to Run</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Sun, 12 Apr 2026 23:25:33 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/two-apis-i-built-this-week-that-cost-nothing-to-run-1cc</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/two-apis-i-built-this-week-that-cost-nothing-to-run-1cc</guid>
      <description>&lt;p&gt;Most APIs have a dirty secret in their pricing: the upstream service they call costs money, and that cost gets passed to you plus margin. LLM based APIs charge you for tokens. Geocoding APIs charge you for lookups. Data enrichment APIs charge you for the enrichment source.&lt;/p&gt;

&lt;p&gt;I wanted to build APIs where the underlying operation costs literally zero. Here are two I shipped this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  API 1: DNS Record Checker
&lt;/h2&gt;

&lt;p&gt;Node.js ships with a built-in &lt;code&gt;dns&lt;/code&gt; module. It can resolve A, MX, CNAME, TXT, and NS records, among others. No external API call needed. No third party service. The DNS resolution happens through the operating system's resolver, which is free.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;dns&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dns/promises&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;dns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolveAny&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Returns A, AAAA, MX, TXT, NS, SOA records&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Zero dependency, zero API cost, zero rate limits from upstream providers.&lt;/p&gt;

&lt;p&gt;The actor wraps this into a clean JSON API. Pass it a domain, get back every DNS record type with TTLs, priorities for MX records, and SPF/DKIM/DMARC validation. The whole thing runs on Apify's Standby infrastructure so it responds in under a second.&lt;/p&gt;

&lt;p&gt;Use cases that keep coming up: automated domain verification for SaaS onboarding, email deliverability checks (MX + SPF + DKIM in one call), security audits scanning for misconfigured DNS, and monitoring tools that alert when records change unexpectedly.&lt;/p&gt;
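&lt;p&gt;The SPF/DMARC side of those checks can be sketched as plain TXT lookups plus prefix matching. One detail worth knowing: &lt;code&gt;resolveTxt&lt;/code&gt; returns &lt;code&gt;string[][]&lt;/code&gt; because long TXT records arrive split into chunks, so join each record before matching. The tag prefixes come from the SPF and DMARC specs; the helper name is mine:&lt;/p&gt;

```javascript
// Find the first TXT record starting with a given prefix (e.g. 'v=spf1').
// txtRecords has the string[][] shape returned by dns.resolveTxt().
function findRecord(txtRecords, prefix) {
  const joined = txtRecords.map((chunks) => chunks.join(''));
  return joined.find((r) => r.startsWith(prefix)) || null;
}

// Usage with Node's built-in resolver (no external API):
//   import dns from 'dns/promises';
//   const spf = findRecord(await dns.resolveTxt(domain), 'v=spf1');
//   const dmarcTxt = await dns.resolveTxt(`_dmarc.${domain}`).catch(() => []);
//   const dmarc = findRecord(dmarcTxt, 'v=DMARC1');

console.log(findRecord([['v=spf1 include:_spf.google.com ~all']], 'v=spf1'));
// → v=spf1 include:_spf.google.com ~all
```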

&lt;h2&gt;
  
  
  API 2: Sentiment Analysis
&lt;/h2&gt;

&lt;p&gt;The common approach to sentiment analysis is sending text to an LLM and paying per token. That works but it's expensive at scale and adds latency.&lt;/p&gt;

&lt;p&gt;Instead I used a word-level lexicon approach. The API scores text using a pre-built dictionary of ~7,000 words with known sentiment values. No LLM call. No external API. The scoring runs entirely in memory on the Node.js process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified version of the scoring logic&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lexicon&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result includes an overall sentiment score, confidence level, and breakdown of positive vs negative word matches. It handles negation ("not good" scores negative) and intensifiers ("very good" scores higher than "good").&lt;/p&gt;
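&lt;p&gt;A minimal sketch of how negation and intensifier handling can sit on top of the reduce shown above. The lexicon and multiplier values here are toy numbers, not the production dictionary:&lt;/p&gt;

```javascript
// Toy lexicon: word → sentiment weight.
const lexicon = { good: 2, bad: -2, terrible: -4 };
const negators = new Set(['not', 'no', 'never']);
const intensifiers = { very: 1.5, extremely: 2 };

// Score a text by summing lexicon hits, flipping the sign after a negator
// and scaling after an intensifier (checking the preceding word only).
function scoreText(text) {
  const words = text.toLowerCase().split(/\s+/);
  let score = 0;
  words.forEach((word, i) => {
    const base = lexicon[word] || 0;
    if (base === 0) return;
    let value = base;
    const prev = words[i - 1];
    if (prev !== undefined) {
      if (intensifiers[prev]) value *= intensifiers[prev];
      if (negators.has(prev)) value = -value;
    }
    score += value;
  });
  return score;
}

console.log(scoreText('good'));      // → 2
console.log(scoreText('not good'));  // → -2
console.log(scoreText('very good')); // → 3
```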

&lt;p&gt;Is it as nuanced as GPT? No. But for brand monitoring, review analysis, social media tracking, and content moderation at scale, a deterministic lexicon approach that returns in 50ms beats a 2 second LLM call that costs 10x more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern worth noticing
&lt;/h2&gt;

&lt;p&gt;Both of these APIs follow the same principle: use what's already built into the runtime or ship a static dataset with the code. No external dependencies that cost money per call.&lt;/p&gt;

&lt;p&gt;This matters because of what I've seen with my existing domain tools. The WHOIS Lookup actor has power users running 262 lookups per user on average. Domain and DNS tools get embedded in automated workflows and run at high volume. When your per call cost is zero, your margin stays healthy no matter how much a single user hammers the API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;DNS Record Checker: $0.003 per lookup. Sentiment Analysis: $0.003 per text analysis. Both running on Apify Standby mode for instant responses.&lt;/p&gt;

&lt;p&gt;The infrastructure cost is just Apify compute time. No upstream API bills eating into revenue.&lt;/p&gt;

&lt;p&gt;Try them on Apify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/george.the.developer/dns-record-checker" rel="noopener noreferrer"&gt;DNS Record Checker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/george.the.developer/sentiment-analysis-api" rel="noopener noreferrer"&gt;Sentiment Analysis API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built in Nairobi. 52 actors, zero external API costs on these two. Comments and questions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built a Tool That Shows Why Your Competitor's LinkedIn Posts Win</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Sun, 12 Apr 2026 23:23:56 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/i-built-a-tool-that-shows-why-your-competitors-linkedin-posts-win-2gm6</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/i-built-a-tool-that-shows-why-your-competitors-linkedin-posts-win-2gm6</guid>
      <description>&lt;p&gt;You post on LinkedIn three times a week. Your competitor posts twice. They get 3x your engagement. Why?&lt;/p&gt;

&lt;p&gt;I kept asking myself this question and never had a real answer beyond "their content is better." That's not useful. I wanted specifics. What topics are they covering that I'm not? What formats are they using? What posting patterns work for them?&lt;/p&gt;

&lt;p&gt;So I built a tool that runs the comparison automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;Most LinkedIn "strategy" advice is generic. Post more videos. Use carousels. Write hooks. But that advice ignores the most important variable: your specific niche, your specific audience, your specific competitors.&lt;/p&gt;

&lt;p&gt;What works for a SaaS founder posting about product updates is completely different from what works for a recruiter sharing hiring tips. You need data from your actual competitive set, not broad averages.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the tool actually does
&lt;/h2&gt;

&lt;p&gt;You give it two LinkedIn company or creator URLs. It pulls recent posts from both accounts and runs a side-by-side analysis:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topic gap analysis&lt;/strong&gt;: What themes does your competitor cover that you never touch? Maybe they post about industry news every Monday and get solid engagement, while you only post product updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Format comparison&lt;/strong&gt;: Are they using video, carousels, text posts, polls? The tool breaks down the format mix for both accounts and shows where the gaps are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engagement benchmarks&lt;/strong&gt;: Average likes, comments, and shares per post type. Not vanity metrics but actionable patterns like "their video posts get 2.3x the engagement of their text posts."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Posting cadence&lt;/strong&gt;: When do they post? How often? Is there a pattern to their highest performing content?&lt;/p&gt;

&lt;h2&gt;
  
  
  Real output example
&lt;/h2&gt;

&lt;p&gt;I tested it on two fintech companies. Here's what came back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TechCorp vs YourBrand — 30 day comparison

TechCorp: 70% video, 30% text, avg 847 reactions/post
YourBrand: 85% text, 15% carousel, avg 312 reactions/post

Topic gaps you're missing:
  - Industry commentary (TechCorp: 40% of posts, you: 0%)
  - Customer stories (TechCorp: 25% of posts, you: 5%)

Recommendation: Test 2 video posts/week covering
industry news. TechCorp's video format gets 2.3x
higher engagement than their own text posts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's specific enough to actually change your content calendar.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works under the hood
&lt;/h2&gt;

&lt;p&gt;The actor uses Crawlee with Playwright to visit LinkedIn pages and extract post data. No API key needed, no Sales Navigator subscription. It runs on Apify's cloud infrastructure so you don't need to manage servers or proxies.&lt;/p&gt;

&lt;p&gt;The analysis logic compares engagement distributions, runs basic NLP for topic categorization, and generates the recommendations based on statistical gaps between the two accounts.&lt;/p&gt;
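&lt;p&gt;The benchmark step boils down to grouping posts by format. A minimal sketch, assuming each scraped post has already been reduced to a flat object with illustrative field names (not the actor's actual schema):&lt;/p&gt;

```javascript
// Sketch: compute format mix and average engagement per format,
// given posts shaped like { format, reactions }.
function benchmarkByFormat(posts) {
  const buckets = {};
  for (const post of posts) {
    const b = buckets[post.format] || { count: 0, total: 0 };
    b.count += 1;
    b.total += post.reactions;
    buckets[post.format] = b;
  }
  const out = {};
  for (const format of Object.keys(buckets)) {
    const b = buckets[format];
    out[format] = {
      share: b.count / posts.length,   // format mix
      avgReactions: b.total / b.count, // engagement benchmark
    };
  }
  return out;
}
```

Run this once per account, then compare the two result objects to find the statistical gaps.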

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;$0.02 per comparison report. Run it weekly against your top 3 competitors and you're spending about $0.24/month for ongoing competitive intelligence. Compare that to social listening tools charging $200+/month.&lt;/p&gt;

&lt;p&gt;Try it on Apify: &lt;a href="https://apify.com/george.the.developer/linkedin-competitor-intelligence" rel="noopener noreferrer"&gt;LinkedIn Competitor Intelligence&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first run takes about 60 seconds. You'll know exactly what to change before your next LinkedIn post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built in Nairobi. 52 actors in production, 746+ users. Questions welcome in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>saas</category>
    </item>
    <item>
      <title>I Built a TikTok Shop Product Finder That 30 People Actually Use</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Sat, 11 Apr 2026 15:39:59 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/i-built-a-tiktok-shop-product-finder-that-30-people-actually-use-2fog</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/i-built-a-tiktok-shop-product-finder-that-30-people-actually-use-2fog</guid>
      <description>&lt;p&gt;Six months ago I started building scrapers on Apify as a solo developer. Most of them got 2 or 3 users and collected dust. But the TikTok Shop scraper hit different. It now has about 30 active users and nearly 300 runs, which makes it one of my most popular tools.&lt;/p&gt;

&lt;p&gt;Here is the story of why I built it and what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dropshipping Problem
&lt;/h2&gt;

&lt;p&gt;If you are in ecommerce or dropshipping, you have probably noticed TikTok Shop is where the action moved. Products go viral on TikTok and sell thousands of units before they even appear on Amazon. The affiliate program is printing money for creators who pick the right products early.&lt;/p&gt;

&lt;p&gt;But finding those trending products is a nightmare. TikTok does not have a public product API. Their website renders everything dynamically with heavy JavaScript. Traditional scraping tools break constantly because TikTok changes their frontend every few weeks.&lt;/p&gt;

&lt;p&gt;Most dropshippers end up scrolling TikTok manually for hours, screenshotting products, and hoping they picked a winner. That felt like a problem worth solving.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Scraper Does
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://apify.com/george.the.developer/tiktok-shop-scraper" rel="noopener noreferrer"&gt;TikTok Shop Scraper&lt;/a&gt; lets you search TikTok Shop by keyword or category and pulls back structured product data: titles, prices, ratings, review counts, seller info, and product URLs.&lt;/p&gt;

&lt;p&gt;You type in "kitchen gadgets" or "phone accessories" and get back a clean JSON dataset of what is actually selling on TikTok Shop right now. No manual scrolling. No screenshots. No guessing.&lt;/p&gt;

&lt;p&gt;The real value is in the numbers. When you can see that a specific garlic press has 4,200 reviews and a 4.8 star rating, you know it is moving units. Compare that to scrolling past it in a TikTok video and thinking "that looks popular, maybe."&lt;/p&gt;

&lt;h2&gt;
  
  
  Why TikTok Shop Data Is Hard to Get
&lt;/h2&gt;

&lt;p&gt;I will not pretend this was easy to build. TikTok is one of the hardest sites to scrape on the internet. Here is what makes it painful:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-bot detection&lt;/strong&gt;: TikTok runs aggressive fingerprinting. Simple HTTP requests get blocked instantly. You need a full browser with realistic behavior patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic rendering&lt;/strong&gt;: Product listings load through multiple API calls triggered by scroll events. You cannot just fetch the HTML and parse it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frequent changes&lt;/strong&gt;: TikTok updates their frontend regularly. Selectors that worked last week break this week. I have had to push fixes multiple times just to keep up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limiting&lt;/strong&gt;: Hit them too fast and you get IP banned. The scraper uses smart proxy rotation and request throttling to stay under the radar.&lt;/p&gt;

&lt;p&gt;Building on Crawlee with Puppeteer handles most of this. The Apify platform provides the proxy infrastructure and browser management. But it still requires constant maintenance whenever TikTok ships a new update.&lt;/p&gt;
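&lt;p&gt;The request throttling reduces to a sliding-window check: given recent request timestamps, how long must the next request wait? A simplified sketch with illustrative limits, not the scraper's actual values:&lt;/p&gt;

```javascript
// Sketch: allow at most `limit` requests per `windowMs`.
// Returns how many milliseconds the next request should wait.
function throttleDelayMs(recentTimestamps, now, limit = 10, windowMs = 60000) {
  const cutoff = now - windowMs;
  const inWindow = recentTimestamps.filter(t => t > cutoff);
  if (limit > inWindow.length) return 0;  // still under the limit
  const oldest = Math.min(...inWindow);
  return oldest + windowMs - now;         // wait until the oldest ages out
}
```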

&lt;h2&gt;
  
  
  What 30 Users Taught Me
&lt;/h2&gt;

&lt;p&gt;The users break down into three groups:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dropshippers&lt;/strong&gt; (the majority) use it to find trending products before they saturate. They run searches weekly across 10 to 15 categories and look for products with high review counts but low competition on Amazon or Shopify.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TikTok affiliates&lt;/strong&gt; use it to find products with active affiliate programs. If a product has good reviews and a decent commission rate, they create content around it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Market researchers&lt;/strong&gt; use it less frequently but in bigger batches. They pull data across entire categories to spot trends for brands or agencies.&lt;/p&gt;

&lt;p&gt;The most common feedback: "I used to spend 3 hours scrolling TikTok looking for products. Now I spend 10 minutes looking at the data."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;The scraper is still private (TikTok scrapers live in a gray area that makes me cautious about going fully public), but through the Apify Store it has organically attracted about 30 users and logged 294 runs.&lt;/p&gt;

&lt;p&gt;For context, that puts it in my top 4 actors by usage, behind only LinkedIn (623 runs), YouTube (327 runs), and CoinMarketCap (239 runs).&lt;/p&gt;

&lt;p&gt;I did zero marketing for it. People find it by searching the Apify Store for "tiktok shop" and there is almost nothing else there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Tell New Builders
&lt;/h2&gt;

&lt;p&gt;If you are thinking about building scrapers or APIs, find the platform that everyone wants data from but nobody has built tools for yet. TikTok Shop was that for me. The demand was already there. I just had to show up.&lt;/p&gt;

&lt;p&gt;The product does not need to be perfect. It needs to solve a real problem that people are currently solving manually. Thirty users is not a massive number, but every one of them found me organically because the alternative was wasting hours on manual research.&lt;/p&gt;

&lt;p&gt;You can check it out on the &lt;a href="https://apify.com/george.the.developer/tiktok-shop-scraper" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt;. If you are in the dropshipping or TikTok affiliate space, it might save you some serious time.&lt;/p&gt;

</description>
      <category>saas</category>
    </item>
    <item>
      <title>OFAC Sanctions Screening for $0.01 Per Entity (No Enterprise Contract)</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Sat, 11 Apr 2026 15:39:26 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/ofac-sanctions-screening-for-001-per-entity-no-enterprise-contract-d8n</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/ofac-sanctions-screening-for-001-per-entity-no-enterprise-contract-d8n</guid>
      <description>&lt;p&gt;If you work in compliance, you already know the pain. Sanctions screening tools from the big vendors cost $10,000 to $50,000 per year. They require sales calls, procurement cycles, and a six month onboarding process before you can check a single name against the SDN list.&lt;/p&gt;

&lt;p&gt;I built an API that does it for a penny per entity. No contracts. No minimums. No sales demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Actually Checks
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://apify.com/george.the.developer/ofac-sanctions-checker" rel="noopener noreferrer"&gt;OFAC Sanctions Checker&lt;/a&gt; queries three major sanctions databases in a single call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;US OFAC SDN List&lt;/strong&gt; (Specially Designated Nationals) from the Treasury Department&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU Consolidated Sanctions List&lt;/strong&gt; from the European Commission&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UN Security Council Sanctions&lt;/strong&gt; from the United Nations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You send a name (person or company), and it returns matches with confidence scores, match types, and the specific list where the hit was found. Fuzzy matching is included, so "Vladimir Putin" catches variations like "V. Putin" or transliterated spellings.&lt;/p&gt;
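&lt;p&gt;The core of the fuzzy matching can be sketched as token-level comparison with initial handling. This is a simplified illustration, not the actor's full matcher, which also handles transliteration:&lt;/p&gt;

```javascript
// Sketch: normalize a name into lowercase tokens, then require
// every query token to match a listed token exactly or as an
// initial ("v" matches "vladimir").
function normalize(name) {
  return name.toLowerCase().replace(/[^a-z ]/g, '').split(/\s+/).filter(Boolean);
}

function tokenMatch(a, b) {
  if (a === b) return true;
  if (a.length === 1) return b.startsWith(a);
  if (b.length === 1) return a.startsWith(b);
  return false;
}

function nameMatches(query, listed) {
  const q = normalize(query);
  const l = normalize(listed);
  if (q.length === 0) return false;
  return q.every(qt => l.some(lt => tokenMatch(qt, lt)));
}
```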

&lt;h2&gt;
  
  
  Why This Matters Right Now
&lt;/h2&gt;

&lt;p&gt;Fintechs, crypto exchanges, and trade compliance teams all face the same problem: they need sanctions screening, but they are too small for Oracle or Dow Jones pricing. The regulatory pressure is real though. OFAC fines start at $50,000 per violation and go up to $20 million. The EU is equally aggressive.&lt;/p&gt;

&lt;p&gt;If you are a 10 person fintech processing cross border payments, you cannot skip screening. But you also cannot justify $30K/year for a tool you might query 500 times a month.&lt;/p&gt;

&lt;p&gt;At $0.01 per entity, 500 monthly checks cost you $5.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works in Practice
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.apify.com/v2/acts/george.the.developer~ofac-sanctions-checker/run-sync-get-dataset-items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bearer YOUR_TOKEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;entityName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Acme Trading LLC&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;entityType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;company&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;checkLists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SDN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EU&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;UN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Returns: matches, confidence scores, list sources&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop that into your KYC onboarding flow, your payment processing pipeline, or your trade compliance checks. Each call takes a few seconds and costs a penny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Is Using This
&lt;/h2&gt;

&lt;p&gt;Two users are running it consistently right now, mostly in trade compliance workflows. The typical pattern is batch screening: uploading a list of counterparties before processing a shipment or wire transfer.&lt;/p&gt;

&lt;p&gt;One user runs it as part of a daily cron job, checking new customers against all three lists overnight. At their volume (roughly 200 entities per day), they are spending about $60 a month instead of $30,000 a year.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compliance Gap Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;There are thousands of small businesses that should be screening but are not. Import/export companies, money service businesses, crypto OTC desks, even law firms doing cross border transactions. They skip it because the enterprise tools are priced for banks with 10,000 employees.&lt;/p&gt;

&lt;p&gt;That is a real risk. OFAC enforcement does not care about your company size. The fines hit a 5 person MSB the same way they hit JPMorgan.&lt;/p&gt;

&lt;p&gt;A penny per check removes the excuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The API is live on the &lt;a href="https://apify.com/george.the.developer/ofac-sanctions-checker" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt;. You can test it without a credit card. Pay per event pricing means you only get charged for actual checks, not seats or subscriptions.&lt;/p&gt;

&lt;p&gt;If you are building compliance tooling or running a fintech that needs sanctions screening without the enterprise markup, give it a shot. The worst case is you spend a dollar testing it.&lt;/p&gt;

</description>
      <category>saas</category>
    </item>
    <item>
      <title>The 5 APIs That Run 200+ Times Per User (And Why That Matters)</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Sat, 11 Apr 2026 06:07:17 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/the-5-apis-that-run-200-times-per-user-and-why-that-matters-5hi9</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/the-5-apis-that-run-200-times-per-user-and-why-that-matters-5hi9</guid>
      <description>&lt;p&gt;Most developer tools get used a handful of times. Someone finds your API, tries it on a test case, maybe runs it a dozen more times, then moves on. That is the normal pattern. Out of 38 actors I have running on Apify, most average 5 to 20 runs per user. Respectable numbers.&lt;/p&gt;

&lt;p&gt;But five of them break the pattern completely. These five average 100 to 260 runs per user. Not because of better marketing or a viral tweet. Because they solve problems that require bulk processing by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Here is the actual usage data from my Apify dashboard:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Runs Per User&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Domain WHOIS Lookup&lt;/td&gt;
&lt;td&gt;262&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Scholar Scraper&lt;/td&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Content Detector&lt;/td&gt;
&lt;td&gt;132&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Website Tech Detector&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Email Validator&lt;/td&gt;
&lt;td&gt;105&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Compare that to something like the LinkedIn Employee Scraper, which has 37 users but averages about 17 runs each. LinkedIn users grab the data they need and stop. WHOIS users feed in hundreds of domains every single session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why These Five?
&lt;/h2&gt;

&lt;p&gt;The common thread is not the subject matter. It is the workflow. Every one of these tools plugs into a process where the user already has a list and needs to process all of it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain WHOIS Lookup (262 runs/user):&lt;/strong&gt; Security researchers and domain investors run this on batches of suspicious domains. When a phishing campaign registers 10,000 domains with similar naming patterns, someone needs registrar data, creation dates, and nameservers for every single one. That is not a one time task. New domains appear daily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Scholar Scraper (230 runs/user):&lt;/strong&gt; Academic researchers doing systematic literature reviews or bibliometric analysis. They need every paper matching a query, with citations, h index scores, and author profiles exported as structured JSON. One research project can require pulling data on thousands of papers across multiple search terms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Content Detector (132 runs/user):&lt;/strong&gt; Content moderation teams, academic integrity offices, and publishers who need to scan entire content catalogs. Checking one essay at a time is pointless when you have 500 submissions or 2,000 product descriptions to verify. The bulk API call is the only thing that makes this practical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Website Tech Detector (126 runs/user):&lt;/strong&gt; Sales development teams that need technology intelligence on their entire prospect list. If you are selling a React migration service, you need to know which of your 3,000 target companies still run Angular or jQuery. Feed in the list, get back frameworks, CDNs, analytics tools, CMS platforms in clean JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email Validator (105 runs/user):&lt;/strong&gt; Cold outreach operators who clean their lists before every campaign. A 5% bounce rate destroys your sender reputation, so smart operators validate 500 to 5,000 emails before hitting send. They do this before every single campaign, not once.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Builders
&lt;/h2&gt;

&lt;p&gt;The lesson is simple: if your API solves a problem that people encounter once, you need constant marketing to keep new users flowing in. If your API solves a problem that people encounter in batches, repeatedly, you get sticky users who come back on their own.&lt;/p&gt;

&lt;p&gt;None of these five APIs went viral. None of them got featured in a newsletter. The WHOIS lookup has 7 total users. But those 7 users have collectively run it 1,837 times. That is revenue without marketing spend.&lt;/p&gt;

&lt;p&gt;The best APIs are not the ones with the most users. They are the ones where each user cannot stop running them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try Them
&lt;/h2&gt;

&lt;p&gt;All five are live on the Apify Store under my profile (george.the.developer), priced per call with no monthly subscription. Domain WHOIS at $0.005/lookup, Scholar at $0.004/paper, AI Detector at $0.003/text, Tech Detector at $0.005/site, Email Validator at $0.002/email.&lt;/p&gt;

&lt;p&gt;Built in Nairobi. 38 actors, 700+ users, 14,000+ total runs.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>api</category>
      <category>saas</category>
    </item>
    <item>
      <title>Google Scholar Has No API Either. Here's What 5,000 Runs Taught Me</title>
      <dc:creator>George Kioko</dc:creator>
      <pubDate>Fri, 10 Apr 2026 17:53:24 +0000</pubDate>
      <link>https://forem.com/the_aientrepreneur_7ae85/google-scholar-has-no-api-either-heres-what-5000-runs-taught-me-24l6</link>
      <guid>https://forem.com/the_aientrepreneur_7ae85/google-scholar-has-no-api-either-heres-what-5000-runs-taught-me-24l6</guid>
      <description>&lt;p&gt;Google Scholar is the single most important search engine for academic research. Billions of papers indexed, citation counts, author profiles, related work links. And Google has never released an official API for it.&lt;/p&gt;

&lt;p&gt;Not deprecated. Not restricted. Just... never built one.&lt;/p&gt;

&lt;p&gt;If you want to programmatically search Google Scholar, grab paper titles, authors, citation counts, and PDF links, you are on your own. So I built an actor that does exactly that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Pulls
&lt;/h2&gt;

&lt;p&gt;You give it a search query (like "transformer architecture attention mechanism") and it returns structured data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Attention Is All You Need"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A Vaswani, N Shazeer, N Parmar..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"citationCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;112847&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2017"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://arxiv.org/abs/1706.03762"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pdfUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://arxiv.org/pdf/1706.03762"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"snippet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The dominant sequence transduction models are based on complex recurrent..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paper titles, author lists, citation counts, publication year, direct links, and PDF URLs when available. Everything a researcher needs to build a literature review or track citations over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Tell a Story
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. The actor has &lt;strong&gt;22 users&lt;/strong&gt; and &lt;strong&gt;5,065 total runs&lt;/strong&gt;. Do the math on that ratio: 230 runs per user on average.&lt;/p&gt;

&lt;p&gt;These are not casual users clicking "Run" once to test it. These are power users running it at scale. Academics building citation databases. Research firms tracking publication trends across thousands of queries. AI companies monitoring new papers in their domain.&lt;/p&gt;

&lt;p&gt;That run to user ratio is the strongest signal I have that this tool solves a real problem. When someone runs your tool 200+ times, they have built it into a workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Scholar Is Hard to Scrape
&lt;/h2&gt;

&lt;p&gt;Google Scholar is notoriously aggressive about blocking automated access. It will throw CAPTCHAs after just a handful of requests from the same IP. Most simple scraping scripts break within minutes.&lt;/p&gt;

&lt;p&gt;The actor handles this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proxy rotation across residential IPs&lt;/li&gt;
&lt;li&gt;Session management to maintain cookies between requests&lt;/li&gt;
&lt;li&gt;Randomized delays that mimic human browsing patterns&lt;/li&gt;
&lt;li&gt;Automatic retry logic when a request gets blocked&lt;/li&gt;
&lt;/ul&gt;
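&lt;p&gt;The retry timing in the last two bullets can be sketched as exponential backoff with full jitter. The parameters here are illustrative, not the actor's exact values:&lt;/p&gt;

```javascript
// Sketch: backoff windows of 1s, 2s, 4s, ... capped at capMs,
// with a random delay drawn from each window (full jitter).
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Retry wrapper: run fn, sleeping a jittered delay between failures.
async function withRetries(fn, maxAttempts = 4) {
  let lastError;
  for (let attempt = 0; maxAttempts > attempt; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise(r => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
  throw lastError;
}
```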

&lt;p&gt;I also had to deal with Google's inconsistent HTML. Scholar's markup changes subtly over time. Element class names shift, layout structures get tweaked. The parser needs regular maintenance to keep working.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Uses This
&lt;/h2&gt;

&lt;p&gt;Three main groups keep showing up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Academics and PhD students&lt;/strong&gt; building systematic literature reviews. Instead of manually searching and copying results, they run batch queries and get structured data they can feed into reference managers or spreadsheets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research firms and think tanks&lt;/strong&gt; tracking publication trends. They want to know how many papers mention "large language models" per quarter, or which authors are publishing most frequently in a specific subfield.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI and ML teams&lt;/strong&gt; monitoring state of the art. When a new paper drops with high early citation velocity, they want to know about it fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The actor is on the Apify Store with pay per result pricing ($0.004 per paper): &lt;a href="https://apify.com/george.the.developer/google-scholar-scraper" rel="noopener noreferrer"&gt;Google Scholar Scraper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have ever copy-pasted results from Google Scholar into a spreadsheet, this will save you hours. And if you are doing it at scale, it will save you from getting IP banned.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built in Nairobi by George. 40+ actors, 5,000+ runs on Scholar alone.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>api</category>
      <category>research</category>
    </item>
  </channel>
</rss>
