Forem: George Kioko

Seven new Apify actors in two days: healthcare, AI infra, GovCon, real estate

George Kioko — Thu, 14 May 2026 01:01:06 +0000

Seven new Apify actors in two days

Two day sprint. Codex did the per-actor source code, I orchestrated and verified each one against real data. All seven live on Apify Store. Pay per event pricing activates 2026-05-26.

Each one targets a specific buyer category with a real budget. Not generic scrapers, not toy demos. Numbers on what each one does and which buyers should care.

1. Hospital Price Transparency MRF Normalizer

Store link

CMS enforcement of the hospital price transparency rule tightened in April 2026. Every US hospital publishes a machine readable file of negotiated rates by payer and procedure, but the files are gigantic and inconsistent. CMS v2 JSON, CMS v3 JSON, CSV with column drift between hospitals.

This actor wraps the parsing into one Standby API. Detects format automatically (gzip, JSON, CSV), streams the file, normalizes each row to one schema with payer, plan, billing code, code type, negotiated rate, methodology, expiration date.

Pricing: $0.50 per start, $0.002 per rate row, $0.02 per provider procedure payer bundle.

Verified live test: Cooper University Hospital CSV returned 13 normalized rows including Aetna Better Health at code 0042T (HCPCS) for $272.44 negotiated.

Buyers: health cost transparency platforms (Amino, Cedar, Healthcare Bluebook), employer benefits tools (Castlight, Garner, Transcarent), TPA and PPO operators (Zelis, MultiPlan).

2. MCP Server Registry and Security Scorer

Store link

Anthropic Model Context Protocol exploded past 22,000 servers indexed on Glama. Agent platforms admitting MCP servers do it by README and stars and hope.

This actor joins the official Anthropic registry with npm package metadata (downloads, first published date, maintainer count), GitHub repo data (stars, archived, last commit, open issues), and produces a deterministic risk score per server. Signals: missing repo, archived, stale 90 days, unknown publisher, no license, weird tool count, registered but never built, single maintainer npm, advisory match.

Score lands 0 to 100 in bands low, medium, high, critical. Same input always returns same score, no LLM in the loop.

Pricing: $0.50 per start, $0.025 per server profile, $0.15 per full security scan.

Verified live test: ac.inference.sh/mcp server flagged risk 70 (high) because no repo link, no license, unknown publisher, remote only transport.

Buyers: agent platform admission control (Docker MCP Toolkit, Cursor, Continue, Windsurf), enterprise AI governance teams, MCP marketplace ranking signals, dev tool risk dashboards.

3. FDA Warning Letter and Enforcement Monitor

Store link

FDA published warning letters on a static index page that everyone in regulated industries needs to read but nobody wants to scrape. The April 2026 GLP-1 and telehealth enforcement push made this even more urgent.

This actor pulls the full letter feed, classifies each letter by topic (GLP-1, telehealth, compounding, manufacturing, biologics, food, advertising, dietary supplement), extracts the recipient company and product line, and rolls everything up into a company level risk brief with letter count, topic mix, time since last letter, and severity signals.

Pricing: $1 per start, $0.30 per letter, $1.50 per company risk brief.

Verified live test: CareFusion 213 LLC returned risk band "high" with 4 open enforcement actions tied to manufacturing.

Buyers: FDA consultants (Redica, Greenleaf Health, ProPharma, RQM Plus, The FDA Group), pharma QA SaaS (AssurX, Sparta, Kneat), med-device legal, telehealth ops.

4. Clinical Trial Investigator and Site Intelligence

Store link

CROs and sponsors at any scale rebuild the same data join for every new study. ClinicalTrials.gov plus NPI Registry plus OpenPayments plus PubMed publication count plus geo cluster of sites. The data is public, the join is mechanical, but everyone rebuilds it in a half maintained Python script.

This actor wraps the join into one Standby API. Query by condition or NCT id, get investigator profiles with NPI, OpenPayments dollar totals by company, publication counts, trial history broken down by phase, and a deterministic site fit score for each location.

Pricing: $1 per start, $0.10 per investigator profile, $0.50 per site fit row.

Verified live test: glioblastoma phase 2 query returned Thomas J Kaley MD at Memorial Sloan Kettering with NPI 1578721858, full therapeutic area list, and complete trial history.

Buyers: top 5 CROs (IQVIA, ICON, Parexel, Medidata, Veeva), clinical site networks (Advarra, Clinitiative), patient recruitment platforms (LINEA), sponsor BD teams.

5. Federal Contract Opportunity Monitor

Store link

SAM.gov publishes federal contract opportunities hourly. USAspending publishes awarded contracts with recipient data. Joining them and tagging by topic is the daily grind for federal sales teams and GovCon advisors. Bloomberg Government and Govini do this at enterprise pricing.

This actor pulls SAM.gov internal search (no API key required) plus USAspending POST endpoint, normalizes both into one schema, tags opportunities by topic keyword (configurable per user), and produces partnership leads: which prime contractors won similar work recently and at what value.

Pricing: $1 per start, $0.10 per opportunity, $0.50 per partnership lead.

Verified live test: /leads?keyword=consulting&awarded_amount_min=100000 returned General Dynamics IT with 2 recent awards totaling $1.7 billion, lead band "priority", top buying agency Department of State.

Buyers: GovCon advisors (Govini, Deltek, EZGovOpps), federal sales teams at MSPs and consulting firms, subcontract introduction services, state and local procurement intel.

6. LLM Provider Price and Latency Monitor

Store link

LLM gateways and agent platforms route across 5 to 10 providers. Pricing changes weekly. OpenRouter aggregates in a UI but does not publish it as a clean JSON ingestion feed. Most teams maintain their own scraper.

This actor wraps OpenRouter as canonical (200+ models, no auth), falls back to scraping OpenAI, Anthropic, Together, and Groq pricing pages when a model is missing, and returns a normalized snapshot per model with prompt and completion price per 1M tokens, context length, capability flags (vision, tools, JSON mode), and provider routing list.

A second endpoint produces cross provider benchmark rows: same model family, multiple providers, cost per 1k chat pair, multiplier from cheapest to most expensive.

Pricing: $0.50 per start, $0.025 per model snapshot, $0.15 per benchmark row.

Verified live test: /models?provider=anthropic&limit=3 returned 3 Anthropic models with full pricing populated. Benchmark endpoint returned 2 cross provider rows with cheapest_provider field.

Buyers: LLM gateway operators (Portkey, Helicone, LiteLLM), agent platform model admission, FinOps cost per task budgeting, AI engineering team weekly digests.

7. Multi City Building Permit Aggregator

Store link

Each US city publishes building permit data in its own JSON shape. NYC moved their authoritative feed from the old DOB Issuance dataset to DOB NOW in 2024 and broke every scraper that hardcoded the old IDs.

This actor wraps NYC and Chicago open data portals into one schema. Per permit you get permit type, status, work type, estimated cost, address with GPS, ZIP, block lot, builder business name, builder license, property business name. A second endpoint produces a builder activity roundup for any business name.

Pricing: $1 per start, $0.05 per permit row, $0.30 per builder activity roundup.

Verified live test: COSTELLO CONSTRUCTION roundup returned 88 permits over 3 years for $3 million total value, top property businesses served (Flushing Hospital Medical Center, Jerome Avenue SM Realty LLC), top permit types, top ZIP codes, activity tier "high volume".

Buyers: construction supply distributors planning territories, contractor SaaS lead enrichment (ServiceTitan, Jobber, Buildertrend), real estate investor research, market research firms.

What activation looks like

Pricing activates 2026-05-26 for the first 5 actors and 2026-05-27 for the last 2. Until then runs are free for testers who want to validate the schema.

If any of the buyer categories above describes your work, drop a free trial test at the contact info in each Store listing. Sample data on one company or condition of your choice gets returned within the day.

If you want to follow what each actor earns at first activation, the monthly digest goes out via @ai_in_it on X and the apify-actor-portfolio repo.

George
george.the.developer on Apify

Source and verification reports: github.com/the-ai-entrepreneur-ai-hub/apify-actor-portfolio.

I built a $0.002 email validator because ZeroBounce was killing my margins on a freelance gig

George Kioko — Tue, 05 May 2026 20:47:08 +0000

A client paid me a fixed fee to clean a 47k email list. Sounded fine on paper. Then I priced the verification step.

ZeroBounce minimum is 2,000 credits for $39, so $0.0195 per email at the cheapest pay-as-you-go rate. Their subscription gets you to about $0.0099 per email at 10k volume. For my 47k list that is $462 PAYG or $235 if I commit to a monthly plan I will not need next month. The freelance fee was a flat number. The math was getting ugly.

I have an Apify account and a small list of standby actors that bill per call. So I went to check what an SMTP MX validator actually costs to run.

What is in a real validation

If you scope it tight there are five checks that catch most of the garbage:

RFC 5322 syntax (regex on the local part and domain).
Disposable domain list (Mailinator, 10MinuteMail, 1.4k entries that move slowly).
Free provider tag (Gmail, Yahoo, etc, useful for B2B scoring not deliverability).
MX record lookup via DNS.
Optional SMTP handshake that opens a TCP connection to the MX host on port 25 and reads the banner. No data sent.

The first four are local computation. Microseconds. The SMTP step is the only thing that costs a network round trip and most ISPs will give you a clean answer in under 2 seconds.

The actor

I wrote it as a standby HTTP server on Apify. One endpoint, JSON in, JSON out. Billed per email via Actor.charge. Pay only on a successful return.

curl -X POST 'https://george-the-developer--email-validator-api.apify.actor/?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"email":"someone@somewhere.com"}'

Returns:

{
  "email": "someone@somewhere.com",
  "valid_syntax": true,
  "is_disposable": false,
  "is_free": false,
  "mx_records": ["mail.somewhere.com"],
  "smtp_check": "deliverable",
  "score": 0.95
}

Per call cost is $0.002. For my 47k list that came out to roughly $94 of compute, not $462. The shape of the gig changed.

Architecture

flowchart LR
  Client[Client API call] --> Standby[Apify Standby Actor]
  Standby --> Syntax[RFC 5322 check]
  Syntax --> Disposable[Disposable domain list]
  Disposable --> MX[DNS MX lookup]
  MX --> SMTP[SMTP handshake on :25]
  SMTP --> Charge[Actor.charge per email]
  Charge --> Resp[(JSON response)]

What it does not do

I want to be honest about scope. It does not do role-based detection beyond a simple list of info@, support@, sales@, etc. It does not do toxicity scoring or abuse history, that needs a paid feed. It does not catch every catch-all domain, which is a known industry problem and not solvable with public DNS alone. ZeroBounce charges more partly because they aggregate proprietary signals from their customer base. Worth it if you are validating cold lists for cold outreach. Less worth it if you are cleaning a list of people who already opted in.

Per call vs flat subscription

The shape that bugged me about SaaS validators is the cap. You pay $79 a month and you get a fixed bucket of credits. If your usage is bursty (one client every few weeks) most of that bucket evaporates. If your usage is steady the math works out. But solo and freelance work is bursty by definition. Per call billing matches the work. No cap, no commitment, just a number per email.

Numbers from the gig

47,219 emails through the actor. Run completed in 38 minutes on the lowest memory tier. About 3.1k emails came back as undeliverable, 1.2k as disposable, 41.5k as deliverable. The client took the cleaned list and ran their own send. I did not have to think about a credit pool.

When this is the wrong call

If you are running enterprise email marketing at 10M sends a month, ZeroBounce or NeverBounce or Bouncer probably fit you. They have account managers, SLA contracts, GDPR paperwork. I am one developer with an Apify account. If you need that other thing, go pay for it.

If you are a freelancer cleaning a list, or a small agency that runs verification once a month, per call billing solves your specific problem.

Try it

Actor is at apify.com/george.the.developer/email-validator-api. Free to try if you have an Apify account, billed per call. Source for the docs and curl examples is at github.com/the-ai-entrepreneur-ai-hub/email-validator-api-docs (I keep the actor itself private but the call surface is documented).

If anyone has run into the same SaaS-cap problem with their own freelance gigs, would love to hear what you ended up using.

Originally published on Hashnode.

4 days to lock in current LinkedIn scraper pricing

George Kioko — Fri, 01 May 2026 08:24:24 +0000

Short post. The LinkedIn Company Employees Scraper on my Apify portfolio is raising PPE pricing on May 5. If you have a planned batch coming up, run it before then.

This is not a marketing nudge. This is an "I auto-disabled half the actor's value because I was losing money on residential proxy and the May 5 fix re-enables it" situation.

What changes May 5

Event	Old	New	Multiplier
short-profile	$0.003	$0.009	3x
full-profile	$0.006	$0.015	2.5x
full-profile-with-email	$0.01	$0.025	2.5x
actor-start	$0.005	$0.005	unchanged

A run that costs $0.10 today would cost about $0.30 after May 5. Same data, same accuracy, same actor.

Why now

Verification was auto-disabled on April 24 because residential proxy cost was running below the short-profile price. The actor falls back to confidence=low SERP-only output until the May 5 increase reactivates verified mode.

If you absolutely cannot wait until May 5 for verified data, pass acceptDiscoveryFallback: true on input. Build 2.2.21+ runs SERP-only at the current $0.003 rate.

Full breakdown of how the pricing got into this state in the first place:

theaientrepreneur.hashnode.dev/two-agency-users-were-83-of-my-revenue-they-left-and-i-noticed-29-days-later

Three actions if you use this actor

Run any planned bulk batches before May 5 to lock in current pricing on the events you need.
Update your maxTotalChargeUsd cap if you have one. The same volume after May 5 needs roughly 3x the cap on short-profile-heavy runs.
If your pipeline expects verified output and you cannot wait, switch to acceptDiscoveryFallback now to keep getting results at current pricing, with the confidence=low caveat.

What does not change

The actor itself. Same code, same maintenance, same response time. Same author who reads every Discussion thread.

The May 5 change brings pricing in line with what residential verification costs. After it lands, verification turns back on automatically and your runs return verified worksFor data again at the new rate.

When this is the wrong actor for you

If you only need the candidate set and not LinkedIn-verified worksFor data, the SERP-only path at $0.003 stays cheaper than competitors that bundle verification. If you need fully-verified output and your spreadsheet math says $0.009 per profile is too much, my actor is not the right tool. Phantombuster bundles verification at $69/mo for low volume. Bright Data is more expensive but enterprise-grade. Apify itself has 50+ LinkedIn actors, some priced lower for lower-quality verification.

I run mine at $0.009 because the residential proxy cost behind verification is real. Lower-priced competitors either skip verification, scrape with logged-in cookies (ToS risk), or run at a loss until they shut down (Proxycurl pattern).

Bottom line

Run your batches before May 5 if you can. Drop questions on the Apify Discussion thread for the actor if anything is unclear about the change.

apify.com/george.the.developer/linkedin-company-employees-scraper

6 Apify actors I actually use myself

George Kioko — Tue, 28 Apr 2026 19:51:26 +0000

I have 27 public Apify actors. Most are good. Six are genuinely useful and I run them through n8n and curl on a weekly basis. This is a tour of those six, with the actual prices, sample inputs, and the use cases I built them for.

If you found me through the silent-churn postmortem from earlier this week, this is the followup people kept asking for: "what else do you have?"

1. LinkedIn Company Employees Scraper

apify.com/george.the.developer/linkedin-company-employees-scraper

The one most people find me through. Takes a LinkedIn company URL, returns the top N employees who match your title filter. Verifies via JA4-accurate TLS fetch on a self-hosted Go service so it does not need login cookies and does not get flagged as a bot.

Sample input:

{
  "companies": ["https://www.linkedin.com/company/stripe"],
  "maxEmployees": 25,
  "targetTitles": ["CEO", "CTO", "Head of Engineering"]
}

Pay per event. $0.005 actor-start, $0.003 per short profile. Hike scheduled for May 5 to $0.009 per short profile to cover residential proxy cost properly.

Use it for: lead lists, sales prospecting, recruiter sourcing.

2. Email Validator API

apify.com/george.the.developer/email-validator-api

Standby HTTP API. Sub-second response. Runs syntax + MX + disposable + role-based + SMTP handshake checks. Pay per event $0.002 per email. Run a list of 50,000 emails and pay $100, not the $375 NeverBounce charges.

Sample call:

curl 'https://george-the-developer--email-validator-api.apify.actor/validate?email=test@stripe.com'

Use it for: pre-flight on cold email lists, signup form fraud filtering, list cleanup before importing to a CRM.

3. Domain WHOIS Lookup

apify.com/george.the.developer/domain-whois-lookup

Standby API. Returns registrar, age in days, expiry, DNS records. Falls back to RDAP since 374 gTLDs sunsetted port 43 WHOIS in early 2025. $0.005 per lookup.

Sample call:

curl 'https://george-the-developer--domain-whois-lookup.apify.actor/lookup?domain=stripe.com'

Use it for: lead-scoring (domain age is a real signal), security tooling, brand monitoring.

4. Company Enrichment API

apify.com/george.the.developer/company-enrichment-api

Domain in, company name + industry + tech stack signals out. Sub-second standby response. $0.01 per call.

Use it for: enrichment step in a lead-gen pipeline, ICP scoring, account research.

5. URL Metadata Extractor

apify.com/george.the.developer/url-metadata-extractor

OG tags, Twitter cards, favicon, canonical URL, structured data. Anything an AI agent needs to actually understand a page without parsing the full DOM. $0.003 per URL.

Use it for: content tools that show link previews, AI agents that need to summarize before they read, dashboards that aggregate links.

6. AI Content Detector

apify.com/george.the.developer/ai-content-detector

Text in, AI-probability score out. Uses an LLM-based classifier behind the scenes, not a regex. $0.003 per text.

Use it for: content moderation pipelines, marketplace listings filtering, dataset cleanup before training.

How they fit together

A typical pipeline I see in customer logs looks like this:

LinkedIn Company Employees Scraper finds candidates at a target company
Each candidate's email gets cleaned through Email Validator
Their company domain runs through WHOIS Lookup for age signal
Domain runs through Company Enrichment for industry + tech stack
Output goes into Hubspot or Pipedrive

Each step is pay per event. If you only need 3 of the 5 steps, you only pay for those events. There is no per-seat or per-month surcharge.

Honest pricing context

I just shipped a billing-guard fix for the LinkedIn actor that prevents it from emitting profiles when the charge would exceed your maxTotalChargeUsd cap. Postmortem on that lives here:

theaientrepreneur.hashnode.dev/why-my-linkedin-scraper-now-refuses-jobs

Same gate is rolling out across the standby APIs over the next two weeks. If you used any of these before the fix and your output count did not match your billed events, ping me.

Where to start

If you do lead generation, start with LinkedIn + Email Validator. That is the chain that produces revenue for users I see in the logs.

If you do content moderation or AI agent work, start with URL Metadata Extractor + AI Content Detector. They both return clean JSON in well under a second.

If you do security or domain research, start with WHOIS + Company Enrichment.

All six work standalone. All six bill per event, not per seat. None require API key contortions, just an Apify token and a curl.

apify.com/george.the.developer

What I shipped after the $540 silent churn postmortem

George Kioko — Sat, 25 Apr 2026 08:24:18 +0000

Yesterday I posted a postmortem on losing $540 a month to silent user churn. Some folks asked what the actual fix was. This is that post. Less drama, more code, three concrete patches that went live today.

If you missed yesterday: https://theaientrepreneur.hashnode.dev/two-agency-users-were-83-of-my-revenue-they-left-and-i-noticed-29-days-later

When I started digging into why my LinkedIn employee scraper was bleeding compute on real user runs, I found it was not one bug. It was three, layered.

Bug one: push first, charge second

Every Apify pay per event tutorial shows you Actor.charge('event-name', { count: 1 }). Easy. What none of them stress is what happens when the charge call fails.

There are at least three live failure modes:

The user set maxTotalChargeUsd on the run. They hit it. Charge returns chargedCount: 0.
Apify itself returns eventChargeLimitReached: true mid run.
The platform throws a transient error your try/catch swallows.

My actor's loop was structured like this:

await Actor.pushData(record);
try {
    await Actor.charge({ eventName, count: 1 });
} catch (e) {
    log.warning(`charge failed: ${e.message}`);
}
// the loop keeps going regardless

Push first, charge second, swallow errors, keep looping. So if charge stopped working halfway through a 100 profile run, the actor cheerfully output the remaining 50 for free while still spending real proxy and SERP money. The user got 100 profiles. I got billed for 50.

That is exactly the kind of leak you only notice when you stare at a per run cost graph and wonder why your revenue line is growing slower than your cost line.

The fix: a charge gate that fails closed

The new code calls a small helper before every emit. If charge fails for any reason, the gate refuses every subsequent call without even trying.

export function createProfileChargeGate({ isPPE, eventName, actorCharge, logger, stats }) {
    let chargeLimitReached = false;

    return {
        hasChargeLimitReached: () => chargeLimitReached,
        async chargeForNextProfile() {
            if (!isPPE) return { canEmit: true, charged: false, reason: 'not-ppe' };
            if (chargeLimitReached) return { canEmit: false, charged: false, reason: 'charge-limit-reached' };

            try {
                const result = await actorCharge({ eventName, count: 1 });
                if (result?.eventChargeLimitReached) {
                    chargeLimitReached = true;
                    return { canEmit: false, charged: false, reason: 'charge-limit-reached', result };
                }
                const charged = Number(result?.chargedCount || 0);
                if (charged <= 0) {
                    chargeLimitReached = true;
                    return { canEmit: false, charged: false, reason: 'not-charged', result };
                }
                stats.totalCharges = (stats.totalCharges || 0) + charged;
                return { canEmit: true, charged: true, reason: 'charged', result };
            } catch (error) {
                chargeLimitReached = true;
                return { canEmit: false, charged: false, reason: 'charge-error', error };
            }
        },
    };
}

The main loop now does:

const gate = createProfileChargeGate({ isPPE, eventName, actorCharge: Actor.charge, logger: log, stats });

for (const profile of profiles) {
    const verdict = await gate.chargeForNextProfile();
    if (!verdict.canEmit) {
        log.info(`Stopping at ${stats.totalCharges} profiles, gate refused: ${verdict.reason}`);
        break;
    }
    await Actor.pushData(profile);
}

Once the gate has refused, every call short circuits without trying to charge again. The run wraps up gracefully instead of bleeding compute on uncharged output.

Bug two: jobs that should never start

The second class of bug is the job that should not have run at all. A user sets companyCount 200, targetTitles 30, maxEmployees 1, hits Run, and watches my actor burn proxy and verification cost while emitting almost nothing.

The math is approachable. Per company you do roughly basePages + targetTitleCount SERP requests at about $0.0025 each, plus a verification attempt budget at about $0.0004 each. Per profile emitted you collect actorStartPriceUsd + shortProfilePriceUsd, then Apify takes 20% platform share off the top.

So a preflight estimator can compute estimatedPlatformCostUsd and estimatedCreatorRevenueUsd before any compute happens.

export function buildMarginPreflightEstimate({
    companyCount, targetTitleCount, maxEmployees, verifyEnabled,
    actorStartPriceUsd, shortProfilePriceUsd, creatorRevenueShare,
    serpCostUsd, verificationAttemptCostUsd, maxCostToCreatorRevenueRatio = 0.75,
}) {
    const basePages = getBaseSerpPagesPerCompany({ maxEmployees, verifyEnabled });
    const serpRequests = companyCount * (basePages + targetTitleCount);
    const verificationAttempts = verifyEnabled
        ? companyCount * getInitialVerificationCandidateLimit({ companyCount, maxEmployees })
        : 0;
    const estimatedProfiles = companyCount * maxEmployees;

    const estimatedPlatformCostUsd = (serpRequests * serpCostUsd) + (verificationAttempts * verificationAttemptCostUsd);
    const estimatedCreatorRevenueUsd = (actorStartPriceUsd + estimatedProfiles * shortProfilePriceUsd) * creatorRevenueShare;

    const ratio = estimatedCreatorRevenueUsd > 0
        ? estimatedPlatformCostUsd / estimatedCreatorRevenueUsd
        : Infinity;

    const exceedsMarginBudget = ratio > maxCostToCreatorRevenueRatio;
    return {
        estimatedPlatformCostUsd, estimatedCreatorRevenueUsd, ratio,
        exceedsMarginBudget,
        warning: exceedsMarginBudget
            ? `Not profitable enough for verified mode. Reduce companies, reduce targetTitles, or increase maxEmployees per company.`
            : '',
    };
}

In main, before any real work:

if (estimate.exceedsMarginBudget) {
    throw new Error(`Input rejected before run start: ${estimate.warning}. No PPE events will be charged.`);
}

The throw happens before Actor.charge has been called once. The user gets a clear refusal at submit time and pays nothing. They can resubmit with parameters that actually make sense.

The estimator tests confirm it accepts normal small runs (2 companies, 25 employees each, verified) and rejects the title-heavy 1-employee runs that were the worst offenders.

Bug three: a default that was too generous

The third change is product taste. Default maxEmployees was 100. That is too many for verified scraping with current LinkedIn block rates. Most users wanted 10 to 20 anyway and just left the default. The new default is 25.

If you really want 100 verified profiles per company, type 100 explicitly. Acknowledging it costs you a keystroke instead of nothing.

Small change, real impact. The new default protected at least one user yesterday from accidentally triggering the margin preflight refusal.

What this means if you use the actor

Three things you will notice in the build that went live today:

You will not get billed for partial runs where charge stopped working. Either the run completes and you pay for everything you got, or it stops mid run and you pay nothing past the limit, never both.

You will get rejected at submit time if your input is structurally unprofitable. The error message tells you exactly which knob to turn.

You will be defaulted into a smaller, faster run. Big jobs are still possible, you just have to opt in.

What I am doing next

The same three patterns apply to most of my PPE actors. The charge gate is already a module. I will be rolling it across the rest of the portfolio over the next week:

AI Content Detector
Email Validator API
URL Metadata Extractor
Domain WHOIS Lookup
Company Enrichment API
Website Intelligence API

These already shipped a fix yesterday for a different leak (GPT Store action pings hitting the standby actor with test payloads). The billing gate is the next layer.

If you build pay per event actors on Apify, take an hour and add a similar gate. The savings show up immediately. Your users start trusting your billing numbers because the numbers actually match the work.

Try the actor

The fixes are live as of today's build:

https://apify.com/george.the.developer/linkedin-company-employees-scraper

Pass companies as a list of LinkedIn URLs, set maxEmployees explicitly if you want more than 25, and watch the run console. The new guards should make the cost line predictable for the first time since I shipped this thing.

Yesterday was the diagnosis. Today is the fix. Tomorrow, I find out if anyone other than me actually cares.

How I lost $540/month in 30 days to silent user churn (and didn't notice)

George Kioko — Fri, 24 Apr 2026 04:34:05 +0000

Last week my 30 day profit dropped from $268 to $50 and I assumed it was a bug in the dashboard.

It wasn't a bug. It was me not paying attention for 29 days straight.

This is a postmortem of how I lost roughly $540/month from two agency buyers who just quietly stopped running my actor on March 25. I found out on April 23. That's 29 days of ambient denial while every other part of my portfolio was also quietly rotting.

Writing this partly so I remember the lesson, partly because I know at least three other Apify devs are about to make the same mistake.

The numbers

I run a bunch of scrapers and APIs on Apify. Here's what the revenue split actually looked like in early April before things went sideways.

Actor	Users	Monthly revenue	% of total
Google Maps Lead Intel	2	$540	83%
LinkedIn Employee Scraper	37	$42	6.5%
YouTube Transcript	40	$28	4.3%
Google Scholar	18	$14	2.1%
Email Validator API	46	$11	1.7%
Website Intelligence API	22	$8	1.2%
Everything else (5 actors)	188	$7	1.1%
Total	353	$650	100%

Read that first row again. Two users. $540. More than the other 351 users combined, by a factor of like 7x.

I told myself this was fine because the product was working and the buyers were happy. Both things were true in early March. Neither was true by late March. I just didn't know.

The silence

March 25 was the last run either agency executed. I didn't flag it because nothing explicit broke. No error email, no angry message, no refund request. My Apify dashboard just showed fewer runs but the rolling 30 day number still looked okay because it was still averaging in the fat weeks from before.

Here's roughly how the number moved:

Early April: $268 rolling 30d profit. I'm feeling smug.
Mid April: $92. I figure maybe it's a slow week.
April 22: $50. I finally open the run logs.

By the time I looked, the last run from either agency was 29 days ago. Whatever issue they had (I still don't fully know), they decided it wasn't worth telling me about. They just left.

Agency users don't complain. They just stop paying. If you're building for them, burn that into your forehead.

What I found when I actually looked

This is the part that made me feel physically ill. Once I started doing a proper audit of my portfolio, I found that 6 of my other 10 monetized actors were broken in some way. Not all catastrophic. Some were returning partial data. One had a silently failing selector from a site redesign in February. One was charging $0 per run because of a broken Actor.charge() signature I'd introduced in a refactor.

Let me repeat that: I had actors that were executing successfully, returning data to users, and billing them exactly nothing. For weeks.

If one of those 2 agencies had tried a second actor of mine during that period, they'd have gotten rot. That's probably why they didn't come back.

The root cause wasn't the bugs though. Bugs happen. The root cause was that my attention was entirely on the $25/run cash cow because it was paying the bills. The cash cow was hiding the state of the herd.

Why 2 agency users paid more than 353 devs

I want to sit with this one because I think most solo devs misunderstand it.

My Google Maps Lead Intel actor charges $25 per successful run. It scrapes a geography, enriches each business with a website audit, scores them, and hands the agency a ranked list of cold outreach targets. One agency was running it on a schedule against 40 US cities per week. At $25 a pop, that's serious money.

The 353 devs on my other actors were paying $0.003 to $0.01 per row. They're hobbyists, students, one guy building a thesis scraper. They're lovely. They're also economically irrelevant to whether I can pay rent.

Two lessons fell out of this.

First, your paying users and your popular users are almost never the same people. Popularity on Apify Store is a vanity metric. Agency retention is the only metric that buys groceries.

Second, concentrated revenue is fragile in ways that only hurt you once. When I had 2 whales, my revenue was 83% dependent on their mood. The moment either whale left, my month was destroyed. Worse, because they paid so much, I built no alerting around them. I assumed I'd notice. I did not notice.

If you're reading this as an agency owner looking for scraping tools, what you actually want is a vendor who is not dependent on you. Someone with 50 paying clients will answer your ticket faster than someone with 2, because the guy with 2 is terrified of you and therefore weirdly slow to respond to bad news.

What I'm changing

Not writing a manifesto. Just the four things I'm doing this week.

Push alerting on every run. Apify has webhooks. I never wired them up because my dashboard was enough. It wasn't. Every successful run, every failed run, every billing event now pings a private Telegram channel I actually read. Here's the whole snippet, it's embarrassingly short:

// webhooks config in actor.json points at this endpoint
// payload is whatever Apify sends plus the run metadata
export async function notify(req, res) {
    const { eventType, eventData, resource } = req.body;
    const text = `[${eventType}] ${resource?.actId} run ${resource?.id}\n`
               + `status: ${resource?.status}\n`
               + `charged: $${resource?.usageTotalUsd ?? 0}`;
    await fetch(`https://api.telegram.org/bot${TG_TOKEN}/sendMessage`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ chat_id: TG_CHAT, text }),
    });
    res.status(200).end();
}

That's it. If I'd had this on March 25, I'd have noticed the absence of runs within 48 hours, not 29 days. If you run anything that bills users, stop reading and go wire this up. Seriously.

Weekly self test of every monetized actor. Every Sunday I run each of my paid actors against a known input and diff the output against last week's. If the schema changes or the row count collapses, I know before the user does. This is stupid simple and I should have been doing it from day one.

Diversifying the buyer pool. $25/run is staying. But I'm actively building out the $5 to $10 tier with two new actors targeted at small agencies, because I want the bottom of the revenue chart to be less wobbly. Ten $50/month buyers survive any one of them leaving. Two $270/month buyers don't.

Public status page. Still building this but the idea is: if an actor is degraded, the user knows before they hit it. Trust compounds and I just burned a chunk of it, so I'm paying interest now.

Close

If you run any product that bills users on autopilot, go wire a Telegram or Slack webhook today. Not tomorrow. The 30 day rolling dashboard lies to you when things trend down because it's still averaging in good weeks. Push alerts don't lie. Runs either happen or they don't.

I'm writing this mostly for me. But if you want to see what the portfolio looks like now, or hire the actor that caused all this drama in the first place, it's here:

Apify profile: https://apify.com/george.the.developer
Google Maps Lead Intel (the $25/run one): on the same profile
Everything else I've shipped (27 public actors, some broken last week, all fixed now): same profile

Ask me anything in the comments. Especially if you're an agency buyer thinking about pulling the trigger on a vendor. I have thoughts about what you should actually be looking for.

Two APIs I Built This Week That Cost Nothing to Run

George Kioko — Thu, 23 Apr 2026 11:50:41 +0000

Most APIs have a dirty secret in their pricing: the upstream service they call costs money, and that cost gets passed to you plus margin. LLM based APIs charge you for tokens. Geocoding APIs charge you for lookups. Data enrichment APIs charge you for the enrichment source.

I wanted to build APIs where the underlying operation costs literally zero. Here are two I shipped this week.

API 1: DNS Record Checker

Node.js ships with a built in dns module. It can resolve A records, MX records, CNAME, TXT, NS, and more. No external API call needed. No third party service. The DNS resolution happens through the operating system's resolver, which is free.

import dns from 'dns/promises';

const records = await dns.resolveAny('example.com');
// Returns A, AAAA, MX, TXT, NS, SOA records

That's it. Zero dependency, zero API cost, zero rate limits from upstream providers.

The actor wraps this into a clean JSON API. Pass it a domain, get back every DNS record type with TTLs, priorities for MX records, and SPF/DKIM/DMARC validation. The whole thing runs on Apify's Standby infrastructure so it responds in under a second.

Use cases that keep coming up: automated domain verification for SaaS onboarding, email deliverability checks (MX + SPF + DKIM in one call), security audits scanning for misconfigured DNS, and monitoring tools that alert when records change unexpectedly.

API 2: Sentiment Analysis

The common approach to sentiment analysis is sending text to an LLM and paying per token. That works but it's expensive at scale and adds latency.

Instead I used a word level lexicon approach. The API scores text using a pre built dictionary of ~7,000 words with known sentiment values. No LLM call. No external API. The scoring runs entirely in memory on the Node.js process.

// Simplified version of the scoring logic
const score = words.reduce((sum, word) => {
  return sum + (lexicon[word] || 0);
}, 0) / words.length;

The result includes an overall sentiment score, confidence level, and breakdown of positive vs negative word matches. It handles negation ("not good" scores negative) and intensifiers ("very good" scores higher than "good").

Is it as nuanced as GPT? No. But for brand monitoring, review analysis, social media tracking, and content moderation at scale, a deterministic lexicon approach that returns in 50ms beats a 2 second LLM call that costs 10x more.

The pattern worth noticing

Both of these APIs follow the same principle: use what's already built into the runtime or ship a static dataset with the code. No external dependencies that cost money per call.

This matters because of what I've seen with my existing domain tools. The WHOIS Lookup actor has power users running 262 lookups per user on average. Domain and DNS tools get embedded in automated workflows and run at high volume. When your per call cost is zero, your margin stays healthy no matter how much a single user hammers the API.

Pricing

DNS Record Checker: $0.003 per lookup. Sentiment Analysis: $0.003 per text analysis. Both running on Apify Standby mode for instant responses.

The infrastructure cost is just Apify compute time. No upstream API bills eating into revenue.

Try them on Apify:

Built in Nairobi. 52 actors, zero external API costs on these two. Comments and questions welcome.

2 Users Pay Me More Than 353 Users: The Pricing Lesson That Changed Everything

George Kioko — Thu, 23 Apr 2026 11:49:30 +0000

I have 48 actors running on Apify. Same platform, same developer, same tech stack. Two of those actors tell completely different stories about how software makes money.

My LinkedIn Employee Scraper has 353 users. It runs thousands of times per month. It charges $0.005 per profile scraped. Total monthly revenue from all those users and all those runs? About $9.

My Google Maps Lead Intel actor has 2 users. Two. They run it about 22 times per month between them, paying roughly $25 per run. Monthly revenue? Around $540.

That is a 60x difference in revenue per user. Same platform. Same developer. Same billing system.

What Makes Google Maps Worth $25 a Run

The LinkedIn scraper returns raw data. Names, titles, company info. It does one thing and does it well, but developers treat it like a commodity. They plug it into their own pipelines and expect it to cost almost nothing. At $0.005 per profile, it basically does.

Google Maps Lead Intel returns something different. For every business it finds, you get validated email addresses, a lead score based on 12 online presence signals, Google Ads detection, website tech stack analysis, social media profiles, and review sentiment. It is not scraping. It is intelligence.

The two users paying $25 per run are lead generation agencies. One services appointment setting clients across 15 metro areas. The other runs local SEO audits. For both of them, a single $25 run replaces 3 to 4 hours of manual research that would cost $200+ if done by a VA.

The Buyer Problem

Here is what I missed for months: the LinkedIn scraper attracts developers. Developers are price sensitive. They can build their own scraper given enough time, so they benchmark your tool against their hourly rate. If your scraper costs more than 20 minutes of their time to build, they will build it themselves.

The Google Maps actor attracts agencies. Agency buyers think in terms of client value, not engineering time. If their client pays $1,500/month for lead gen services and your tool costs $25 per market, that is a rounding error in their margin. They do not negotiate. They do not churn. They run it more as they sign more clients.

Same platform. Totally different buyer psychology.

What Actually Changed

The technical shift was not dramatic. I stopped returning raw JSON blobs and started returning enriched, scored, validated output. Specifically:

Raw Google Maps results became leads with quality scores
Guessed emails became validated emails with deliverability checks
Basic business info became competitive intelligence with ad spend signals
Flat data became actionable reports that agencies could forward to clients

The pricing shift followed naturally. When your output saves someone 4 hours of work and costs them $25, you are not competing on data volume. You are competing on time saved and decision quality.

The Numbers I Wish I Knew Earlier

353 users at $0.005/run = roughly $9/month. Those users submit support tickets, request features, and compare you to 6 other LinkedIn scrapers in the Apify Store.

2 users at $25/run = roughly $540/month. Those users send you "thank you" messages and ask if you can build them something custom.

If I could go back and rebuild my portfolio from scratch, I would build fewer tools and make each one solve a complete problem for a specific buyer. Not "scrape this website" but "find me qualified leads in this market with contact info I can trust."

The Takeaway

Stop counting users. Start counting revenue per user. Build for the buyer who measures your tool against the cost of the alternative, not against the cost of building it themselves. Package intelligence, not data.

The developer who needs 10,000 LinkedIn profiles will always shop on price. The agency owner who needs 200 qualified leads by Friday will pay whatever gets it done.

I know which buyer I am building for now.

Built in Nairobi. 48 actors in production. Questions? Drop them below.

The 5 APIs That Run 200+ Times Per User (And Why That Matters)

George Kioko — Thu, 23 Apr 2026 11:47:43 +0000

Most developer tools get used a handful of times. Someone finds your API, tries it on a test case, maybe runs it a dozen more times, then moves on. That is the normal pattern. Out of 38 actors I have running on Apify, most average 5 to 20 runs per user. Respectable numbers.

But five of them break the pattern completely. These five average 100 to 260 runs per user. Not because of better marketing or a viral tweet. Because they solve problems that require bulk processing by design.

The Numbers

Here is the actual usage data from my Apify dashboard:

API	Runs Per User
Domain WHOIS Lookup	262
Google Scholar Scraper	230
AI Content Detector	132
Website Tech Detector	126
Email Validator	105

Compare that to something like the LinkedIn Employee Scraper, which has 37 users but averages about 17 runs each. LinkedIn users grab the data they need and stop. WHOIS users feed in hundreds of domains every single session.

Why These Five?

The common thread is not the subject matter. It is the workflow. Every one of these tools plugs into a process where the user already has a list and needs to process all of it:

Domain WHOIS Lookup (262 runs/user): Security researchers and domain investors run this on batches of suspicious domains. When a phishing campaign registers 10,000 domains with similar naming patterns, someone needs registrar data, creation dates, and nameservers for every single one. That is not a one time task. New domains appear daily.

Google Scholar Scraper (230 runs/user): Academic researchers doing systematic literature reviews or bibliometric analysis. They need every paper matching a query, with citations, h index scores, and author profiles exported as structured JSON. One research project can require pulling data on thousands of papers across multiple search terms.

AI Content Detector (132 runs/user): Content moderation teams, academic integrity offices, and publishers who need to scan entire content catalogs. Checking one essay at a time is pointless when you have 500 submissions or 2,000 product descriptions to verify. The bulk API call is the only thing that makes this practical.

Website Tech Detector (126 runs/user): Sales development teams that need technology intelligence on their entire prospect list. If you are selling a React migration service, you need to know which of your 3,000 target companies still run Angular or jQuery. Feed in the list, get back frameworks, CDNs, analytics tools, CMS platforms in clean JSON.

Email Validator (105 runs/user): Cold outreach operators who clean their lists before every campaign. A 5% bounce rate destroys your sender reputation, so smart operators validate 500 to 5,000 emails before hitting send. They do this before every single campaign, not once.

What This Means for Builders

The lesson is simple: if your API solves a problem that people encounter once, you need constant marketing to keep new users flowing in. If your API solves a problem that people encounter in batches, repeatedly, you get sticky users who come back on their own.

None of these five APIs went viral. None of them got featured in a newsletter. The WHOIS lookup has 7 total users. But those 7 users have collectively run it 1,837 times. That is revenue without marketing spend.

The best APIs are not the ones with the most users. They are the ones where each user cannot stop running them.

Try Them

All five are live on the Apify Store under my profile (george.the.developer), priced per call with no monthly subscription. Domain WHOIS at $0.005/lookup, Scholar at $0.004/paper, AI Detector at $0.003/text, Tech Detector at $0.005/site, Email Validator at $0.002/email.

Built in Nairobi. 38 actors, 700+ users, 14,000+ total runs.

Google Scholar Has No API Either. Here's What 5,000 Runs Taught Me

George Kioko — Thu, 23 Apr 2026 11:46:16 +0000

Google Scholar is the single most important search engine for academic research. Billions of papers indexed, citation counts, author profiles, related work links. And Google has never released an official API for it.

Not deprecated. Not restricted. Just... never built one.

If you want to programmatically search Google Scholar, grab paper titles, authors, citation counts, and PDF links, you are on your own. So I built an actor that does exactly that.

What It Pulls

You give it a search query (like "transformer architecture attention mechanism") and it returns structured data:

{
  "title": "Attention Is All You Need",
  "authors": "A Vaswani, N Shazeer, N Parmar...",
  "citationCount": 112847,
  "year": "2017",
  "url": "https://arxiv.org/abs/1706.03762",
  "pdfUrl": "https://arxiv.org/pdf/1706.03762",
  "snippet": "The dominant sequence transduction models are based on complex recurrent..."
}

Paper titles, author lists, citation counts, publication year, direct links, and PDF URLs when available. Everything a researcher needs to build a literature review or track citations over time.

The Numbers Tell a Story

Here's where it gets interesting. The actor has 22 users and 5,065 total runs. Do the math on that ratio: 230 runs per user on average.

These are not casual users clicking "Run" once to test it. These are power users running it at scale. Academics building citation databases. Research firms tracking publication trends across thousands of queries. AI companies monitoring new papers in their domain.

That run to user ratio is the strongest signal I have that this tool solves a real problem. When someone runs your tool 200+ times, they have built it into a workflow.

Why Scholar Is Hard to Scrape

Google Scholar is notoriously aggressive about blocking automated access. It will throw CAPTCHAs after just a handful of requests from the same IP. Most simple scraping scripts break within minutes.

The actor handles this with:

Proxy rotation across residential IPs
Session management to maintain cookies between requests
Randomized delays that mimic human browsing patterns
Automatic retry logic when a request gets blocked

I also had to deal with Google's inconsistent HTML. Scholar's markup changes subtly over time. Element class names shift, layout structures get tweaked. The parser needs regular maintenance to keep working.

Who Uses This

Three main groups keep showing up:

Academics and PhD students building systematic literature reviews. Instead of manually searching and copying results, they run batch queries and get structured data they can feed into reference managers or spreadsheets.

Research firms and think tanks tracking publication trends. They want to know how many papers mention "large language models" per quarter, or which authors are publishing most frequently in a specific subfield.

AI and ML teams monitoring state of the art. When a new paper drops with high early citation velocity, they want to know about it fast.

Try It

The actor is on the Apify Store with pay per result pricing ($0.004 per paper): Google Scholar Scraper

If you have ever copy pasted results from Google Scholar into a spreadsheet, this will save you hours. And if you are doing it at scale, it will save you from getting IP banned.

Built in Nairobi by George. 40+ actors, 5,000+ runs on Scholar alone.

YouTube Has No Transcript API So I Built One (150 Users Later)

George Kioko — Thu, 23 Apr 2026 11:44:51 +0000

You know what's wild? YouTube, a Google product, has no official API for pulling video transcripts. You can upload, search, and manage playlists through their API. But if you want the actual words spoken in a video? Good luck.

I ran into this wall in late 2025 while building a content repurposing tool. I needed transcripts from YouTube videos to feed into an LLM for summarization. The YouTube Data API v3 gives you metadata, thumbnails, view counts. But transcripts? Nope.

So I built my own.

What It Actually Does

The actor loads a YouTube video page, grabs the auto generated captions that YouTube creates for most videos, and returns clean text with timestamps. It supports multiple languages because YouTube generates captions in different languages automatically.

Here's what the output looks like:

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Example Video Title",
  "language": "en",
  "transcript": [
    { "text": "Welcome to this tutorial", "start": 0.0, "duration": 2.5 },
    { "text": "Today we are going to cover", "start": 2.5, "duration": 3.1 }
  ]
}

No API key needed. No OAuth flows. Just pass in a video URL and get the transcript back.

The Numbers After 8 Months

I published this on the Apify Store and kind of forgot about it for a while. Then I checked the dashboard:

154 users have tried it
1,737 total runs across all users
It's one of my most popular actors out of 40+

The thing that surprised me was who's using it. I expected developers. And yes, developers building AI pipelines are a big chunk. But I also see researchers pulling transcripts from lecture series, content creators repurposing their own videos into blog posts, and marketing teams analyzing competitor video content.

The Hard Parts

YouTube does not make this easy. Captions are loaded dynamically through a separate request after the page renders. The URL for the caption track is embedded inside a massive JSON blob in the page source. Finding and parsing that reliably took more debugging than the actual extraction logic.

The other challenge: some videos have manually uploaded captions, some have auto generated ones, and some have both. The actor handles all three cases and lets you pick which language you want.

Rate limiting is real too. YouTube will throttle you if you hammer it. The actor spaces out requests and uses session management to stay under the radar.

Why Not Just Use a Python Library?

There are Python packages like youtube_transcript_api that do something similar. They work fine for one off scripts. But when you need to run this at scale, on a schedule, with proxy rotation and automatic retries, you want infrastructure around it.

That's what Apify gives you. The actor runs in the cloud, handles failures gracefully, and stores results in a dataset you can export to JSON, CSV, or push to a webhook.

Try It

The actor is free to run on Apify (you just pay for compute, which is pennies per video): YouTube Transcript Extractor

If you are building anything that needs video content as text, save yourself the headache of reverse engineering YouTube's caption system. Someone already did that part for you.

Built in Nairobi by George. 40+ actors on the Apify Store, 154 users on this one alone.

My LinkedIn Scraper Just Hit Top 20 on Apify — Here's How I Built It

George Kioko — Thu, 23 Apr 2026 11:41:25 +0000

I woke up last week to an email from Apify saying my LinkedIn Employee Scraper had earned the Rising Star badge — meaning it cracked the top 20 actors on the entire platform. 176 users, 2,430 runs, and counting.

This is the story of how a side project built in Nairobi turned into one of the most-used LinkedIn scrapers on Apify.

The Problem: LinkedIn Has No Real API for Employee Data

If you've ever tried to pull employee data from LinkedIn programmatically, you already know the pain. LinkedIn's official API is locked down tight — you need partner status or a Sales Navigator license ($800–$1,200/month) just to get basic company employee info.

For indie developers, recruiters building internal tools, or startups doing competitive intel, that price tag kills the project before it starts. I needed a different approach.

How It Works: Playwright + Crawlee + Anti-Detection

The scraper runs as an Apify Actor using Crawlee (Apify's open-source crawling framework) with Playwright driving a real Chromium browser. Here's the core pattern:

import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();
const input = await Actor.getInput();
const { linkedinUrls = [], maxProfiles = 50 } = input;

const crawler = new PlaywrightCrawler({
  proxyConfiguration: await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
  }),
  launchContext: {
    launchOptions: {
      headless: true,
      args: ['--no-sandbox', '--disable-blink-features=AutomationControlled'],
    },
  },
  minConcurrency: 1,
  maxConcurrency: 2,

  async requestHandler({ page, request }) {
    // Human-like delay between actions
    await page.waitForTimeout(2000 + Math.random() * 3000);

    const employees = await page.$$eval('.org-people-profile-card', cards =>
      cards.map(card => ({
        name: card.querySelector('.artdeco-entity-lockup__title')?.innerText?.trim(),
        title: card.querySelector('.artdeco-entity-lockup__subtitle')?.innerText?.trim(),
        profileUrl: card.querySelector('a')?.href,
      }))
    );

    await Actor.charge({ eventName: 'profile-scraped', count: employees.length });
    await Actor.pushData(employees);
  },
});

await crawler.addRequests(linkedinUrls.map(url => ({ url })));
await crawler.run();
await Actor.exit();

The architecture isn't complicated, but the details are what make it survive in production. LinkedIn is one of the most aggressive anti-bot platforms out there, so every layer matters.

What Actually Keeps It Running

Three things separate a LinkedIn scraper that works once from one that runs 2,430 times without breaking:

Session management. Instead of logging in fresh every run, the scraper persists cookies and reuses sessions. This mimics real user behavior and avoids triggering LinkedIn's "new device" verification flow.

Residential proxies. Datacenter IPs get flagged within minutes on LinkedIn. The actor routes through Apify's residential proxy pool, rotating IPs per request. Each request looks like it comes from a different home internet connection.

Randomized timing. No fixed delays. Every pause between actions uses Math.random() to vary between 2–5 seconds. Linear timing patterns are the easiest signal for bot detection systems to catch.

I also limit concurrency to 1–2 parallel requests max. It's slower, but LinkedIn's rate limiting is harsh enough that going faster just burns through proxy credits with nothing to show for it.

The Numbers

Here's where the scraper stands today:

176 users on Apify Store
2,430 total runs in production
Rising Star badge — top 20 actor on the platform
Pay-per-event pricing at $0.004 per profile scraped

For context, I launched this about a year ago as one of my first Apify actors. It started getting steady traction around the 500-run mark, and growth has been compounding since. The Rising Star badge was a genuine surprise — I didn't realize it had climbed that high until the notification hit my inbox.

Lessons Learned Building Scrapers at Scale

LinkedIn changes its DOM constantly. I've had to update selectors at least four times. If you build a LinkedIn scraper, abstract your selectors into a config object so you can patch them without rewriting handler logic.

Users will throw anything at your actor. Company pages with 50,000 employees, URLs with typos, private profiles, pages behind auth walls. Defensive coding isn't optional — it's the entire job. Every edge case that crashes your actor is a 1-star review waiting to happen.

Pay-per-event pricing works. Charging per profile scraped instead of per run aligns cost with value. Users scraping 10 profiles pay less than users scraping 10,000. This keeps casual users happy while still generating real revenue from power users.

Good README = more users. My most-used actors all have detailed READMEs with input/output examples, Mermaid architecture diagrams, and clear pricing breakdowns. Developers don't install tools they can't understand in 30 seconds.

What's Next

I'm currently running 38+ actors on Apify covering everything from Google Scholar to Telegram channels to OFAC sanctions data. The LinkedIn scraper remains my top performer, and I'm working on v2 with better pagination handling and support for scraping by department filters.

If you're building scrapers and want to see the code, everything is on GitHub. If you just need LinkedIn employee data without building anything, the actor is ready to run on Apify Store.

Apify Store: https://apify.com/george.the.developer
GitHub: https://github.com/the-ai-entrepreneur-ai-hub

Built in Nairobi. Questions about the scraper or Apify actors in general — drop them in the comments.