<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Erik Strömberg</title>
    <description>The latest articles on Forem by Erik Strömberg (@apa512).</description>
    <link>https://forem.com/apa512</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909710%2Fddbe8ebe-8884-4764-a811-a74fa8343740.jpeg</url>
      <title>Forem: Erik Strömberg</title>
      <link>https://forem.com/apa512</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/apa512"/>
    <language>en</language>
    <item>
      <title>There's no clean way to verify an email address. Here's what works anyway.</title>
      <dc:creator>Erik Strömberg</dc:creator>
      <pubDate>Sat, 09 May 2026 01:49:53 +0000</pubDate>
      <link>https://forem.com/apa512/theres-no-clean-way-to-verify-an-email-address-heres-what-works-anyway-36bn</link>
      <guid>https://forem.com/apa512/theres-no-clean-way-to-verify-an-email-address-heres-what-works-anyway-36bn</guid>
      <description>&lt;p&gt;You have a list of addresses. Maybe they came from a signup form, a partner CRM, an enrichment run, or last quarter's webinar. You want to know which ones are real before you put them on a campaign and torch your sender reputation.&lt;/p&gt;

&lt;p&gt;You search "email verification" and find a hundred services with landing pages claiming 99% accuracy. You install the obvious package, run it on your list, and 95% come back "valid." You send. A quarter of them bounce, and the major inbox providers start flagging your domain.&lt;/p&gt;

&lt;p&gt;What happened? Some combination of: syntax-valid addresses that don't exist, mail servers that lie, catch-all domains, greylisting, and anti-probe behavior. "This is a real mailbox" is much harder to prove than it looks. Here's how the protocol actually works, where it breaks, and what a serious verifier has to do about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Syntax checks filter the obvious garbage and nothing else
&lt;/h2&gt;

&lt;p&gt;Run a regex against the input and you'll catch the obvious malformed strings. Use a real RFC 5322 parser and you'll catch a few more (&lt;code&gt;john..doe@example.com&lt;/code&gt;, leading whitespace, addresses with control characters).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="n"&gt;EMAIL_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^[^@\s]+@[^@\s]+\.[^@\s]+$&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EMAIL_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;definitelynotreal@gmail.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# matches
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;definitelynotreal@gmail.com&lt;/code&gt; passes every syntax check ever written. It does not exist. Syntax validation tells you whether a string &lt;em&gt;could&lt;/em&gt; be an address; it tells you nothing about whether it &lt;em&gt;is&lt;/em&gt; one.&lt;/p&gt;

&lt;p&gt;You still want this as a first pass — there's no point burning network calls on &lt;code&gt;not an email&lt;/code&gt;. But anyone who ships a regex as their email verifier is solving a different problem than they think they are.&lt;/p&gt;
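&lt;p&gt;A minimal first-pass sketch, assuming you only want to reject strings that cannot be addresses before spending network calls. The name &lt;code&gt;passes_syntax&lt;/code&gt; and the specific rules (one &lt;code&gt;@&lt;/code&gt;, no empty or doubled dots, a dotted domain) are illustrative choices, not a full RFC 5322 parser:&lt;/p&gt;

```python
def passes_syntax(address):
    """Cheap pre-filter: True if the string could plausibly be an address."""
    # Reject whitespace and control characters anywhere in the string.
    if any(ch.isspace() or not ch.isprintable() for ch in address):
        return False
    # Exactly one "@", with something on both sides.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or "@" in local:
        return False
    # No leading, trailing, or doubled dots in either part.
    for part in (local, domain):
        if part.startswith(".") or part.endswith(".") or ".." in part:
            return False
    # The domain needs at least one dot (e.g. example.com).
    return "." in domain
```

&lt;p&gt;Note that &lt;code&gt;passes_syntax("definitelynotreal@gmail.com")&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt;, which is exactly the limitation described above.&lt;/p&gt;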

&lt;h2&gt;
  
  
  DNS proves the domain accepts mail (and nothing else)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig +short MX example.com
&lt;span class="c"&gt;# 0 mail.example.com.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a domain has no MX record (and no fallback A record per &lt;a href="https://datatracker.ietf.org/doc/html/rfc5321" rel="noopener noreferrer"&gt;RFC 5321&lt;/a&gt;), it doesn't accept mail at all. Marking the address invalid is correct. This catches typo-domains, expired domains, and domains that were never set up for email.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;gmail.com&lt;/code&gt; has MX records. So does every Fortune 500. So does every catch-all spam trap. MX-exists tells you the domain is in the mail business, not that the address you care about exists on it.&lt;/p&gt;
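&lt;p&gt;A rough sketch of how the MX step might feed the next stage, assuming you already have the text output of &lt;code&gt;dig +short MX&lt;/code&gt;. The helper name &lt;code&gt;pick_mx_hosts&lt;/code&gt; is hypothetical. The null-MX convention it handles (a single record &lt;code&gt;0 .&lt;/code&gt; meaning the domain accepts no mail) comes from RFC 7505:&lt;/p&gt;

```python
def pick_mx_hosts(dig_output):
    """Parse `dig +short MX` output into hosts ordered by preference.

    Returns None for a null MX ("0 ."), which means the domain explicitly
    accepts no mail. Returns [] when no MX records were found, in which
    case the caller may still try the A-record fallback.
    """
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank lines and anything unexpected
        preference, host = int(fields[0]), fields[1]
        if host == ".":
            return None  # null MX: no fallback, no probing
        records.append((preference, host.rstrip(".").lower()))
    # Lower preference values are tried first.
    return [host for _, host in sorted(records)]
```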

&lt;h2&gt;
  
  
  Talking SMTP to the server
&lt;/h2&gt;

&lt;p&gt;This is where every "real" verifier lives. You connect to the destination MX server, walk through the SMTP handshake, and stop one step short of actually sending the email:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;openssl s_client &lt;span class="nt"&gt;-starttls&lt;/span&gt; smtp &lt;span class="nt"&gt;-connect&lt;/span&gt; gmail-smtp-in.l.google.com:25 &lt;span class="nt"&gt;-crlf&lt;/span&gt;
&lt;span class="go"&gt;220 mx.google.com ESMTP ready
EHLO verifier.example.com
250-mx.google.com at your service
&lt;/span&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="gp"&gt;MAIL FROM:&amp;lt;probe@verifier.example.com&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;250 2.1.0 OK
&lt;/span&gt;&lt;span class="gp"&gt;RCPT TO:&amp;lt;linus@gmail.com&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;550 5.1.1 The email account that you tried to reach does not exist
QUIT
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The signal is in the &lt;code&gt;RCPT TO&lt;/code&gt; response:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;250&lt;/code&gt; → server says it would accept mail for this address&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;550&lt;/code&gt; / &lt;code&gt;551&lt;/code&gt; → mailbox doesn't exist, or isn't local to this server&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;4xx&lt;/code&gt; → temporary failure, try again later&lt;/li&gt;
&lt;li&gt;nothing, eventually a timeout → ¯\_(ツ)_/¯&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A naive verifier writes this loop in twenty lines and ships. It is wrong roughly as often as it is right, for reasons that have nothing to do with the code.&lt;/p&gt;
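&lt;p&gt;For reference, the naive loop looks roughly like this with Python's standard &lt;code&gt;smtplib&lt;/code&gt;. This is a sketch of the approach being criticized, not a recommendation. The function names and outcome labels are made up for illustration, and &lt;code&gt;helo_name&lt;/code&gt; and &lt;code&gt;mail_from&lt;/code&gt; are placeholders the caller has to supply:&lt;/p&gt;

```python
import smtplib


def classify_rcpt(code):
    """Map an RCPT TO reply code to the coarse outcomes listed above."""
    if code == 250:
        return "accepted"
    if code in (550, 551):
        return "rejected"
    if code // 100 == 4:
        return "retry_later"
    return "unknown"


def naive_probe(mx_host, address, helo_name, mail_from, timeout=30):
    """Walk the handshake up to RCPT TO, then quit without sending DATA."""
    try:
        with smtplib.SMTP(mx_host, 25, local_hostname=helo_name,
                          timeout=timeout) as smtp:
            smtp.ehlo()
            smtp.mail(mail_from)
            code, _message = smtp.rcpt(address)
            return classify_rcpt(code)
    except (OSError, smtplib.SMTPException):
        # Timeouts, refused connections, dropped sessions: no signal.
        return "unknown"
```

&lt;p&gt;Everything that makes this unreliable lives outside the function: who is asking, from which IP, and how the receiver chooses to answer.&lt;/p&gt;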

&lt;h2&gt;
  
  
  Why the SMTP handshake lies
&lt;/h2&gt;

&lt;p&gt;Modern mail servers know that automated probers exist, and they don't make life easy for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catch-all domains.&lt;/strong&gt; Many companies configure their inbound to accept &lt;em&gt;any&lt;/em&gt; address at their domain and route unmatched ones to a default mailbox or a black hole. Probe &lt;code&gt;xq8z29zz@somecompany.com&lt;/code&gt; and you get &lt;code&gt;250 OK&lt;/code&gt;. Probe &lt;code&gt;ceo@somecompany.com&lt;/code&gt; and you get &lt;code&gt;250 OK&lt;/code&gt;. They both look identical from the outside; one of them is the CEO and the other is gibberish. If the domain is catch-all, RCPT TO is meaningless.&lt;/p&gt;

&lt;p&gt;You detect this by probing a deliberately fake address first, something like &lt;code&gt;definitely-does-not-exist-12345@domain.com&lt;/code&gt;, and inferring catch-all from a positive response. Any verifier that skips this check is silently classifying random nonsense as valid mail on every catch-all domain it sees.&lt;/p&gt;
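&lt;p&gt;One way to express that check, as a sketch: generate a random local part that almost certainly has no mailbox, probe it alongside the real address, and interpret the two reply codes together. The function names and result labels below are assumptions for illustration:&lt;/p&gt;

```python
import secrets


def random_probe_address(domain):
    """Build an address that almost certainly has no mailbox behind it."""
    return "probe-" + secrets.token_hex(12) + "@" + domain


def interpret_pair(fake_code, real_code):
    """Combine the reply for a random address with the reply for the real one."""
    if fake_code == 250:
        # The server accepted nonsense, so a 250 for the real address
        # carries no information.
        return "catch_all"
    if real_code == 250:
        return "deliverable"
    if real_code in (550, 551):
        return "undeliverable"
    return "unknown"
```

&lt;p&gt;A random local part is harder for a receiver to recognize than a fixed string, though that is a design guess rather than a guarantee.&lt;/p&gt;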

&lt;p&gt;&lt;strong&gt;Greylisting.&lt;/strong&gt; A receiver returns &lt;code&gt;451 try again later&lt;/code&gt; on first contact from an unknown sender. Legit MTAs queue and retry minutes or hours later. Probers usually don't. Naive verifiers mark these as failed; the addresses are fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-probe behavior on the big providers.&lt;/strong&gt; Gmail, Outlook, Office 365, Proofpoint, Mimecast, and several other large inbound systems either always return &lt;code&gt;250&lt;/code&gt; regardless of the mailbox, or always return &lt;code&gt;4xx&lt;/code&gt; to anything that smells like a verifier. A handshake against &lt;code&gt;gmail.com&lt;/code&gt; does not tell you whether the address exists on Gmail; it tells you that Gmail received a connection. Any verifier that reports a clean "valid" on a Gmail address from a single RCPT TO probe is making it up. Serious tools maintain a list of these providers and fall back to other signals when they hit one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outbound IP reputation.&lt;/strong&gt; Even when the receiver is willing to give you a real answer, it'll only do it if your sending IP doesn't look hostile. If you've been hammering a domain — or, more likely, if the IP block you happen to be on has been hammering it — you'll be tarpitted, throttled, or refused at HELO. Running verification from a residential IP, an EC2 box without rDNS, or a VPN basically doesn't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tarpitting.&lt;/strong&gt; Some servers respond to &lt;code&gt;RCPT TO&lt;/code&gt; slowly on purpose — 30 seconds per probe, deliberately — to make automated verification economically unviable. Your verifier needs to handle long timeouts on some domains without falling over on the rest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some servers only check on DATA.&lt;/strong&gt; They accept any &lt;code&gt;RCPT TO&lt;/code&gt; and only bounce after you've sent the body. Verification-without-sending is impossible against those servers; the most you can do is flag them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port 25 is often blocked outbound.&lt;/strong&gt; Most cloud providers and residential ISPs block outbound 25 to limit spam. A naive verifier from your laptop or your default EC2 instance will silently fail the SMTP step on every domain. Real verifiers connect from infrastructure with port 25 open. Ports 587 and 465 are not a fallback: they are submission ports for authenticated clients, and a recipient's MX generally won't answer probes on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "good" verification looks like
&lt;/h2&gt;

&lt;p&gt;A serious verifier does, roughly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Syntax-check the address.&lt;/li&gt;
&lt;li&gt;Suggest typo corrections for common domains (&lt;code&gt;gmial.com&lt;/code&gt; → &lt;code&gt;gmail.com&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Look up MX (and A as fallback).&lt;/li&gt;
&lt;li&gt;Check disposable-domain lists (Mailinator, 10MinuteMail, Guerrilla Mail, and a few hundred others).&lt;/li&gt;
&lt;li&gt;Identify the receiver: Gmail, Outlook/Office 365, Yahoo, ProtonMail, Proofpoint, Mimecast, an in-house Postfix, etc. The provider determines which signals are trustworthy.&lt;/li&gt;
&lt;li&gt;For receivers known to give honest &lt;code&gt;RCPT TO&lt;/code&gt; responses, probe — from a warmed-up IP with valid rDNS, sensible HELO, and conservative pacing per destination domain.&lt;/li&gt;
&lt;li&gt;Detect catch-all by probing a fake address first.&lt;/li&gt;
&lt;li&gt;Honor &lt;code&gt;4xx&lt;/code&gt; by retrying with backoff over hours, not seconds.&lt;/li&gt;
&lt;li&gt;For receivers known to lie, fall back to historical signals: has this address shown up in our previous deliveries? Has it bounced before?&lt;/li&gt;
&lt;li&gt;Classify each result honestly. "Valid / invalid" is the wrong vocabulary; you need at least three buckets — &lt;em&gt;deliverable&lt;/em&gt;, &lt;em&gt;undeliverable&lt;/em&gt;, and &lt;em&gt;risky&lt;/em&gt; (catch-all, greylisting timeout, large-provider unknown). Sending into the risky bucket is a business decision, not a technical one.&lt;/li&gt;
&lt;/ol&gt;
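&lt;p&gt;The last step, mapping collected signals into three buckets, could be sketched as follows. The &lt;code&gt;Signals&lt;/code&gt; fields and the exact rules are assumptions made for illustration. In particular, treating any &lt;code&gt;550&lt;/code&gt;/&lt;code&gt;551&lt;/code&gt; as undeliverable is a simplification:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Signals:
    syntax_ok: bool
    has_mail_host: bool        # MX record, or A-record fallback
    disposable: bool
    provider_trusted: bool     # receiver known to answer RCPT TO honestly
    catch_all: bool
    rcpt_code: int = 0         # 0 means no usable SMTP answer


def bucket(s):
    """Return "deliverable", "undeliverable", or "risky"."""
    if not s.syntax_ok or not s.has_mail_host:
        return "undeliverable"
    if s.rcpt_code in (550, 551):
        return "undeliverable"
    if (s.provider_trusted and not s.catch_all and not s.disposable
            and s.rcpt_code == 250):
        return "deliverable"
    # Catch-all, disposable, untrusted provider, 4xx, or timeout.
    return "risky"
```

&lt;p&gt;The point of the shape is that "deliverable" requires every condition to hold, while anything ambiguous falls through to "risky" instead of being rounded up.&lt;/p&gt;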

&lt;p&gt;The infrastructure question is harder than the protocol question. You need a pool of warmed sending IPs, a monitoring system for IPs starting to bounce, per-receiver rate limiters, a queue that respects greylist retry intervals, and a database of which providers behave how. Get any of that wrong and the answers you get are noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things people get burned by
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Treating "catch-all" as "valid."&lt;/strong&gt; Catch-all means &lt;em&gt;I don't know whether this mailbox exists&lt;/em&gt;, not &lt;em&gt;yes, it does&lt;/em&gt;. Sending into a catch-all domain blindly is one of the cleanest ways to end up flagged for spam, because spam traps love catch-alls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trusting role-based addresses.&lt;/strong&gt; &lt;code&gt;info@&lt;/code&gt;, &lt;code&gt;support@&lt;/code&gt;, &lt;code&gt;sales@&lt;/code&gt;, &lt;code&gt;noreply@&lt;/code&gt; are almost always deliverable. They're almost never the right address for cold outreach, and many ESPs treat marketing mail to role addresses as a strong spam signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running checks inline.&lt;/strong&gt; Don't verify in your signup form's request handler. A verification call can take seconds, sometimes tens of seconds. Verify async, or against a cached lookup, never inline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bulk-probing without pacing.&lt;/strong&gt; A loop that fires 1,000 RCPT TO calls at the same Google Workspace tenant will get you locked out of that tenant for the rest of the day. Per-domain pacing is mandatory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring the score.&lt;/strong&gt; A binary "valid: true" hides a lot. An address that passes syntax + MX + disposable but where the SMTP step had to be assumed (problematic provider, timeout, IP-blocked) is a different animal from one that got a clean &lt;code&gt;250&lt;/code&gt; from a non-catch-all server. Anything that doesn't expose its confidence is hiding bad news.&lt;/li&gt;
&lt;/ul&gt;
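&lt;p&gt;Per-domain pacing can be as simple as remembering when each domain may next be probed. A minimal single-process sketch follows. The class name and the injectable &lt;code&gt;clock&lt;/code&gt; parameter are illustrative, and a sensible gap per receiver is something you would have to tune:&lt;/p&gt;

```python
import time


class DomainPacer:
    """Enforce a minimum gap between probes to the same domain."""

    def __init__(self, min_gap_seconds, clock=time.monotonic):
        self.min_gap = min_gap_seconds
        self.clock = clock
        self.next_allowed = {}  # maps domain to the earliest next probe time

    def wait_time(self, domain):
        """Reserve the next slot for `domain`; return seconds to sleep first."""
        now = self.clock()
        slot = max(now, self.next_allowed.get(domain, now))
        self.next_allowed[domain] = slot + self.min_gap
        return slot - now
```

&lt;p&gt;A worker would call &lt;code&gt;time.sleep(pacer.wait_time(domain))&lt;/code&gt; before each probe. Different domains never wait on each other, so one slow receiver does not stall the rest of the list.&lt;/p&gt;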

&lt;h2&gt;
  
  
  The shortcut
&lt;/h2&gt;

&lt;p&gt;This is what &lt;code&gt;/api/v1/email_verifications&lt;/code&gt; does at &lt;a href="https://peopledb.co" rel="noopener noreferrer"&gt;PeopleDB&lt;/a&gt;. The protocol piece, the receiver classification, the catch-all detection, the IP reputation, the disposable-domain list, and the typo suggestions all run on the server side; you get one HTTP call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://peopledb.co/api/v1/email_verifications?email_address=somebody@example.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$PEOPLEDB_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"somebody@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"classification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"checks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"syntax"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mx_record"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"smtp_deliverable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"disposable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"typo_suggestion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"accepts_any_email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"warnings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"errors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The classification has three states: &lt;code&gt;valid&lt;/code&gt;, &lt;code&gt;risky&lt;/code&gt;, &lt;code&gt;invalid&lt;/code&gt;. &lt;em&gt;Risky&lt;/em&gt; is the honest one — catch-all domains, large providers that won't tell you the truth over SMTP, and disposable addresses all surface there. The score is 0–100 so you can pick your own threshold per use case (a stricter cutoff for cold outreach than for password reset).&lt;/p&gt;
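&lt;p&gt;On the client side, picking a threshold per use case might look like the sketch below. It reads only the &lt;code&gt;classification&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt;, and &lt;code&gt;checks.disposable&lt;/code&gt; fields from the sample response above. The function name and the choice to always skip disposable addresses are assumptions:&lt;/p&gt;

```python
def should_send(result, min_score):
    """Decide whether to send, given one parsed verification response."""
    if result["classification"] == "invalid":
        return False
    if result["classification"] == "risky" and result["checks"]["disposable"]:
        return False
    return result["score"] >= min_score
```

&lt;p&gt;You would then call it with a high cutoff for cold outreach and a lower one for transactional mail. The specific numbers are yours to choose.&lt;/p&gt;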

&lt;p&gt;If you've got a list, run it through this before you put it on a sender. SMTP is a forty-year-old protocol layered with anti-abuse, and the difference between a verifier that knows that and one that doesn't is the difference between a clean send and a deliverability incident.&lt;/p&gt;

</description>
      <category>email</category>
      <category>smtp</category>
      <category>api</category>
      <category>deliverability</category>
    </item>
    <item>
      <title>There is no LinkedIn email API. Here's what to use instead.</title>
      <dc:creator>Erik Strömberg</dc:creator>
      <pubDate>Sun, 03 May 2026 14:11:28 +0000</pubDate>
      <link>https://forem.com/apa512/there-is-no-linkedin-email-api-heres-what-to-use-instead-4l92</link>
      <guid>https://forem.com/apa512/there-is-no-linkedin-email-api-heres-what-to-use-instead-4l92</guid>
      <description>&lt;p&gt;You have a LinkedIn URL. You want the person's email. You search "linkedin email api" and find a maze of marketing pages and Stack Overflow threads from 2018 that all say roughly the same thing: there isn't one.&lt;/p&gt;

&lt;p&gt;That's true. There is no first-party LinkedIn API that takes a profile URL and returns an email address. There never has been, and there isn't going to be. But there is a perfectly normal way to do this, and it's been quietly running under the hood of every CRM, sales tool, and recruiting platform for a decade. Here's what's actually going on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What LinkedIn's APIs actually return
&lt;/h2&gt;

&lt;p&gt;LinkedIn does have APIs. They are aggressively gated and aggressively scoped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Marketing Developer Platform&lt;/strong&gt; — ad campaign management, ad analytics, posting on behalf of pages. No contact data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Talent Solutions API&lt;/strong&gt; — for ATS integrations. Returns candidates &lt;em&gt;who applied to your job&lt;/em&gt;, not arbitrary lookups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sales Navigator API&lt;/strong&gt; — internal use by Sales Nav itself, not really a public developer surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sign In With LinkedIn (OIDC)&lt;/strong&gt; — returns the email of the &lt;em&gt;signed-in user&lt;/em&gt;, with their consent, scoped to that session. You cannot use this to look up anyone else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Profile API&lt;/strong&gt; — restricted to LinkedIn-approved partners with formal agreements; even then, contact info isn't part of the schema for anyone other than the authenticated user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every LinkedIn API exposes data about the authenticated user (consent-driven) or paid customer activity within your own account. None of them lets you say "give me the email for &lt;code&gt;linkedin.com/in/somebody&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;Why? Because that's not what LinkedIn sells. LinkedIn's product is the platform — the network effect, InMail, recruiter seats. Letting third parties bypass that with a &lt;code&gt;GET&lt;/code&gt; would directly compete with their own monetization.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "LinkedIn email API" actually means
&lt;/h2&gt;

&lt;p&gt;When developers search for it, they almost always mean this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /lookup?linkedin_url=https://linkedin.com/in/somebody
→ { "email": "somebody@example.com", ... }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not a LinkedIn endpoint. It's an &lt;em&gt;enrichment&lt;/em&gt; endpoint, served by a third-party service that maintains its own contact database and uses the LinkedIn URL as a join key.&lt;/p&gt;

&lt;p&gt;Aggregators have been building these databases for over a decade by combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public-web signals&lt;/strong&gt; — corporate websites, conference speaker bios, GitHub commits, press releases, paper authorships, podcast guest pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contributed data&lt;/strong&gt; — address books voluntarily uploaded by users of various email and CRM tools, contact graphs from sales acceleration platforms, opt-in business directories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B2B partnerships&lt;/strong&gt; — companies that have shared customer lists in business-data co-ops in exchange for access to the pooled data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification feedback&lt;/strong&gt; — every time a user marks an email as bounced or correct, the database learns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this requires LinkedIn's permission, because none of it comes &lt;em&gt;from&lt;/em&gt; LinkedIn. The LinkedIn URL is the lookup key, not the source.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two LinkedIn identifiers worth knowing
&lt;/h2&gt;

&lt;p&gt;Before you can hit any of these APIs, you need to extract a stable identifier from the LinkedIn URL the user gave you. There are two:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The public identifier (slug):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://linkedin.com/in/williamhgates
                        ^^^^^^^^^^^^^
                        public_identifier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stable until the user customizes it. Most enrichment APIs accept this as the input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The numeric LinkedIn ID:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sometimes you'll see URLs like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://linkedin.com/in/ACoAAA-3B7U-_b0123abc/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That long string is an opaque, account-scoped encoding of the user's internal numeric ID. It looks intimidating, but for lookup purposes it functions as just another stable identifier — useful when the slug isn't available (e.g. some recruiter or Sales Navigator URLs).&lt;/p&gt;

&lt;p&gt;When parsing user-supplied LinkedIn URLs, handle both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_linkedin_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# ACoAA... = numeric URN, anything else = public identifier
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACoAA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_public_identifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;parse_linkedin_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://linkedin.com/in/williamhgates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → ("linkedin_public_identifier", "williamhgates")
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What the lookup actually does
&lt;/h2&gt;

&lt;p&gt;Pseudocode for what a serious enrichment API runs when you hit it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Find every record across every source that matches this identifier
&lt;/span&gt;    &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;linkedin_public_identifier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Expand: pull in any other records that share an email or
&lt;/span&gt;    &lt;span class="c1"&gt;#    secondary identifier with the initial matches.
&lt;/span&gt;    &lt;span class="c1"&gt;#    A person often has separate records from separate sources;
&lt;/span&gt;    &lt;span class="c1"&gt;#    this is how you merge them.
&lt;/span&gt;    &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;expand_by_shared_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Collect emails, phones, social handles
&lt;/span&gt;    &lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dedupe&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_public_identifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_addresses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone_numbers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github_login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;twitter_username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two non-obvious things going on here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity resolution is the hard part.&lt;/strong&gt; There's no canonical "person ID" across sources. You have to chain matches: this LinkedIn record shares an email with a CRM record, which shares a phone with a conference-speaker entry, etc. Done well, you end up with a merged view. Done badly, you accidentally fuse two different people who happen to share a generic &lt;code&gt;info@&lt;/code&gt; inbox.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email classification matters more than people think.&lt;/strong&gt; Anyone serious will separate personal addresses (&lt;code&gt;@gmail.com&lt;/code&gt;, &lt;code&gt;@protonmail.com&lt;/code&gt;) from work addresses. Outreach into a personal inbox has very different deliverability and acceptable-use implications than reaching someone at their employer.&lt;/li&gt;
&lt;/ul&gt;
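
&lt;p&gt;A minimal sketch of the chaining step, assuming records are plain dicts with hypothetical &lt;code&gt;emails&lt;/code&gt; and &lt;code&gt;phones&lt;/code&gt; keys. The &lt;code&gt;GENERIC_LOCAL_PARTS&lt;/code&gt; list and the function names are illustrative, not part of any real API. It links records with a union-find structure and refuses to merge on shared role inboxes such as &lt;code&gt;info@&lt;/code&gt;:&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical list of role-style local parts that many people share.
GENERIC_LOCAL_PARTS = {"info", "sales", "support", "admin", "contact", "hello"}


def is_generic(email):
    """True for shared role inboxes, which must not be used as merge keys."""
    return email.split("@", 1)[0].lower() in GENERIC_LOCAL_PARTS


def merge_records(records):
    """Group record indexes that share a non-generic email or a phone number."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # attribute key: index of the first record carrying it
    for idx, record in enumerate(records):
        keys = [("email", e.lower()) for e in record.get("emails", []) if not is_generic(e)]
        keys += [("phone", p) for p in record.get("phones", [])]
        for key in keys:
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


records = [
    {"emails": ["ada@corp.example", "info@corp.example"], "phones": []},
    {"emails": ["ADA@corp.example"], "phones": ["+15550100"]},
    {"emails": [], "phones": ["+15550100"]},
    {"emails": ["info@corp.example"], "phones": []},
]
print(merge_records(records))  # [[0, 1, 2], [3]]
```

&lt;p&gt;Records 0 to 2 chain together through a shared email and then a shared phone. Record 3 stays on its own because the only thing it shares is the generic inbox.&lt;/p&gt;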

&lt;h2&gt;
  
  
  A practical example
&lt;/h2&gt;

&lt;p&gt;Here's how to enrich a CSV of LinkedIn URLs with emails, end to end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlparse&lt;/span&gt;

&lt;span class="n"&gt;API&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://peopledb.co/api/v1/people&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACoAA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;else &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_public_identifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TOKEN&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# seconds; avoids hanging forever on a stalled connection&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DictReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DictWriter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fieldnames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;work_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personal_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeheader&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writerow&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linkedin_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;work_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;work_email_addresses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personal_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personal_email_addresses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole pipeline: parse, look up, write. The interesting work happens server-side; the client is under forty lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to keep in mind
&lt;/h2&gt;

&lt;p&gt;A few things worth being honest about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coverage is not 100%.&lt;/strong&gt; Aggregator APIs hit on a meaningful fraction of profiles — but never all of them. Plan for misses, not just hits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data ages.&lt;/strong&gt; People change jobs, drop addresses, move to new domains. Anything you enrich today should be re-validated before you use it months later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify before sending.&lt;/strong&gt; A correctly-resolved email that bounces is still a bounce. SMTP-level validation (or a verification endpoint, if your provider has one) before bulk outreach is table stakes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance is on you.&lt;/strong&gt; If you're enriching contacts at scale and reaching people in, or operating from, the EU or UK, GDPR's legitimate-interest tests apply. CCPA covers California, CASL covers Canada, and similar regimes exist elsewhere. The API gives you data; what you do with it is your problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't scrape LinkedIn directly.&lt;/strong&gt; Aside from the ToS issues, you'll be fighting CAPTCHAs and IP bans within minutes. The whole point of an enrichment API is that someone else has already absorbed that operational cost — across many sources, not just LinkedIn.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The shortcut
&lt;/h2&gt;

&lt;p&gt;This is what &lt;a href="https://peopledb.co" rel="noopener noreferrer"&gt;PeopleDB&lt;/a&gt; does. The endpoint accepts either identifier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# By public identifier (the slug)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://peopledb.co/api/v1/people?linkedin_public_identifier=williamhgates"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$PEOPLEDB_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# By member ID (the opaque ACoAA... form)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://peopledb.co/api/v1/people?linkedin_id=ACoAAA-3B7U-_b0123abc"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$PEOPLEDB_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A response has this shape:&lt;/p&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"linkedin_public_identifier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"williamhgates"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"linkedin_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"github_login"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email_addresses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"personal_email_addresses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"work_email_addresses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phone_numbers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same identity-resolution layer also accepts &lt;code&gt;github_login&lt;/code&gt; and &lt;code&gt;github_id&lt;/code&gt;, so if a person shows up under both LinkedIn and GitHub in the index, you get the union — useful when you've got a contributor's GitHub handle but want to reach them at their work address.&lt;/p&gt;

&lt;p&gt;There is no first-party LinkedIn email API. There probably never will be. What there is, is a category of enrichment APIs that have been quietly solving this for years. That's the trade.&lt;/p&gt;

</description>
      <category>linkedin</category>
      <category>api</category>
      <category>python</category>
      <category>sales</category>
    </item>
    <item>
      <title>Why finding a GitHub user's email is harder than you'd think</title>
      <dc:creator>Erik Strömberg</dc:creator>
      <pubDate>Sun, 03 May 2026 01:21:36 +0000</pubDate>
      <link>https://forem.com/apa512/why-finding-a-github-users-email-is-harder-than-youd-think-gjd</link>
      <guid>https://forem.com/apa512/why-finding-a-github-users-email-is-harder-than-youd-think-gjd</guid>
      <description>&lt;p&gt;You've found a contributor whose work you depend on. The maintainer of a package you use, a developer who fixed something for you upstream, the author of a CVE you need to coordinate with. You have their GitHub username. You'd like their email.&lt;/p&gt;

&lt;p&gt;You'd think this would be a &lt;code&gt;GET&lt;/code&gt; away. It isn't. Here's why — and what it actually takes to find one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GitHub API doesn't have it
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;GET /users/:login&lt;/code&gt; returns an &lt;code&gt;email&lt;/code&gt; field. For the vast majority of users, that field is &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;GitHub flipped private-by-default years ago. When you sign up today, your commit email is set to &lt;code&gt;&amp;lt;id&amp;gt;+&amp;lt;login&amp;gt;@users.noreply.github.com&lt;/code&gt; and the public profile email field is empty. Older accounts that opted in still expose addresses, but they're a minority — and the people you actually want to reach (active maintainers, security-conscious developers) are exactly the ones who turned this off.&lt;/p&gt;

&lt;p&gt;So that's out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The commits don't have it either (mostly)
&lt;/h2&gt;

&lt;p&gt;The next obvious move: look at the user's commits. Every commit has an author email in its metadata. Pick a public repo, fetch the commit, get the email.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.github.com/repos/torvalds/linux/commits | jq &lt;span class="s1"&gt;'.[0].commit.author.email'&lt;/span&gt;
&lt;span class="c"&gt;# "torvalds@linux-foundation.org"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works for Linus. It does not work for most people. Run this against any reasonably modern repo and you'll see a lot of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"49699333+dependabot[bot]@users.noreply.github.com"
"12345678+somecontributor@users.noreply.github.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commits usually carry the &lt;code&gt;noreply&lt;/code&gt; form whenever the author has the "Keep my email addresses private" setting on, which is the default: GitHub uses it for commits made through the web UI, and it can block command-line pushes that would expose a private address. The &lt;code&gt;&amp;lt;id&amp;gt;+&amp;lt;login&amp;gt;&lt;/code&gt; part is the user's GitHub ID and login, which is useful if all you wanted was to identify them, but you already had their login. You wanted to &lt;em&gt;email&lt;/em&gt; them.&lt;/p&gt;
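
&lt;p&gt;If you are filtering commit emails in bulk, it helps to recognize these addresses up front. A small sketch, assuming the two forms GitHub has used (&lt;code&gt;ID+login@users.noreply.github.com&lt;/code&gt; and the older &lt;code&gt;login@users.noreply.github.com&lt;/code&gt;); the function name is made up for illustration:&lt;/p&gt;

```python
import re

# Optional numeric ID and "+", then the login, then the fixed noreply domain.
NOREPLY = re.compile(r"^(?:(\d+)\+)?([^@+\s]+)@users\.noreply\.github\.com$", re.IGNORECASE)


def parse_noreply(email):
    """Return (user_id, login) for a GitHub noreply address, else None."""
    match = NOREPLY.match(email.strip())
    if match is None:
        return None
    user_id, login = match.groups()
    return (int(user_id) if user_id else None, login)


print(parse_noreply("12345678+somecontributor@users.noreply.github.com"))
# (12345678, 'somecontributor')
```

&lt;p&gt;Anything that parses here can be dropped from the candidate list immediately: it identifies the account but is not a deliverable mailbox.&lt;/p&gt;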

&lt;h2&gt;
  
  
  The events archive: harder, but real data
&lt;/h2&gt;

&lt;p&gt;There's another source that dev tooling people sometimes forget about: the public events stream. GitHub publishes a firehose of public events (pushes, opens, comments, releases) and &lt;a href="https://www.gharchive.org/" rel="noopener noreferrer"&gt;GH Archive&lt;/a&gt; has been recording it hourly since 2011 — terabytes of newline-delimited JSON, gzipped, freely downloadable.&lt;/p&gt;

&lt;p&gt;Each &lt;code&gt;PushEvent&lt;/code&gt; carries the underlying commit metadata, including author name and email. In principle, if a developer ever pushed a commit &lt;em&gt;before&lt;/em&gt; they turned on private email — or if they push from a CI pipeline that uses a real address — that email is in the archive.&lt;/p&gt;

&lt;p&gt;The job that processes it looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
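&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"json"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"net/http"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"stringio"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"zlib"&lt;/span&gt;

&lt;span class="c1"&gt;# archive_url points at one hourly dump, e.g. https://data.gharchive.org/2015-01-01-15.json.gz&lt;/span&gt;
&lt;span class="n"&gt;gz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Zlib&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;GzipReader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;StringIO&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Net&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;archive_url&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;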

&lt;span class="n"&gt;gz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each_line&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;next&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"PushEvent"&lt;/span&gt;

  &lt;span class="n"&gt;login&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"actor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"login"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"commits"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;commit&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;author_name&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;author_email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... we have a (login, name, email) triple. Now what?&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You scan a few hours of archive and immediately find a problem. A lot of those emails look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;8a3f9b2c1d4e5f6a7b8c9d0e1f2a3b4c5d6e7f80@gmail.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Forty hex characters. That's a SHA-1 hash. The local part of the email has been one-way-hashed; only the domain is in the clear. This is a historical artifact of how the events feed has been emitted for stretches of GitHub's history — commit emails arriving with the local part obfuscated.&lt;/p&gt;

&lt;p&gt;Great. Now you have a hash.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reversing the hash
&lt;/h2&gt;

&lt;p&gt;SHA-1 of an arbitrary string is a one-way function. SHA-1 of an email local part is not, because email local parts are not arbitrary. They're drawn from a tiny, predictable distribution: &lt;code&gt;firstname&lt;/code&gt;, &lt;code&gt;firstname.lastname&lt;/code&gt;, &lt;code&gt;f.lastname&lt;/code&gt;, &lt;code&gt;firstnamelastname&lt;/code&gt;, &lt;code&gt;firstname_lastname&lt;/code&gt;, &lt;code&gt;firstname1985&lt;/code&gt;, and a few hundred other patterns layered over a finite list of names.&lt;/p&gt;

&lt;p&gt;If you precompute a table of &lt;code&gt;sha1(local_part) → local_part&lt;/code&gt; for every plausible candidate — every name you've ever encountered, every email you've ever seen published — you can reverse most of these in O(1).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/^([a-f0-9]{40})@(.+)$/&lt;/span&gt;
  &lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vg"&gt;$1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$2&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Sha1Hash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;sha1_hash: &lt;/span&gt;&lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;real_email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;@&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lookup table is the asset. Building it well is most of the work. Mine is hundreds of millions of rows and grows every time the world publishes another address.&lt;/p&gt;
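
&lt;p&gt;To make the idea concrete, here is a toy version of the table in Python. The pattern list, the &lt;code&gt;candidates&lt;/code&gt; helper, the lowercase UTF-8 hashing, and the in-memory dict are all assumptions for illustration; a real table would live in a database and draw on far more patterns and names:&lt;/p&gt;

```python
import hashlib
import re

# Forty lowercase hex characters as the local part, then any domain.
HASHED = re.compile(r"^([a-f0-9]{40})@(.+)$")


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def candidates(first, last):
    """Yield a few plausible local parts for one (first, last) name pair."""
    first, last = first.lower(), last.lower()
    yield first
    yield f"{first}.{last}"
    yield f"{first[0]}.{last}"
    yield f"{first}{last}"
    yield f"{first}_{last}"


def build_table(names):
    """Map sha1(local_part) to local_part for every generated candidate."""
    return {sha1_hex(lp): lp for first, last in names for lp in candidates(first, last)}


def reverse_email(email, table):
    """Return the recovered address, or None if the hash is not in the table."""
    match = HASHED.match(email)
    if match is None:
        return None
    digest, domain = match.groups()
    local_part = table.get(digest)
    return f"{local_part}@{domain}" if local_part else None


table = build_table([("Ada", "Lovelace")])
hashed = sha1_hex("ada.lovelace") + "@example.com"
print(reverse_email(hashed, table))  # ada.lovelace@example.com
```

&lt;p&gt;The reversal itself is a single dictionary lookup. All of the cost sits in how many candidates you were willing to hash ahead of time.&lt;/p&gt;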

&lt;h2&gt;
  
  
  The harder problem: was that actually them?
&lt;/h2&gt;

&lt;p&gt;You now have a &lt;code&gt;(login, author_name, real_email)&lt;/code&gt; triple. The temptation is to claim the email belongs to the login. Don't.&lt;/p&gt;

&lt;p&gt;Anyone can configure git locally. People commit with &lt;code&gt;user.name&lt;/code&gt; set to their full legal name, their nickname, "John", "John D.", "johndoe", "John Doe via Acme Corp", "Acme CI Bot", or — frequently — &lt;em&gt;someone else's name entirely&lt;/em&gt;, because they cloned a repo on a coworker's machine and never reconfigured. A login pushes hundreds of commits over the years; many of them carry author names that don't actually identify the person behind the login.&lt;/p&gt;

&lt;p&gt;So you need a confidence layer. Mine is a separate pass over the same archive that builds a &lt;code&gt;(login, author_name) → observation_count&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;github_login_author_name_mappings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;login&lt;/span&gt;              &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;author_name&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;observation_count&lt;/span&gt;  &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;login&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;author_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
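
&lt;p&gt;Populating that table is a counting job. A rough Python sketch, assuming events have already been parsed into dicts shaped like the GH Archive &lt;code&gt;PushEvent&lt;/code&gt; records used earlier; &lt;code&gt;count_author_names&lt;/code&gt; is a hypothetical helper, and an in-memory &lt;code&gt;Counter&lt;/code&gt; stands in for the SQL table:&lt;/p&gt;

```python
from collections import Counter


def count_author_names(events):
    """Count how often each (login, author_name) pair appears in push commits."""
    counts = Counter()
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        login = (event.get("actor") or {}).get("login")
        for commit in (event.get("payload") or {}).get("commits") or []:
            name = (commit.get("author") or {}).get("name")
            if login and name:
                counts[(login, name)] += 1
    return counts


events = [
    {"type": "PushEvent", "actor": {"login": "jdoe"},
     "payload": {"commits": [{"author": {"name": "John Doe"}},
                             {"author": {"name": "Acme CI Bot"}}]}},
    {"type": "PushEvent", "actor": {"login": "jdoe"},
     "payload": {"commits": [{"author": {"name": "John Doe"}}]}},
    {"type": "WatchEvent", "actor": {"login": "jdoe"}, "payload": {}},
]
counts = count_author_names(events)
print(counts[("jdoe", "John Doe")])  # 2
```

&lt;p&gt;Each &lt;code&gt;(login, author_name)&lt;/code&gt; key and its count corresponds to one row of the table above.&lt;/p&gt;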



&lt;p&gt;When a hash reverses to a candidate email, I look up every author name the associated login has been observed pushing under, and ask: what fraction of this login's total commits use &lt;em&gt;this&lt;/em&gt; author name?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;total_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all_names&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;name_count&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all_names&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;author_name&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;last&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;percentage&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name_count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

&lt;span class="c1"&gt;# Need at least 10 commits for any signal at all&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

&lt;span class="c1"&gt;# With a long history, 50% co-occurrence is enough; with little, demand 80%&lt;/span&gt;
&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mf"&gt;50.0&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;80.0&lt;/span&gt;
&lt;span class="n"&gt;percentage&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rejects the noise. The contributor who once pushed a commit signed "Test User" doesn't get linked to a &lt;code&gt;test.user@example.com&lt;/code&gt; reversal. The CI bot pushing under a real engineer's login but with &lt;code&gt;git config user.name "Buildkite"&lt;/code&gt; doesn't pollute the index. What survives is the set of (login, name) pairs that consistently co-occur — a fairly trustworthy proxy for "this is the human behind this login."&lt;/p&gt;
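
&lt;p&gt;To see the thresholds in action, here is the same check wrapped in a method so it runs on its own. The method name &lt;code&gt;trusted_author_name?&lt;/code&gt; and the sample history are made up for the example; &lt;code&gt;all_names&lt;/code&gt; is assumed to be an array of &lt;code&gt;[author_name, observation_count]&lt;/code&gt; pairs for a single login.&lt;/p&gt;

```ruby
# Sketch only: the confidence check as a standalone method.
# all_names is assumed to be [[author_name, observation_count], ...] for one login.
def trusted_author_name?(all_names, author_name)
  total_count = all_names.sum { |_, count| count }
  # Fewer than 10 commits carries no usable signal
  return false unless total_count >= 10

  match      = all_names.find { |name, _| name == author_name }
  name_count = match ? match.last : 0
  percentage = (name_count.to_f / total_count) * 100

  # Long history: 50% co-occurrence is enough; short history: demand 80%
  threshold = total_count > 100 ? 50.0 : 80.0
  percentage >= threshold
end

history = [["Jane Doe", 140], ["Buildkite", 55], ["Test User", 1]]

puts trusted_author_name?(history, "Jane Doe")   # 140 of 196 commits
puts trusted_author_name?(history, "Buildkite")  # 55 of 196 commits
puts trusted_author_name?(history, "Test User")  # 1 of 196 commits
```

&lt;p&gt;With this history only "Jane Doe" clears the bar: the login has more than 100 commits, so the 50% threshold applies, and neither the CI name nor the one-off "Test User" commit comes close.&lt;/p&gt;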

&lt;h2&gt;
  
  
  What's left
&lt;/h2&gt;

&lt;p&gt;Doing this for one user, end to end:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify which monthly archive shards likely contain their activity.&lt;/li&gt;
&lt;li&gt;Stream and decompress hundreds of gigabytes of JSON.&lt;/li&gt;
&lt;li&gt;Maintain a SHA-1 lookup table of every plausible email local part you've ever seen.&lt;/li&gt;
&lt;li&gt;Maintain a parallel &lt;code&gt;(login, author_name)&lt;/code&gt; co-occurrence index across the entire archive.&lt;/li&gt;
&lt;li&gt;For every reversed hash, run the confidence check.&lt;/li&gt;
&lt;li&gt;Validate that the resulting email isn't already claimed by a different GitHub login (people misconfigure git constantly).&lt;/li&gt;
&lt;li&gt;Verify the address actually accepts mail before you rely on it.&lt;/li&gt;
&lt;/ol&gt;
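
&lt;p&gt;Step 3 is the least obvious piece, so here is a minimal sketch of the idea. It assumes the obfuscated address has the shape &lt;code&gt;sha1hex@domain&lt;/code&gt;, with only the local part hashed; that shape, and the names &lt;code&gt;build_lookup&lt;/code&gt; and &lt;code&gt;reverse_hashed_email&lt;/code&gt;, are assumptions for illustration rather than the pipeline's actual code.&lt;/p&gt;

```ruby
require "digest/sha1"

# Sketch only: map SHA-1(local_part) back to the local part it came from.
def build_lookup(local_parts)
  local_parts.each_with_object({}) do |part, table|
    table[Digest::SHA1.hexdigest(part)] = part
  end
end

# Assumes the hashed address looks like "sha1hex@domain" (hypothetical shape).
# Returns the candidate email, or nil when the digest is not in the table.
def reverse_hashed_email(hashed, table)
  digest, domain = hashed.split("@", 2)
  local_part = table[digest]
  return nil if local_part.nil? || domain.nil?

  "#{local_part}@#{domain}"
end

table  = build_lookup(["jane.doe", "jdoe", "jane"])
hashed = "#{Digest::SHA1.hexdigest("jane.doe")}@example.com"

puts reverse_hashed_email(hashed, table).inspect
puts reverse_hashed_email("not-a-known-digest@example.com", table).inspect
```

&lt;p&gt;A reversal only succeeds for local parts already in the table, which is why the table has to cover every plausible local part you have ever seen, and why every hit still goes through the confidence check in step 5.&lt;/p&gt;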

&lt;p&gt;That's a multi-day backfill the first time, hundreds of gigabytes resident, and a continuous trickle of new data forever. Perfectly reasonable to build if finding email addresses for GitHub users is your full-time job. Absurd to build if you just need to email three maintainers about a CVE.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shortcut
&lt;/h2&gt;

&lt;p&gt;This is the work &lt;a href="https://peopledb.co" rel="noopener noreferrer"&gt;PeopleDB&lt;/a&gt; does in the background. The pipeline above (archive ingestion, hash reversal, identity correlation, deduplication, SMTP validation) runs continuously. The answer is one HTTP call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://peopledb.co/api/v1/people?github_login=octocat"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$PEOPLEDB_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"github_login"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"octocat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"github_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;583231&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"linkedin_public_identifier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email_addresses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"personal_email_addresses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"work_email_addresses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The endpoint also accepts &lt;code&gt;github_id&lt;/code&gt;, &lt;code&gt;linkedin_id&lt;/code&gt;, and &lt;code&gt;linkedin_public_identifier&lt;/code&gt; — the same identity-merging logic runs across all of them, so if a person has both a GitHub and a LinkedIn record in the index, you get the union.&lt;/p&gt;
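
&lt;p&gt;If you are calling this from Ruby rather than curl, a minimal sketch of building the request for any of those identifiers might look like the following. The endpoint, parameter names, and bearer header come from the curl example above; the helper name &lt;code&gt;people_lookup_request&lt;/code&gt; and the client-side allow-list are my own additions for the example, and the request is built but not sent.&lt;/p&gt;

```ruby
require "net/http"
require "uri"

# Identifier parameters the endpoint is described as accepting
ALLOWED_KEYS = %w[github_login github_id linkedin_id linkedin_public_identifier].freeze

# Sketch only: build (but do not send) a lookup request for one identifier.
def people_lookup_request(key, value, token)
  raise ArgumentError, "unsupported identifier: #{key}" unless ALLOWED_KEYS.include?(key.to_s)

  uri = URI("https://peopledb.co/api/v1/people")
  uri.query = URI.encode_www_form(key.to_s => value)

  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request
end

token   = ENV.fetch("PEOPLEDB_TOKEN", "example-token")
request = people_lookup_request("github_login", "octocat", token)
puts request.uri
```

&lt;p&gt;Sending it is then a matter of &lt;code&gt;Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) { |http| http.request(request) }&lt;/code&gt; and parsing the JSON body.&lt;/p&gt;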

&lt;p&gt;If you're doing security disclosure, contributor outreach, or any kind of identity resolution where you start with a username and need to actually reach the human, that's the trade: spin up the pipeline, or skip it.&lt;/p&gt;

</description>
      <category>github</category>
      <category>api</category>
      <category>opensource</category>
      <category>security</category>
    </item>
  </channel>
</rss>
