<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: red bean</title>
    <description>The latest articles on Forem by red bean (@red_bean_37803fd04e673991).</description>
    <link>https://forem.com/red_bean_37803fd04e673991</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3887164%2Faec5641f-edb7-4a6b-afc2-48e1fdba62b4.png</url>
      <title>Forem: red bean</title>
      <link>https://forem.com/red_bean_37803fd04e673991</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/red_bean_37803fd04e673991"/>
    <language>en</language>
    <item>
      <title>You Don't Have 50,000 Users — How Ghost Profiles Pollute Your PostHog Data</title>
      <dc:creator>red bean</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:27:47 +0000</pubDate>
      <link>https://forem.com/red_bean_37803fd04e673991/you-dont-have-50000-users-how-ghost-profiles-pollute-your-posthog-data-4bha</link>
      <guid>https://forem.com/red_bean_37803fd04e673991/you-dont-have-50000-users-how-ghost-profiles-pollute-your-posthog-data-4bha</guid>
      <description>&lt;p&gt;You open PostHog. Persons tab says 50,000. You tell your team, your investors, your board: "We have 50,000 users."&lt;/p&gt;

&lt;p&gt;You don't. You probably have 35,000. Maybe fewer.&lt;/p&gt;

&lt;p&gt;The rest are ghosts — duplicate person records that PostHog created because its identity resolution isn't perfect. The same human, counted twice, three times, sometimes more. Different devices. Different browsers. An app session here, a web session there. PostHog sees each one as a new person.&lt;/p&gt;

&lt;h2&gt;Why this happens&lt;/h2&gt;

&lt;p&gt;PostHog links anonymous sessions to known users when you call &lt;code&gt;posthog.identify()&lt;/code&gt;. In theory, this merges the anonymous profile with the identified one. In practice, it fails more often than you'd think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Same person, two devices.&lt;/strong&gt; They use your app on their phone and your website on their laptop. Two separate &lt;code&gt;distinct_id&lt;/code&gt; values. If they don't log in on both, PostHog has no way to connect them. Two person records.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Same person, same device, cleared cookies.&lt;/strong&gt; They visited your site in January. Cookies expired or got cleared. They came back in March. New &lt;code&gt;distinct_id&lt;/code&gt;. PostHog creates a second person.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Race conditions in identify calls.&lt;/strong&gt; PostHog's merge logic is eventual, not transactional. When identify events come in close together from different clients, the merge can silently fail. The result: two person records with the same email.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;In-app browsers.&lt;/strong&gt; Your app opens a link in a WebView. PostHog's web SDK sees a completely new session with no connection to the native app. One user, two people in PostHog.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't theoretical. PostHog has had &lt;a href="https://github.com/PostHog/posthog/issues" rel="noopener noreferrer"&gt;open GitHub issues about identity merging since 2020&lt;/a&gt;. It also ships a &lt;code&gt;$merge_dangerously&lt;/code&gt; event for the cases where the standard merge refuses to combine two person records, such as two users that have both already been identified.&lt;/p&gt;

&lt;h2&gt;Why it matters more than you think&lt;/h2&gt;

&lt;p&gt;Ghost profiles don't just inflate a vanity metric. They break everything downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your funnels are wrong.&lt;/strong&gt; A user who signed up on their phone and converted on their laptop shows up as two separate journeys. One person completed the funnel, but PostHog sees one dropout and one direct conversion. Your conversion rate is wrong in both directions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your retention is wrong.&lt;/strong&gt; A user counted twice can appear to "retain" just by switching devices, or look churned on one device while still active on another. Your retention curves are lying to you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your cohorts are wrong.&lt;/strong&gt; Behavioral cohorts include ghost profiles. You're targeting "users who did X but not Y" — except some of them did Y, just on a different device under a different person record.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your A/B tests are wrong.&lt;/strong&gt; If the same person is in your experiment as two different participants, potentially in different variants, your results have noise you can't account for.&lt;/p&gt;

&lt;p&gt;You could be making product decisions based on data that's 15-30% off. That's the real cost.&lt;/p&gt;

&lt;h2&gt;How bad is it in your project?&lt;/h2&gt;

&lt;p&gt;We built a free tool that connects to your PostHog instance and shows how many likely ghost profiles you have. It checks three signals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Email duplicates&lt;/strong&gt; — multiple person records with the same email address. PostHog should have merged these but didn't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Phone duplicates&lt;/strong&gt; — same thing with phone numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Device ID duplicates&lt;/strong&gt; — PostHog's own SDKs set a &lt;code&gt;$device_id&lt;/code&gt; on every event. If two different person records have events from the same device ID, that's the same browser or phone creating two people. This catches anonymous duplicates that email matching can't.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
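
&lt;p&gt;Here's a minimal sketch of those three checks, assuming you've already exported your person records into plain dictionaries. The field names (&lt;code&gt;email&lt;/code&gt;, &lt;code&gt;phone&lt;/code&gt;, &lt;code&gt;device_ids&lt;/code&gt;), the sample data, and the &lt;code&gt;duplicate_clusters&lt;/code&gt; helper are made up for illustration. This is not the audit tool's actual code or the shape of PostHog's API responses.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical export of person records. The field names are assumptions
# for this sketch, not PostHog's API response shape.
persons = [
    {"id": "p1", "email": "ana@example.com", "phone": None, "device_ids": ["d1"]},
    {"id": "p2", "email": "Ana@Example.com ", "phone": None, "device_ids": ["d2"]},
    {"id": "p3", "email": None, "phone": None, "device_ids": ["d2"]},
    {"id": "p4", "email": "bo@example.com", "phone": "+15550100", "device_ids": ["d3"]},
]


def duplicate_clusters(persons, key_fn):
    """Group person IDs by a shared signal; keep groups with 2+ persons."""
    groups = defaultdict(set)
    for person in persons:
        for key in key_fn(person):
            groups[key].add(person["id"])
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


def email_keys(person):
    # Normalize case and whitespace so trivially different emails still match.
    email = person.get("email")
    return [email.strip().lower()] if email else []


def phone_keys(person):
    phone = person.get("phone")
    return [phone] if phone else []


def device_keys(person):
    return person.get("device_ids") or []


email_dupes = duplicate_clusters(persons, email_keys)
phone_dupes = duplicate_clusters(persons, phone_keys)
device_dupes = duplicate_clusters(persons, device_keys)

print("email:", email_dupes)
print("phone:", phone_dupes)
print("device:", device_dupes)
```

&lt;p&gt;In this sample, &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt; cluster on email, and &lt;code&gt;p2&lt;/code&gt; and &lt;code&gt;p3&lt;/code&gt; cluster on device ID. Treat device matches as candidates rather than proof: a shared family laptop can put two real people on one device ID.&lt;/p&gt;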

&lt;p&gt;It takes about 60 seconds. There are no code changes, and nothing gets modified in your PostHog instance. You paste a read-only API key and get a report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://crosstrackdata.com/audit" rel="noopener noreferrer"&gt;Run a free audit on your PostHog data&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;What you can do about it&lt;/h2&gt;

&lt;p&gt;Once you know the scope of the problem, you have a few options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual merges.&lt;/strong&gt; PostHog lets you merge person records manually in the UI. Fine if you have 20 duplicates. Not practical if you have 2,000.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better identify() hygiene.&lt;/strong&gt; Make sure every client calls &lt;code&gt;posthog.identify()&lt;/code&gt; with a consistent user ID as early as possible. This prevents some duplicates going forward but doesn't fix the ones already in your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrossTrack.&lt;/strong&gt; We're building an identity resolution layer that sits alongside PostHog. It detects duplicates in real time using email, phone, device fingerprinting, and a WebView bridge that links app sessions to in-app browser sessions. The SDK is &lt;a href="https://github.com/CrossTrackData" rel="noopener noreferrer"&gt;open source&lt;/a&gt;, 3.2 KB, zero dependencies.&lt;/p&gt;

&lt;h2&gt;The number you should actually be tracking&lt;/h2&gt;

&lt;p&gt;Instead of "persons" in PostHog, look at your ratio of distinct IDs to persons. If you have 50,000 persons and 80,000 distinct IDs, that's 1.6 distinct IDs per person on average. Some of that is normal (one anonymous ID + one identified ID = 2). But if your average is above 2.0, or if you find persons with 5, 10, 20 distinct IDs — your identity resolution has gaps.&lt;/p&gt;
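
&lt;p&gt;The arithmetic is simple enough to sketch. The snippet below assumes you already have the two totals plus a per-person count of distinct IDs. The function names, the sample counts in &lt;code&gt;id_counts&lt;/code&gt;, and the threshold of 5 are illustrative choices, not PostHog features.&lt;/p&gt;

```python
def distinct_ids_per_person(person_count, distinct_id_count):
    """Average number of distinct IDs attached to each person record."""
    if not person_count > 0:
        raise ValueError("person_count must be positive")
    return distinct_id_count / person_count


def heavy_persons(id_counts, threshold=5):
    """Person IDs with at least `threshold` distinct IDs (threshold is an assumption)."""
    return sorted(pid for pid, count in id_counts.items() if count >= threshold)


# The example from the text: 50,000 persons and 80,000 distinct IDs.
ratio = distinct_ids_per_person(50_000, 80_000)
needs_review = ratio > 2.0

# Hypothetical per-person counts, e.g. from a warehouse query.
id_counts = {"p1": 2, "p2": 12, "p3": 5, "p4": 1}
outliers = heavy_persons(id_counts)

print(ratio, needs_review, outliers)
```

&lt;p&gt;A healthy-looking average can still hide outliers, which is why the per-person check is worth running alongside the ratio.&lt;/p&gt;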

&lt;p&gt;The audit report shows you this ratio along with the specific duplicate clusters. At minimum, it tells you whether your user count is a number you can trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://crosstrackdata.com/audit" rel="noopener noreferrer"&gt;Check your PostHog data for free&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>posthog</category>
      <category>analytics</category>
      <category>identity</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to Track Users Across Your App and Website</title>
      <dc:creator>red bean</dc:creator>
      <pubDate>Sun, 19 Apr 2026 09:36:06 +0000</pubDate>
      <link>https://forem.com/red_bean_37803fd04e673991/how-to-track-users-across-your-app-and-website-58g4</link>
      <guid>https://forem.com/red_bean_37803fd04e673991/how-to-track-users-across-your-app-and-website-58g4</guid>
      <description>&lt;p&gt;If your company has both a mobile app and a website, you've probably noticed a problem: the same person using both platforms shows up as two completely different users in your analytics.&lt;/p&gt;

&lt;p&gt;Your website assigns them a visitor ID. Your app assigns them a device ID. These two systems have no way to know they're looking at the same person.&lt;/p&gt;

&lt;h2&gt;Why this happens&lt;/h2&gt;

&lt;p&gt;Every platform generates its own anonymous identifier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; A visitor ID stored in localStorage or a cookie (e.g., &lt;code&gt;visitor_8f2k&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android app:&lt;/strong&gt; A device-scoped ID (e.g., &lt;code&gt;device_x9m2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iOS app:&lt;/strong&gt; Same concept, different ID (e.g., &lt;code&gt;device_p4wn&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-app WebView:&lt;/strong&gt; Yet another visitor ID (e.g., &lt;code&gt;visitor_q7wv&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these systems talk to each other. So one real person browsing your site and using your app looks like 2, 3, or even 4 separate "unique users" in your data.&lt;/p&gt;

&lt;h2&gt;The impact&lt;/h2&gt;

&lt;p&gt;This isn't just a data quality issue. It causes real business problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inflated user counts&lt;/strong&gt; — your "monthly active users" number is higher than reality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broken funnels&lt;/strong&gt; — a user starting on web and converting in the app looks like a web drop-off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wasted ad spend&lt;/strong&gt; — you're retargeting users who already converted, because your ad platform doesn't know they're the same person&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind support teams&lt;/strong&gt; — when a user calls, your support team can't see what they just did in the app&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Common approaches (and their limitations)&lt;/h2&gt;

&lt;h3&gt;1. Wait for login&lt;/h3&gt;

&lt;p&gt;The simplest approach: once a user logs in on both platforms with the same account, you link them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Many users never log in, or only log in on one platform. You're blind to anonymous cross-platform behavior, which is often the majority of your traffic.&lt;/p&gt;

&lt;h3&gt;2. Build it yourself in dbt&lt;/h3&gt;

&lt;p&gt;Some teams write SQL models that try to stitch identities together in the warehouse after the fact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; This gets complex fast. You need to handle edge cases like cascade merges (two already-merged profiles need to merge again), shared email addresses, and retroactive event re-tagging. Most DIY solutions break on these edge cases.&lt;/p&gt;
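
&lt;p&gt;To see why cascade merges are awkward in flat SQL, it helps to look at the data structure that handles them naturally. The sketch below uses a union-find (disjoint set) structure, which is our choice for illustration and not how any particular tool does it. The &lt;code&gt;IdentityGraph&lt;/code&gt; class and the sample identifiers are hypothetical.&lt;/p&gt;

```python
class IdentityGraph:
    """Union-find over identifiers: linked identifiers share one root."""

    def __init__(self):
        self.parent = {}

    def find(self, identifier):
        """Return the root identifier of the cluster this identifier is in."""
        self.parent.setdefault(identifier, identifier)
        root = identifier
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[identifier] != root:
            next_id = self.parent[identifier]
            self.parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Merge the clusters containing a and b; return the surviving root."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a
        return root_a


graph = IdentityGraph()
graph.link("visitor_8f2k", "ana@example.com")  # web login
graph.link("device_x9m2", "visitor_q7wv")      # app session linked to its WebView

# Two separate clusters so far.
separate_before = graph.find("visitor_8f2k") != graph.find("device_x9m2")

# Cascade merge: a login inside the WebView joins the two existing clusters.
graph.link("visitor_q7wv", "ana@example.com")

identifiers = ["visitor_8f2k", "ana@example.com", "device_x9m2", "visitor_q7wv"]
roots = {graph.find(identifier) for identifier in identifiers}
print(separate_before, roots)
```

&lt;p&gt;This only covers the transitive part. It does nothing about shared email addresses, which would wrongly glue two real people together, and it doesn't re-tag historical events.&lt;/p&gt;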

&lt;h3&gt;3. Use Segment's identity resolution&lt;/h3&gt;

&lt;p&gt;Segment has a basic identity graph that merges anonymous and known users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; It's limited — last-seen-wins merge logic, no WebView bridge, no real cross-platform stitching. If your user opens a WebView inside your app, Segment creates a brand new identity.&lt;/p&gt;

&lt;h3&gt;4. Enterprise tools (Celebrus, mParticle)&lt;/h3&gt;

&lt;p&gt;These handle identity resolution well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; They can cost $50k-$500k/year and take months to integrate.&lt;/p&gt;

&lt;h2&gt;The WebView bridge problem&lt;/h2&gt;

&lt;p&gt;The hardest part of cross-platform identity is the WebView. When a user taps a link inside your app that opens a web page in an embedded browser (WKWebView on iOS, WebView on Android), a completely new web session starts. The web page has no idea it's running inside your app.&lt;/p&gt;

&lt;p&gt;This is where most solutions fall short. The WebView session becomes an orphaned identity that never gets connected to the app user.&lt;/p&gt;

&lt;p&gt;The solution is a bridge: the app SDK injects the app's device ID into the WebView's localStorage before the web page loads. The web SDK reads it and includes it with every event, alongside its own visitor ID. The server sees both IDs and merges them, with no login required.&lt;/p&gt;
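
&lt;p&gt;The server side of that bridge can be sketched in a few lines. Everything here is an assumption made for illustration: the &lt;code&gt;ProfileStore&lt;/code&gt; class, the &lt;code&gt;bridged_device_id&lt;/code&gt; field, and the rule that the app's profile survives the merge. It is not CrossTrack's actual implementation.&lt;/p&gt;

```python
import itertools


class ProfileStore:
    """In-memory stand-in for an identity service (hypothetical)."""

    def __init__(self):
        self.profile_of = {}  # identifier -> profile ID
        self.events = []      # each event: {"name": ..., "profile": ...}
        self._ids = itertools.count(1)

    def _profile_for(self, identifier):
        if identifier not in self.profile_of:
            self.profile_of[identifier] = f"profile_{next(self._ids)}"
        return self.profile_of[identifier]

    def _merge(self, loser, survivor):
        # Re-point identifiers and re-tag historical events to the survivor.
        for identifier, profile in self.profile_of.items():
            if profile == loser:
                self.profile_of[identifier] = survivor
        for event in self.events:
            if event["profile"] == loser:
                event["profile"] = survivor

    def ingest(self, name, visitor_id, bridged_device_id=None):
        """Record an event; merge profiles when the bridge supplies a device ID."""
        profile = self._profile_for(visitor_id)
        if bridged_device_id is not None:
            survivor = self._profile_for(bridged_device_id)
            if survivor != profile:
                self._merge(loser=profile, survivor=survivor)
            profile = survivor
        self.events.append({"name": name, "profile": profile})
        return profile


store = ProfileStore()
store.ingest("app_open", "device_x9m2")
store.ingest("page_view", "visitor_q7wv")  # WebView event without the bridge
store.ingest("checkout", "visitor_q7wv", bridged_device_id="device_x9m2")

profiles = [event["profile"] for event in store.events]
print(profiles)
```

&lt;p&gt;After the bridged event arrives, the earlier orphaned &lt;code&gt;page_view&lt;/code&gt; is re-tagged to the app user's profile, so all three events belong to one person.&lt;/p&gt;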

&lt;h2&gt;How we solved this&lt;/h2&gt;

&lt;p&gt;We built CrossTrack specifically for this problem. It's a set of lightweight SDKs (Web, Android, iOS) that handle visitor ID generation, cross-platform session stitching, and WebView bridging automatically.&lt;/p&gt;

&lt;p&gt;The SDKs send events to an identity resolution service that merges anonymous sessions into unified profiles. When identifiers overlap (same user logs in on both platforms, or a WebView bridge fires), profiles merge and all historical events get re-tagged to the surviving profile.&lt;/p&gt;

&lt;p&gt;You can see this working in an interactive demo: &lt;a href="https://crosstrack-demo.onrender.com" rel="noopener noreferrer"&gt;https://crosstrack-demo.onrender.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It walks through the full scenario (anonymous web visit, app open, WebView bridge, login on both platforms) and shows profiles merging in real time.&lt;/p&gt;

&lt;p&gt;If cross-platform identity is a problem your team deals with, take a look: &lt;a href="https://crosstrackdata.com" rel="noopener noreferrer"&gt;https://crosstrackdata.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
