<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yuchen Lin</title>
    <description>The latest articles on Forem by Yuchen Lin (@metarain).</description>
    <link>https://forem.com/metarain</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3757310%2F6d27d8a0-e904-4891-93dd-a70068b6f4df.png</url>
      <title>Forem: Yuchen Lin</title>
      <link>https://forem.com/metarain</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/metarain"/>
    <language>en</language>
    <item>
      <title>Building FlowLens-Web: A HAR-Driven Data-Flow Observatory for Tracking Research</title>
      <dc:creator>Yuchen Lin</dc:creator>
      <pubDate>Mon, 16 Feb 2026 00:01:51 +0000</pubDate>
      <link>https://forem.com/metarain/building-flowlens-web-a-har-driven-data-flow-observatory-for-tracking-research-3enh</link>
      <guid>https://forem.com/metarain/building-flowlens-web-a-har-driven-data-flow-observatory-for-tracking-research-3enh</guid>
      <description>&lt;p&gt;I wanted a practical answer to one question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we measure web tracking signals in a way that is reproducible, explainable, and non-invasive?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post walks through the approach, what we built, and what we learned from a 10-site batch run.&lt;/p&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;FlowLens-Web is a TypeScript CLI that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;records browser sessions with Playwright + HAR,&lt;/li&gt;
&lt;li&gt;extracts identifier-like request signals,&lt;/li&gt;
&lt;li&gt;scores evidence levels (L1-L5),&lt;/li&gt;
&lt;li&gt;reports cross-domain reuse and cross-run persistence,&lt;/li&gt;
&lt;li&gt;outputs Markdown + Mermaid summaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a research/measurement tool, not a blocker.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;Core stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js + TypeScript&lt;/li&gt;
&lt;li&gt;Playwright (Chromium)&lt;/li&gt;
&lt;li&gt;tldts (eTLD+1 classification)&lt;/li&gt;
&lt;li&gt;SHA-256 hashing for safe identifier matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;run scripted browsing scenario&lt;/li&gt;
&lt;li&gt;save HAR&lt;/li&gt;
&lt;li&gt;parse entries + normalize request metadata&lt;/li&gt;
&lt;li&gt;extract candidate identifier fields&lt;/li&gt;
&lt;li&gt;compute reuse/persistence signals&lt;/li&gt;
&lt;li&gt;assign evidence levels&lt;/li&gt;
&lt;li&gt;generate reports (case, matrix, A/B, funnel, longitudinal)&lt;/li&gt;
&lt;/ol&gt;
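&lt;p&gt;Steps 3–5 can be sketched roughly like this. This is a simplified illustration, not the actual FlowLens code: the HAR entry is reduced to one field, and the identifier heuristic (&lt;code&gt;looksLikeIdentifier&lt;/code&gt;) and function names are hypothetical.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shape of one parsed HAR entry (real HAR entries carry much more).
interface HarRequest {
  url: string;
  queryString: { name: string; value: string }[];
}

// Toy heuristic: long, token-shaped values are identifier candidates.
function looksLikeIdentifier(value: string): boolean {
  if (value.length >= 16) {
    return /^[A-Za-z0-9_-]+$/.test(value);
  }
  return false;
}

// Hash candidates so reuse can be matched later without retaining raw tokens.
function extractCandidates(req: HarRequest): { name: string; hash: string }[] {
  return req.queryString
    .filter((p) => looksLikeIdentifier(p.value))
    .map((p) => ({
      name: p.name,
      hash: createHash("sha256").update(p.value).digest("hex"),
    }));
}
```

The same hashes can then feed the reuse/persistence comparisons in steps 5–6, since equal raw tokens always produce equal digests.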

&lt;h2&gt;Evidence Model&lt;/h2&gt;

&lt;p&gt;We use explicit confidence tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L1: third-party domain observed&lt;/li&gt;
&lt;li&gt;L2: identifier-like field observed&lt;/li&gt;
&lt;li&gt;L3: repeated within run&lt;/li&gt;
&lt;li&gt;L4: cross-domain hash reuse&lt;/li&gt;
&lt;li&gt;L5: cross-run persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps interpretation honest: a higher level means stronger network-level evidence, not proof of platform-internal ad-decision logic.&lt;/p&gt;
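&lt;p&gt;The tiers above form a simple ordered classification. As a sketch (the field names in &lt;code&gt;SignalObservation&lt;/code&gt; are assumptions, not FlowLens's real types), assignment picks the strongest observed condition:&lt;/p&gt;

```typescript
// Hypothetical per-signal observations; L1 (third-party domain observed)
// is the floor for anything that reaches this stage.
interface SignalObservation {
  identifierLike: boolean;
  repeatsWithinRun: boolean;
  crossDomainReuse: boolean;
  crossRunPersistence: boolean;
}

// Return the highest evidence tier the observation supports.
function evidenceLevel(s: SignalObservation): number {
  if (s.crossRunPersistence) return 5;
  if (s.crossDomainReuse) return 4;
  if (s.repeatsWithinRun) return 3;
  if (s.identifierLike) return 2;
  return 1;
}
```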

&lt;h2&gt;CLI Workflows&lt;/h2&gt;

&lt;h3&gt;Matrix (multi-site)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-matrix &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sites&lt;/span&gt; https://www.google.com,https://www.youtube.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scenarios&lt;/span&gt; baseline,engaged,ad-click &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;A/B (causal contrast)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-ab &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://www.youtube.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--control&lt;/span&gt; baseline &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--treatment&lt;/span&gt; ad-click &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Funnel (stage deltas)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-funnel &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://www.google.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; running+shoes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Longitudinal (stability over samples)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-longitudinal &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://www.wikipedia.org &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--samples&lt;/span&gt; 7 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Full-Batch Findings (Current Run)&lt;/h2&gt;

&lt;p&gt;Batch design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 sites&lt;/li&gt;
&lt;li&gt;3 scenarios&lt;/li&gt;
&lt;li&gt;target 3 runs/scenario&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;9/10 sites produced complete scenario outputs&lt;/li&gt;
&lt;li&gt;Amazon repeatedly failed under this environment's runtime constraints (timeouts/session closure) and was recorded as an explicit failure rather than dropped from the dataset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pattern-level observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;signal intensity varied strongly by site/scenario&lt;/li&gt;
&lt;li&gt;deeper interaction stages often increased observed signal metrics&lt;/li&gt;
&lt;li&gt;some content-centric cases remained low-signal across repeated runs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why the Redaction Layer Matters&lt;/h2&gt;

&lt;p&gt;Raw tokens are not published.&lt;br&gt;
Instead, FlowLens stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redacted preview&lt;/li&gt;
&lt;li&gt;token length&lt;/li&gt;
&lt;li&gt;stable hash for equality/reuse checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives us reproducibility without leaking sensitive raw values.&lt;/p&gt;
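&lt;p&gt;The stored record can be sketched like this. It mirrors the three fields listed above, but &lt;code&gt;redact&lt;/code&gt;, &lt;code&gt;RedactedToken&lt;/code&gt;, and the four-character preview window are illustrative assumptions, not FlowLens's exact implementation:&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// What gets persisted instead of the raw token.
interface RedactedToken {
  preview: string; // first and last few characters only
  length: number;  // original token length
  hash: string;    // stable SHA-256 digest for equality/reuse checks
}

function redact(raw: string): RedactedToken {
  const head = raw.slice(0, 4);
  const tail = raw.slice(-4);
  return {
    preview: `${head}…${tail}`,
    length: raw.length,
    hash: createHash("sha256").update(raw).digest("hex"),
  };
}
```

Because the hash is deterministic, two redacted records match exactly when their raw tokens matched, which is all the reuse and persistence checks need.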

&lt;h2&gt;What You Can Claim Responsibly&lt;/h2&gt;

&lt;p&gt;From this tooling and dataset, you can claim:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;network-observed data-flow signals vary by context,&lt;/li&gt;
&lt;li&gt;controlled behavior changes can shift measured signals,&lt;/li&gt;
&lt;li&gt;reuse/persistence patterns are measurable in a repeatable way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You cannot claim from network traces alone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;definitive platform-internal ad decision logic,&lt;/li&gt;
&lt;li&gt;person-level identity resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Engineering Notes&lt;/h2&gt;

&lt;p&gt;What worked well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the modular analysis pipeline&lt;/li&gt;
&lt;li&gt;the evidence-level abstraction, which made findings easier to communicate precisely&lt;/li&gt;
&lt;li&gt;the matrix, funnel, A/B, and longitudinal studies complementing each other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What remains hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large-site reliability under fixed timeouts&lt;/li&gt;
&lt;li&gt;anti-bot/session constraints&lt;/li&gt;
&lt;li&gt;balancing coverage against runtime cost&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Read the Full Materials&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/yul761/FlowLens" rel="noopener noreferrer"&gt;https://github.com/yul761/FlowLens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full-batch summary: &lt;code&gt;data/reports/published/formal-v1-full-overall-summary.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Academic-style article: &lt;code&gt;data/reports/published/public-v1-academic-article.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;If You Want to Build on This&lt;/h2&gt;

&lt;p&gt;Next useful extensions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;stronger single-variable controls (consent, login, click-id toggles)&lt;/li&gt;
&lt;li&gt;bootstrap confidence intervals on key deltas&lt;/li&gt;
&lt;li&gt;cross-environment runs (device profile/region)&lt;/li&gt;
&lt;li&gt;publication-grade data manifests&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Closing&lt;/h2&gt;

&lt;p&gt;A lot of tracking debates are stuck between oversimplified claims and opaque internals.&lt;br&gt;
A HAR-first, evidence-tier approach gives a practical middle path: measurable, repeatable, and honest about uncertainty.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>privacy</category>
      <category>webdev</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Designing a Drift-Resistant Memory System for LLMs</title>
      <dc:creator>Yuchen Lin</dc:creator>
      <pubDate>Fri, 06 Feb 2026 20:03:27 +0000</pubDate>
      <link>https://forem.com/metarain/designing-a-drift-resistant-memory-system-for-llms-47fa</link>
      <guid>https://forem.com/metarain/designing-a-drift-resistant-memory-system-for-llms-47fa</guid>
      <description>&lt;p&gt;I recently built an open-source long-term memory engine called ProjectMemory.&lt;/p&gt;

&lt;p&gt;While working on it, I realized that most LLM memory systems fail for the same reason: they treat memory as accumulated text instead of managed state.&lt;/p&gt;

&lt;p&gt;This article explains that idea from a systems design perspective.&lt;/p&gt;





&lt;p&gt;While working on it, I kept running into the same problem:&lt;br&gt;
LLM systems with “memory” work well at first, but drift over time.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forget earlier decisions&lt;/li&gt;
&lt;li&gt;Contradict previous summaries&lt;/li&gt;
&lt;li&gt;Change goals without explicit updates&lt;/li&gt;
&lt;li&gt;Start giving generic next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common reaction is to improve prompts, increase context windows, or add more summarization layers.&lt;/p&gt;

&lt;p&gt;But after building and benchmarking a real system, I came to a different conclusion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Long-term memory for LLMs is not a prompting problem.&lt;br&gt;
It’s a state management problem.&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;&lt;strong&gt;The core mistake: memory as accumulated text&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most LLM memory systems look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory = previous_summary + new_events
summary = LLM(memory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, this causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Information loss&lt;/li&gt;
&lt;li&gt;Subtle goal drift&lt;/li&gt;
&lt;li&gt;Conflicting summaries&lt;/li&gt;
&lt;li&gt;Irreversible errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because the system treats memory as just text, the model is free to reinterpret or rewrite facts.&lt;/p&gt;

&lt;p&gt;There is no notion of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Protected state&lt;/li&gt;
&lt;li&gt;Valid transitions&lt;/li&gt;
&lt;li&gt;Consistency constraints&lt;/li&gt;
&lt;li&gt;Rebuildability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: no state model.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;A different approach: memory as evolving state&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In ProjectMemory, memory is modeled as a sequence of state transitions.&lt;/p&gt;

&lt;p&gt;Each digest is not “a new summary,” but:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;next_state = transition(previous_state, recent_events)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;previous_state&lt;/code&gt; = last digest&lt;/p&gt;

&lt;p&gt;&lt;code&gt;recent_events&lt;/code&gt; = new logs, notes, or decisions&lt;/p&gt;

&lt;p&gt;&lt;code&gt;transition&lt;/code&gt; = LLM-proposed update + deterministic checks&lt;/p&gt;

&lt;p&gt;So the LLM does not directly control memory.&lt;br&gt;
It only proposes a new state.&lt;/p&gt;

&lt;p&gt;The system decides whether that state is allowed.&lt;/p&gt;
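&lt;p&gt;A minimal sketch of that accept/reject loop in TypeScript. The &lt;code&gt;MemoryState&lt;/code&gt; fields and &lt;code&gt;Gate&lt;/code&gt; signature are assumptions for illustration, not ProjectMemory's actual API:&lt;/p&gt;

```typescript
// Hypothetical digest/state shape.
interface MemoryState {
  version: number;
  coreGoal: string;
  facts: string[];
}

// A gate is a deterministic check on a proposed transition.
type Gate = (prev: MemoryState, next: MemoryState) => boolean;

// The LLM only proposes a state; this function decides whether it is allowed.
function transition(
  prev: MemoryState,
  proposed: MemoryState,
  gates: Gate[],
): MemoryState {
  const accepted = gates.every((g) => g(prev, proposed));
  // On rejection the previous state remains valid, so a bad proposal
  // cannot corrupt memory; it can only fail to advance it.
  return accepted ? { ...proposed, version: prev.version + 1 } : prev;
}
```

The key property is that rejection is non-destructive: the caller can retry with a new proposal or escalate, while memory stays at the last accepted version.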




&lt;p&gt;&lt;strong&gt;Consistency gates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To prevent drift, each proposed digest passes through consistency gates.&lt;/p&gt;

&lt;p&gt;These enforce rules like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core goals cannot change unless explicitly updated.&lt;/li&gt;
&lt;li&gt;Stable facts must not disappear.&lt;/li&gt;
&lt;li&gt;Decisions cannot contradict previous accepted decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a digest violates these constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is rejected.&lt;/li&gt;
&lt;li&gt;The previous state remains valid.&lt;/li&gt;
&lt;li&gt;The system can retry or escalate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns memory into something closer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A state machine&lt;/li&gt;
&lt;li&gt;A versioned configuration&lt;/li&gt;
&lt;li&gt;Or a database with constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not just a growing text blob.&lt;/p&gt;
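&lt;p&gt;The three rules above can each be written as a small deterministic predicate. These function and field names are illustrative, not ProjectMemory's real gate implementations:&lt;/p&gt;

```typescript
// Hypothetical digest shape for gate checks.
interface Digest {
  coreGoal: string;
  stableFacts: string[];
  decisions: string[];
}

// Rule 1: core goals cannot change unless the update is explicit.
function goalUnchanged(prev: Digest, next: Digest, explicitUpdate = false): boolean {
  return explicitUpdate || prev.coreGoal === next.coreGoal;
}

// Rule 2: stable facts must not disappear.
function factsPreserved(prev: Digest, next: Digest): boolean {
  return prev.stableFacts.every((f) => next.stableFacts.includes(f));
}

// Rule 3: approximated here as "accepted decisions are never silently dropped";
// real contradiction detection would need richer decision structure.
function decisionsMonotonic(prev: Digest, next: Digest): boolean {
  return prev.decisions.every((d) => next.decisions.includes(d));
}
```

Because the gates are plain predicates over two states, they compose: a digest is accepted only if every gate passes, exactly like constraints in a database schema.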




&lt;p&gt;&lt;strong&gt;Correctness over latency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One key design decision in the project was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A wrong memory is worse than no memory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So the system intentionally trades higher digest latency for stronger consistency guarantees.&lt;/p&gt;

&lt;p&gt;Benchmarks showed that enabling consistency checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased digest latency&lt;/li&gt;
&lt;li&gt;But significantly improved state stability&lt;/li&gt;
&lt;li&gt;And eliminated certain classes of drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For long-running systems, this trade-off is worth it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Rebuildability as a first-class property&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another key idea is that memory must be rebuildable.&lt;/p&gt;

&lt;p&gt;Because digests are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stored as versioned states&lt;/li&gt;
&lt;li&gt;Derived from event streams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recompute memory from raw events&lt;/li&gt;
&lt;li&gt;Detect drift&lt;/li&gt;
&lt;li&gt;Compare online vs rebuilt states&lt;/li&gt;
&lt;li&gt;Repair corrupted summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is much closer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event sourcing&lt;/li&gt;
&lt;li&gt;Stateful system design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;than to prompt engineering.&lt;/p&gt;
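&lt;p&gt;The event-sourcing analogy can be made concrete in a few lines. This is a deliberately tiny model (single-fact events, a &lt;code&gt;rebuild&lt;/code&gt; replay, a &lt;code&gt;drifted&lt;/code&gt; comparison); the names and shapes are illustrative, not ProjectMemory's actual types:&lt;/p&gt;

```typescript
// Toy event and state shapes.
interface MemoryEvent { fact: string }
interface State { facts: string[] }

// Deterministically rebuild state by replaying the raw event stream from scratch.
function rebuild(events: MemoryEvent[]): State {
  return { facts: events.map((e) => e.fact) };
}

// Drift detection: the online (incrementally maintained) state should match
// what a full replay produces; any mismatch flags corruption or drift.
function drifted(online: State, events: MemoryEvent[]): boolean {
  const replayed = rebuild(events);
  return JSON.stringify(online.facts) !== JSON.stringify(replayed.facts);
}
```

When drift is detected, the rebuilt state can simply replace the online one, which is what makes corrupted summaries repairable rather than fatal.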




&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many people think long-term memory is mainly about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better prompts&lt;/li&gt;
&lt;li&gt;Better summaries&lt;/li&gt;
&lt;li&gt;Bigger context windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But those only delay drift.&lt;/p&gt;

&lt;p&gt;If memory is treated as &lt;strong&gt;unstructured text&lt;/strong&gt;,&lt;br&gt;
the system has no way to enforce consistency over time.&lt;/p&gt;

&lt;p&gt;Reliable memory requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;State models&lt;/li&gt;
&lt;li&gt;Transition rules&lt;/li&gt;
&lt;li&gt;Consistency checks&lt;/li&gt;
&lt;li&gt;Rebuildable history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a systems problem, not a prompting problem.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;ProjectMemory (v0.1.0)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article is based on building ProjectMemory, an open-source, developer-first long-term memory engine.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-based memory ingestion&lt;/li&gt;
&lt;li&gt;Layered digest pipelines&lt;/li&gt;
&lt;li&gt;Consistency gates&lt;/li&gt;
&lt;li&gt;Retrieval over structured memory&lt;/li&gt;
&lt;li&gt;Benchmark tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub:&lt;br&gt;
&lt;a href="https://github.com/yul761/ProjectMemory" rel="noopener noreferrer"&gt;https://github.com/yul761/ProjectMemory&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If an LLM system is expected to run for weeks or months, memory is no longer just context.&lt;/p&gt;

&lt;p&gt;It becomes &lt;strong&gt;state&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And once it is state, it needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rules&lt;/li&gt;
&lt;li&gt;Constraints&lt;/li&gt;
&lt;li&gt;And a way to rebuild the truth.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>systemdesign</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
