<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yuchen Lin</title>
    <description>The latest articles on Forem by Yuchen Lin (@metarain).</description>
    <link>https://forem.com/metarain</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3757310%2F6d27d8a0-e904-4891-93dd-a70068b6f4df.png</url>
      <title>Forem: Yuchen Lin</title>
      <link>https://forem.com/metarain</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/metarain"/>
    <language>en</language>
    <item>
      <title>Building FlowLens-Web: A HAR-Driven Data-Flow Observatory for Tracking Research</title>
      <dc:creator>Yuchen Lin</dc:creator>
      <pubDate>Mon, 16 Feb 2026 00:01:51 +0000</pubDate>
      <link>https://forem.com/metarain/building-flowlens-web-a-har-driven-data-flow-observatory-for-tracking-research-3enh</link>
      <guid>https://forem.com/metarain/building-flowlens-web-a-har-driven-data-flow-observatory-for-tracking-research-3enh</guid>
      <description>&lt;p&gt;I wanted a practical answer to one question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we measure web tracking signals in a way that is reproducible, explainable, and non-invasive?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post walks through the approach, what we built, and what we learned from a 10-site batch run.&lt;/p&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;FlowLens-Web is a TypeScript CLI that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;records browser sessions with Playwright + HAR,&lt;/li&gt;
&lt;li&gt;extracts identifier-like request signals,&lt;/li&gt;
&lt;li&gt;scores evidence levels (L1-L5),&lt;/li&gt;
&lt;li&gt;reports cross-domain reuse and cross-run persistence,&lt;/li&gt;
&lt;li&gt;outputs Markdown + Mermaid summaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a research/measurement tool, not a blocker.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;Core stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js + TypeScript&lt;/li&gt;
&lt;li&gt;Playwright (Chromium)&lt;/li&gt;
&lt;li&gt;tldts (eTLD+1 classification)&lt;/li&gt;
&lt;li&gt;SHA-256 hashing for safe identifier matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;run scripted browsing scenario&lt;/li&gt;
&lt;li&gt;save HAR&lt;/li&gt;
&lt;li&gt;parse entries + normalize request metadata&lt;/li&gt;
&lt;li&gt;extract candidate identifier fields&lt;/li&gt;
&lt;li&gt;compute reuse/persistence signals&lt;/li&gt;
&lt;li&gt;assign evidence levels&lt;/li&gt;
&lt;li&gt;generate reports (case, matrix, A/B, funnel, longitudinal)&lt;/li&gt;
&lt;/ol&gt;
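&lt;p&gt;Steps 3–5 can be sketched roughly like this. This is a simplified illustration, not the actual FlowLens code: the HAR entry is reduced to one field, and the identifier heuristic (&lt;code&gt;looksLikeIdentifier&lt;/code&gt;) and function names are hypothetical.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shape of one parsed HAR entry (real HAR entries carry much more).
interface HarRequest {
  url: string;
  queryString: { name: string; value: string }[];
}

// Toy heuristic: long, token-shaped values are identifier candidates.
function looksLikeIdentifier(value: string): boolean {
  if (value.length >= 16) {
    return /^[A-Za-z0-9_-]+$/.test(value);
  }
  return false;
}

// Hash candidates so reuse can be matched later without retaining raw tokens.
function extractCandidates(req: HarRequest): { name: string; hash: string }[] {
  return req.queryString
    .filter((p) => looksLikeIdentifier(p.value))
    .map((p) => ({
      name: p.name,
      hash: createHash("sha256").update(p.value).digest("hex"),
    }));
}
```

The same hashes can then feed the reuse/persistence comparisons in steps 5–6, since equal raw tokens always produce equal digests.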

&lt;h2&gt;Evidence Model&lt;/h2&gt;

&lt;p&gt;We use explicit confidence tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L1: third-party domain observed&lt;/li&gt;
&lt;li&gt;L2: identifier-like field observed&lt;/li&gt;
&lt;li&gt;L3: repeated within run&lt;/li&gt;
&lt;li&gt;L4: cross-domain hash reuse&lt;/li&gt;
&lt;li&gt;L5: cross-run persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps interpretation honest: a higher level means stronger network-level evidence, not proof of platform-internal ad-decision logic.&lt;/p&gt;
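&lt;p&gt;The tiers above form a simple ordered classification. As a sketch (the field names in &lt;code&gt;SignalObservation&lt;/code&gt; are assumptions, not FlowLens's real types), assignment picks the strongest observed condition:&lt;/p&gt;

```typescript
// Hypothetical per-signal observations; L1 (third-party domain observed)
// is the floor for anything that reaches this stage.
interface SignalObservation {
  identifierLike: boolean;
  repeatsWithinRun: boolean;
  crossDomainReuse: boolean;
  crossRunPersistence: boolean;
}

// Return the highest evidence tier the observation supports.
function evidenceLevel(s: SignalObservation): number {
  if (s.crossRunPersistence) return 5;
  if (s.crossDomainReuse) return 4;
  if (s.repeatsWithinRun) return 3;
  if (s.identifierLike) return 2;
  return 1;
}
```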

&lt;h2&gt;CLI Workflows&lt;/h2&gt;

&lt;h3&gt;Matrix (multi-site)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-matrix &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sites&lt;/span&gt; https://www.google.com,https://www.youtube.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scenarios&lt;/span&gt; baseline,engaged,ad-click &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;A/B (causal contrast)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-ab &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://www.youtube.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--control&lt;/span&gt; baseline &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--treatment&lt;/span&gt; ad-click &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Funnel (stage deltas)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-funnel &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://www.google.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; running+shoes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Longitudinal (stability over samples)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run flowlens &lt;span class="nt"&gt;--&lt;/span&gt; study-longitudinal &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://www.wikipedia.org &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--samples&lt;/span&gt; 7 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--runs&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Full-Batch Findings (Current Run)&lt;/h2&gt;

&lt;p&gt;Batch design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 sites&lt;/li&gt;
&lt;li&gt;3 scenarios&lt;/li&gt;
&lt;li&gt;target 3 runs/scenario&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;9/10 sites produced complete scenario outputs&lt;/li&gt;
&lt;li&gt;Amazon repeatedly failed under this environment's runtime constraints (timeouts/session closure) and was recorded as an explicit failure rather than dropped from the dataset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pattern-level observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;signal intensity varied strongly by site/scenario&lt;/li&gt;
&lt;li&gt;deeper interaction stages often increased observed signal metrics&lt;/li&gt;
&lt;li&gt;some content-centric cases remained low-signal across repeated runs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why the Redaction Layer Matters&lt;/h2&gt;

&lt;p&gt;Raw tokens are not published.&lt;br&gt;
Instead, FlowLens stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redacted preview&lt;/li&gt;
&lt;li&gt;token length&lt;/li&gt;
&lt;li&gt;stable hash for equality/reuse checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives us reproducibility without leaking sensitive raw values.&lt;/p&gt;
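&lt;p&gt;The stored record can be sketched like this. It mirrors the three fields listed above, but &lt;code&gt;redact&lt;/code&gt;, &lt;code&gt;RedactedToken&lt;/code&gt;, and the four-character preview window are illustrative assumptions, not FlowLens's exact implementation:&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// What gets persisted instead of the raw token.
interface RedactedToken {
  preview: string; // first and last few characters only
  length: number;  // original token length
  hash: string;    // stable SHA-256 digest for equality/reuse checks
}

function redact(raw: string): RedactedToken {
  const head = raw.slice(0, 4);
  const tail = raw.slice(-4);
  return {
    preview: `${head}…${tail}`,
    length: raw.length,
    hash: createHash("sha256").update(raw).digest("hex"),
  };
}
```

Because the hash is deterministic, two redacted records match exactly when their raw tokens matched, which is all the reuse and persistence checks need.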

&lt;h2&gt;What You Can Claim Responsibly&lt;/h2&gt;

&lt;p&gt;From this tooling and dataset, you can claim:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;network-observed data-flow signals vary by context,&lt;/li&gt;
&lt;li&gt;controlled behavior changes can shift measured signals,&lt;/li&gt;
&lt;li&gt;reuse/persistence patterns are measurable in a repeatable way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You cannot claim from network traces alone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;definitive platform-internal ad decision logic,&lt;/li&gt;
&lt;li&gt;person-level identity resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Engineering Notes&lt;/h2&gt;

&lt;p&gt;What worked well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the modular analysis pipeline&lt;/li&gt;
&lt;li&gt;the evidence-level abstraction, which made findings easier to communicate precisely&lt;/li&gt;
&lt;li&gt;the matrix, funnel, A/B, and longitudinal studies complementing each other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What remains hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large-site reliability under fixed timeouts&lt;/li&gt;
&lt;li&gt;anti-bot/session constraints&lt;/li&gt;
&lt;li&gt;balancing coverage against runtime cost&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Read the Full Materials&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/yul761/FlowLens" rel="noopener noreferrer"&gt;https://github.com/yul761/FlowLens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full-batch summary: &lt;code&gt;data/reports/published/formal-v1-full-overall-summary.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Academic-style article: &lt;code&gt;data/reports/published/public-v1-academic-article.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;If You Want to Build on This&lt;/h2&gt;

&lt;p&gt;Next useful extensions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;stronger single-variable controls (consent, login, click-id toggles)&lt;/li&gt;
&lt;li&gt;bootstrap confidence intervals on key deltas&lt;/li&gt;
&lt;li&gt;cross-environment runs (device profile/region)&lt;/li&gt;
&lt;li&gt;publication-grade data manifests&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Closing&lt;/h2&gt;

&lt;p&gt;A lot of tracking debates are stuck between oversimplified claims and opaque internals.&lt;br&gt;
A HAR-first, evidence-tier approach gives a practical middle path: measurable, repeatable, and honest about uncertainty.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>privacy</category>
      <category>webdev</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Designing a Drift-Resistant Memory System for LLMs</title>
      <dc:creator>Yuchen Lin</dc:creator>
      <pubDate>Fri, 06 Feb 2026 20:03:27 +0000</pubDate>
      <link>https://forem.com/metarain/designing-a-drift-resistant-memory-system-for-llms-47fa</link>
      <guid>https://forem.com/metarain/designing-a-drift-resistant-memory-system-for-llms-47fa</guid>
      <description>&lt;p&gt;I recently built an open-source long-term memory engine called ProjectMemory.&lt;/p&gt;

&lt;p&gt;While working on it, I realized that most LLM memory systems fail for the same reason: they treat memory as accumulated text instead of managed state.&lt;/p&gt;

&lt;p&gt;This article explains that idea from a systems design perspective.&lt;/p&gt;





&lt;p&gt;While working on it, I kept running into the same problem:&lt;br&gt;
LLM systems with “memory” work well at first, but drift over time.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forget earlier decisions&lt;/li&gt;
&lt;li&gt;Contradict previous summaries&lt;/li&gt;
&lt;li&gt;Change goals without explicit updates&lt;/li&gt;
&lt;li&gt;Start giving generic next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common reaction is to improve prompts, increase context windows, or add more summarization layers.&lt;/p&gt;

&lt;p&gt;But after building and benchmarking a real system, I came to a different conclusion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Long-term memory for LLMs is not a prompting problem.&lt;br&gt;
It’s a state management problem.&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;&lt;strong&gt;The core mistake: memory as accumulated text&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most LLM memory systems look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory = previous_summary + new_events
summary = LLM(memory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, this causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Information loss&lt;/li&gt;
&lt;li&gt;Subtle goal drift&lt;/li&gt;
&lt;li&gt;Conflicting summaries&lt;/li&gt;
&lt;li&gt;Irreversible errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because the system treats memory as just text, the model is free to reinterpret or rewrite facts.&lt;/p&gt;

&lt;p&gt;There is no notion of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Protected state&lt;/li&gt;
&lt;li&gt;Valid transitions&lt;/li&gt;
&lt;li&gt;Consistency constraints&lt;/li&gt;
&lt;li&gt;Rebuildability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: no state model.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;A different approach: memory as evolving state&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In ProjectMemory, memory is modeled as a sequence of state transitions.&lt;/p&gt;

&lt;p&gt;Each digest is not “a new summary,” but:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;next_state = transition(previous_state, recent_events)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;previous_state&lt;/code&gt; = last digest&lt;/p&gt;

&lt;p&gt;&lt;code&gt;recent_events&lt;/code&gt; = new logs, notes, or decisions&lt;/p&gt;

&lt;p&gt;&lt;code&gt;transition&lt;/code&gt; = LLM-proposed update + deterministic checks&lt;/p&gt;

&lt;p&gt;So the LLM does not directly control memory.&lt;br&gt;
It only proposes a new state.&lt;/p&gt;

&lt;p&gt;The system decides whether that state is allowed.&lt;/p&gt;
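&lt;p&gt;A minimal sketch of that accept/reject loop in TypeScript. The &lt;code&gt;MemoryState&lt;/code&gt; fields and &lt;code&gt;Gate&lt;/code&gt; signature are assumptions for illustration, not ProjectMemory's actual API:&lt;/p&gt;

```typescript
// Hypothetical digest/state shape.
interface MemoryState {
  version: number;
  coreGoal: string;
  facts: string[];
}

// A gate is a deterministic check on a proposed transition.
type Gate = (prev: MemoryState, next: MemoryState) => boolean;

// The LLM only proposes a state; this function decides whether it is allowed.
function transition(
  prev: MemoryState,
  proposed: MemoryState,
  gates: Gate[],
): MemoryState {
  const accepted = gates.every((g) => g(prev, proposed));
  // On rejection the previous state remains valid, so a bad proposal
  // cannot corrupt memory; it can only fail to advance it.
  return accepted ? { ...proposed, version: prev.version + 1 } : prev;
}
```

The key property is that rejection is non-destructive: the caller can retry with a new proposal or escalate, while memory stays at the last accepted version.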




&lt;p&gt;&lt;strong&gt;Consistency gates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To prevent drift, each proposed digest passes through consistency gates.&lt;/p&gt;

&lt;p&gt;These enforce rules like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core goals cannot change unless explicitly updated.&lt;/li&gt;
&lt;li&gt;Stable facts must not disappear.&lt;/li&gt;
&lt;li&gt;Decisions cannot contradict previous accepted decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a digest violates these constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is rejected.&lt;/li&gt;
&lt;li&gt;The previous state remains valid.&lt;/li&gt;
&lt;li&gt;The system can retry or escalate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns memory into something closer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A state machine&lt;/li&gt;
&lt;li&gt;A versioned configuration&lt;/li&gt;
&lt;li&gt;Or a database with constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not just a growing text blob.&lt;/p&gt;
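&lt;p&gt;The three rules above can each be written as a small deterministic predicate. These function and field names are illustrative, not ProjectMemory's real gate implementations:&lt;/p&gt;

```typescript
// Hypothetical digest shape for gate checks.
interface Digest {
  coreGoal: string;
  stableFacts: string[];
  decisions: string[];
}

// Rule 1: core goals cannot change unless the update is explicit.
function goalUnchanged(prev: Digest, next: Digest, explicitUpdate = false): boolean {
  return explicitUpdate || prev.coreGoal === next.coreGoal;
}

// Rule 2: stable facts must not disappear.
function factsPreserved(prev: Digest, next: Digest): boolean {
  return prev.stableFacts.every((f) => next.stableFacts.includes(f));
}

// Rule 3: approximated here as "accepted decisions are never silently dropped";
// real contradiction detection would need richer decision structure.
function decisionsMonotonic(prev: Digest, next: Digest): boolean {
  return prev.decisions.every((d) => next.decisions.includes(d));
}
```

Because the gates are plain predicates over two states, they compose: a digest is accepted only if every gate passes, exactly like constraints in a database schema.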




&lt;p&gt;&lt;strong&gt;Correctness over latency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One key design decision in the project was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A wrong memory is worse than no memory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So the system intentionally trades higher digest latency for stronger consistency guarantees.&lt;/p&gt;

&lt;p&gt;Benchmarks showed that enabling consistency checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased digest latency&lt;/li&gt;
&lt;li&gt;But significantly improved state stability&lt;/li&gt;
&lt;li&gt;And eliminated certain classes of drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For long-running systems, this trade-off is worth it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Rebuildability as a first-class property&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another key idea is that memory must be rebuildable.&lt;/p&gt;

&lt;p&gt;Because digests are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stored as versioned states&lt;/li&gt;
&lt;li&gt;Derived from event streams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recompute memory from raw events&lt;/li&gt;
&lt;li&gt;Detect drift&lt;/li&gt;
&lt;li&gt;Compare online vs rebuilt states&lt;/li&gt;
&lt;li&gt;Repair corrupted summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is much closer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event sourcing&lt;/li&gt;
&lt;li&gt;Stateful system design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;than to prompt engineering.&lt;/p&gt;
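&lt;p&gt;The event-sourcing analogy can be made concrete in a few lines. This is a deliberately tiny model (single-fact events, a &lt;code&gt;rebuild&lt;/code&gt; replay, a &lt;code&gt;drifted&lt;/code&gt; comparison); the names and shapes are illustrative, not ProjectMemory's actual types:&lt;/p&gt;

```typescript
// Toy event and state shapes.
interface MemoryEvent { fact: string }
interface State { facts: string[] }

// Deterministically rebuild state by replaying the raw event stream from scratch.
function rebuild(events: MemoryEvent[]): State {
  return { facts: events.map((e) => e.fact) };
}

// Drift detection: the online (incrementally maintained) state should match
// what a full replay produces; any mismatch flags corruption or drift.
function drifted(online: State, events: MemoryEvent[]): boolean {
  const replayed = rebuild(events);
  return JSON.stringify(online.facts) !== JSON.stringify(replayed.facts);
}
```

When drift is detected, the rebuilt state can simply replace the online one, which is what makes corrupted summaries repairable rather than fatal.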




&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many people think long-term memory is mainly about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better prompts&lt;/li&gt;
&lt;li&gt;Better summaries&lt;/li&gt;
&lt;li&gt;Bigger context windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But those only delay drift.&lt;/p&gt;

&lt;p&gt;If memory is treated as &lt;strong&gt;unstructured text&lt;/strong&gt;,&lt;br&gt;
the system has no way to enforce consistency over time.&lt;/p&gt;

&lt;p&gt;Reliable memory requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;State models&lt;/li&gt;
&lt;li&gt;Transition rules&lt;/li&gt;
&lt;li&gt;Consistency checks&lt;/li&gt;
&lt;li&gt;Rebuildable history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a systems problem, not a prompting problem.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;ProjectMemory (v0.1.0)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article is based on building ProjectMemory, an open-source, developer-first long-term memory engine.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-based memory ingestion&lt;/li&gt;
&lt;li&gt;Layered digest pipelines&lt;/li&gt;
&lt;li&gt;Consistency gates&lt;/li&gt;
&lt;li&gt;Retrieval over structured memory&lt;/li&gt;
&lt;li&gt;Benchmark tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub:&lt;br&gt;
&lt;a href="https://github.com/yul761/ProjectMemory" rel="noopener noreferrer"&gt;https://github.com/yul761/ProjectMemory&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Closing thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If an LLM system is expected to run for weeks or months, memory is no longer just context.&lt;/p&gt;

&lt;p&gt;It becomes &lt;strong&gt;state&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And once it is state, it needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rules&lt;/li&gt;
&lt;li&gt;Constraints&lt;/li&gt;
&lt;li&gt;And a way to rebuild the truth.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>systemdesign</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
