<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Chester Guan （Ziyuan Guan）</title>
    <description>The latest articles on Forem by Chester Guan （Ziyuan Guan） (@chesterguan).</description>
    <link>https://forem.com/chesterguan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3868053%2Fa0991d82-dae4-4891-9d1c-a70e3c7fbd46.png</url>
      <title>Forem: Chester Guan （Ziyuan Guan）</title>
      <link>https://forem.com/chesterguan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/chesterguan"/>
    <language>en</language>
    <item>
      <title>How synthetic data actually performs</title>
      <dc:creator>Chester Guan （Ziyuan Guan）</dc:creator>
      <pubDate>Thu, 14 May 2026 17:08:47 +0000</pubDate>
      <link>https://forem.com/chesterguan/how-synthetic-data-actually-performs-43a5</link>
      <guid>https://forem.com/chesterguan/how-synthetic-data-actually-performs-43a5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://prometheno.org/blog/2026-02-12-how-synthetic-data-actually-performs" rel="noopener noreferrer"&gt;prometheno.org&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let's think together. In &lt;a href="https://dev.to/blog/2026-02-05-the-clinical-truth-gap"&gt;&lt;em&gt;The clinical-truth gap&lt;/em&gt;&lt;/a&gt;&lt;br&gt;
I said clinical-truth verification belongs in medical empiricism. A&lt;br&gt;
fair objection: why bother with real-patient infrastructure at all when&lt;br&gt;
synthetic data exists? Synthea, MDClone, Syntegra, mostly.ai — generate&lt;br&gt;
fake patients with the statistical properties of real ones, train&lt;br&gt;
models on those, ship.&lt;/p&gt;

&lt;p&gt;The honest answer is to look at how synthetic data actually performs,&lt;br&gt;
not how it's pitched.&lt;/p&gt;

&lt;h2&gt;
  
  
  What synthetic does well
&lt;/h2&gt;

&lt;p&gt;Three uses where it earns its place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline testing.&lt;/strong&gt; No PHI, no HIPAA review, no consent overhead.&lt;br&gt;
Engineers stress-test ingestion code, validate FHIR mappings, exercise&lt;br&gt;
edge cases. Synthea, the MITRE-developed open-source generator, was&lt;br&gt;
built explicitly for this&lt;sup id="fnref1"&gt;1&lt;/sup&gt; and is widely used across US health-IT projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training augmentation.&lt;/strong&gt; For rare conditions where real-data samples&lt;br&gt;
are clinically inadequate, synthetic supplementation lifts model&lt;br&gt;
performance measurably. A 2024 study on rare thyroid cancer subtype&lt;br&gt;
classification used text-guided diffusion to generate synthetic images&lt;br&gt;
and improved subtype-classification AUC from 0.7364 to 0.8442&lt;sup id="fnref2"&gt;2&lt;/sup&gt;. The&lt;br&gt;
gain came from hybrid training. Synthetic + real beat real alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aggregate statistical research.&lt;/strong&gt; Questions like &lt;em&gt;what's the average&lt;br&gt;
HbA1c trajectory&lt;/em&gt; or &lt;em&gt;what's the comorbidity prevalence&lt;/em&gt; often produce&lt;br&gt;
similar answers on synthetic and real data, with no individual-level&lt;br&gt;
exposure. A JMIR comparison study of five MDClone-generated cohorts&lt;br&gt;
against their real counterparts found the analyses "provide a close&lt;br&gt;
estimate of real data results in general," with caveats depending on&lt;br&gt;
the patient-to-variable ratio&lt;sup id="fnref3"&gt;3&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;That's a real value proposition. The series doesn't dismiss it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the benchmarks show
&lt;/h2&gt;

&lt;p&gt;Three places the numbers cut against synthetic-as-substitute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rare-event performance plateaus.&lt;/strong&gt; SHEPHERD — a Harvard/Zitnik-lab&lt;br&gt;
model trained on 40,000+ synthetic patients across 2,134 rare diseases&lt;br&gt;
— achieved &lt;strong&gt;40% top-1 accuracy&lt;/strong&gt; in causal gene discovery when&lt;br&gt;
evaluated against the real-world Undiagnosed Diseases Network&lt;br&gt;
cohort&lt;sup id="fnref4"&gt;4&lt;/sup&gt;. Forty percent is useful as a triage tool. It is not&lt;br&gt;
clinical-grade. The gap between synthetic-trained performance and&lt;br&gt;
real-world ground truth is precisely the gap synthetic data can't close&lt;br&gt;
on its own.&lt;/p&gt;

&lt;p&gt;[Figure: a two-column decision matrix. The left column, "Synthetic suffices," lists pipeline testing, training augmentation, aggregate research, and hypothesis generation. The right column, "Real data required," lists regulatory submission, outcome verification, AI accountability, and rare-event prediction. Caption: Synthetic data does real work in the left column. The right column is what HAVEN's real-patient infrastructure exists to serve.]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid almost always wins, and hybrid needs real data.&lt;/strong&gt; Across&lt;br&gt;
healthcare AI benchmarks, models trained on synthetic + real outperform&lt;br&gt;
either alone. An AMD fundus-image study using ResNet-18 reached 85%&lt;br&gt;
accuracy when combined data was used — outperforming the same&lt;br&gt;
architecture trained on synthetic-only by a clinically meaningful&lt;br&gt;
margin&lt;sup id="fnref5"&gt;5&lt;/sup&gt;. The destination is rarely synthetic. It's augmentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy isn't as clean as advertised.&lt;/strong&gt; Membership-inference attacks&lt;br&gt;
against synthetic health data work. A 2022 analysis in the&lt;br&gt;
&lt;em&gt;Journal of Biomedical Informatics&lt;/em&gt; (since extended by multiple 2024&lt;br&gt;
papers) demonstrated that attackers can infer with substantial confidence&lt;br&gt;
whether a specific real patient's record was used to generate a synthetic&lt;br&gt;
cohort&lt;sup id="fnref6"&gt;6&lt;/sup&gt;. The re-identification risk rises for unique cases&lt;br&gt;
(older patients, rare conditions), which is exactly the population synthetic&lt;br&gt;
data is most often used for. Differential privacy mitigates this, but only&lt;br&gt;
at meaningful utility cost.&lt;/p&gt;
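
&lt;p&gt;To make the attack concrete, here is a minimal sketch of one simple membership-inference heuristic: guess that a record was in the generator's training set when some synthetic row sits unusually close to it. This is an illustrative stand-in, not the method of the cited paper. The function names, the threshold, and the toy rows are all assumptions, and a real attack would normalize features and calibrate the threshold.&lt;/p&gt;

```python
import math


def nearest_distance(target, synthetic_rows):
    """Euclidean distance from the target record to its closest synthetic row."""
    return min(math.dist(target, row) for row in synthetic_rows)


def guess_member(target, synthetic_rows, threshold):
    """Guess 'was used in training' when a synthetic row sits within the threshold."""
    return threshold >= nearest_distance(target, synthetic_rows)


# Toy, made-up rows of (age, HbA1c); not drawn from any real or published dataset.
synthetic = [(54.0, 6.1), (61.0, 7.4), (47.0, 5.8), (93.0, 11.9)]

outlier_patient = (93.0, 12.0)   # unusual record with a near-copy in the synthetic set
typical_outsider = (70.0, 9.0)   # record with no close synthetic neighbor

flag_outlier = guess_member(outlier_patient, synthetic, threshold=1.0)
flag_outsider = guess_member(typical_outsider, synthetic, threshold=1.0)
print(flag_outlier, flag_outsider)
```

&lt;p&gt;The sketch also shows why unique cases are the exposed ones: a rare record has few natural neighbors, so a synthetic row landing next to it is hard to explain as coincidence.&lt;/p&gt;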

&lt;h2&gt;
  
  
  Where synthetic structurally can't go
&lt;/h2&gt;

&lt;p&gt;Two categorical limits. Better generators don't fix them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real outcomes.&lt;/strong&gt; A synthetic patient doesn't develop sepsis. Doesn't&lt;br&gt;
survive their cancer. Doesn't die from heart failure five years later.&lt;br&gt;
Synthetic outcome data is fictional — produced to match training&lt;br&gt;
distributions, not real biology. For Prometheno's longer-term horizon&lt;br&gt;
— paying or penalizing AI vendors when their predictions match or miss&lt;br&gt;
reality — the outcome side cannot be synthetic. No algorithm turns&lt;br&gt;
simulation into observation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regulatory ground truth.&lt;/strong&gt; The FDA's Center for Devices and&lt;br&gt;
Radiological Health issued updated real-world-evidence guidance in December&lt;br&gt;
2025&lt;sup id="fnref7"&gt;7&lt;/sup&gt;. The framework rests on observational data from actual&lt;br&gt;
patients in actual care. Synthetic control arms have a defined pathway as&lt;br&gt;
supplements to real evidence, not substitutes for it. The EMA's position is&lt;br&gt;
similar. For any AI/ML medical device seeking clearance, the path runs&lt;br&gt;
through real data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What HAVEN does that synthetic can't
&lt;/h2&gt;

&lt;p&gt;The strongest argument for HAVEN comes from accepting synthetic&lt;br&gt;
data's strengths.&lt;/p&gt;

&lt;p&gt;If synthetic-only training plateaus well below clinical-grade for rare&lt;br&gt;
events, the path forward runs through hybrid models — and hybrid needs&lt;br&gt;
governed real data. Consent, audit, and quality grading are what make&lt;br&gt;
hybrid defensible at population scale.&lt;/p&gt;

&lt;p&gt;If real outcomes can't be synthesized, AI accountability runs on real&lt;br&gt;
outcome data. The infrastructure for collecting outcomes, tying them&lt;br&gt;
to the predictions that preceded them, and attributing value back to&lt;br&gt;
the contributing patients is what HAVEN's four primitives enable.&lt;/p&gt;

&lt;p&gt;If membership-inference attacks compromise synthetic privacy claims,&lt;br&gt;
the answer isn't to abandon real data. It's to govern access to real&lt;br&gt;
data properly. Consent-attestation and hash-chained audit produce&lt;br&gt;
traceability that de-identification alone never did.&lt;/p&gt;

&lt;p&gt;Synthetic data is complementary. It strengthens the case for a&lt;br&gt;
patient-sovereign protocol layer rather than replacing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;The next post commits to what would prove the whole argument wrong.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Walonoski, J., et al. "Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record." &lt;em&gt;Journal of the American Medical Informatics Association&lt;/em&gt; 25, no. 3 (2018): 230-238. Open-source, MITRE-maintained, used widely for testing and demonstration. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Frontiers in Digital Health, "Synthetic data generation: a privacy-preserving approach to accelerate rare disease research" (2025). Text-guided diffusion produced synthetic images with 92.2% realism rate; hybrid training improved AUC from 0.7364 to 0.8442. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;JMIR Medical Informatics 8, no. 2 (2020), "Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies." &lt;a href="https://medinform.jmir.org/2020/2/e16492/" rel="noopener noreferrer"&gt;https://medinform.jmir.org/2020/2/e16492/&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;Alsentzer, E., et al. "Deep learning for diagnosing patients with rare genetic diseases." Zitnik Lab, Harvard. SHEPHERD model evaluated against the NIH Undiagnosed Diseases Network real-world cohort; published results show 40% top-1 accuracy on causal gene discovery. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;npj Digital Medicine, "Generating high-fidelity synthetic patient data for assessing machine learning healthcare software" (2020). &lt;a href="https://www.nature.com/articles/s41746-020-00353-9" rel="noopener noreferrer"&gt;https://www.nature.com/articles/s41746-020-00353-9&lt;/a&gt;. ResNet-18 on AMD fundus images: 85% accuracy with combined real+synthetic data. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;Hyeong, J., et al. "Membership inference attacks against synthetic health data." &lt;em&gt;Journal of Biomedical Informatics&lt;/em&gt; 125 (2022). &lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8766950/" rel="noopener noreferrer"&gt;https://pmc.ncbi.nlm.nih.gov/articles/PMC8766950/&lt;/a&gt;. Extended by multiple 2024 papers including work on differentially private synthetic data and re-identification on tabular GANs. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;U.S. Food and Drug Administration. &lt;em&gt;Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices.&lt;/em&gt; Final guidance, December 16, 2025 (supersedes 2017 guidance). Real-World Data quality criteria emphasize relevance and reliability of observational data from actual patients in actual care. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>thesis</category>
      <category>haven</category>
      <category>protocol</category>
      <category>synthetic</category>
    </item>
    <item>
      <title>The clinical-truth gap</title>
      <dc:creator>Chester Guan （Ziyuan Guan）</dc:creator>
      <pubDate>Thu, 14 May 2026 17:00:04 +0000</pubDate>
      <link>https://forem.com/chesterguan/the-clinical-truth-gap-18j0</link>
      <guid>https://forem.com/chesterguan/the-clinical-truth-gap-18j0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://prometheno.org/blog/2026-02-05-the-clinical-truth-gap" rel="noopener noreferrer"&gt;prometheno.org&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let's think together. In &lt;a href="https://dev.to/blog/2026-01-29-the-identity-gap"&gt;&lt;em&gt;The identity gap&lt;/em&gt;&lt;/a&gt;&lt;br&gt;
I said identity-proofing rides on existing institutions, not on crypto.&lt;br&gt;
This post is the same shape, different regime: clinical truth rides on&lt;br&gt;
existing medical practice. What HAVEN delivers is what the protocol&lt;br&gt;
layer can deliver — quality grading at ingest. The clinical-truth&lt;br&gt;
verification happens where it should: in medicine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two questions a record makes
&lt;/h2&gt;

&lt;p&gt;A clinical record claims two things at once.&lt;/p&gt;

&lt;p&gt;First: this byte sequence is the one that was written. Crypto answers&lt;br&gt;
that. Hash matches, signature verifies, chain intact. Done.&lt;/p&gt;

&lt;p&gt;Second: the byte sequence describes the patient's body. Glucose was 247.&lt;br&gt;
The diagnosis was correct. The procedure happened. Different question.&lt;br&gt;
Not because crypto fails at it; because crypto isn't pointed at it.&lt;/p&gt;

&lt;p&gt;Three concrete ways the second question goes wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong patient.&lt;/strong&gt; Two MRNs swapped at intake. The lab value belongs to&lt;br&gt;
someone else's blood. Signature, timestamp, chain — clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong recording.&lt;/strong&gt; The phlebotomist drew from a contaminated line. The&lt;br&gt;
instrument read 247. The instrument was reading the IV bag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong interpretation.&lt;/strong&gt; "Type 2 diabetes" assigned to a patient whose&lt;br&gt;
elevated A1C was steroid-induced. The patient doesn't have diabetes.&lt;br&gt;
The record says they do.&lt;/p&gt;

&lt;p&gt;The chain is fine. The record is well-formed. The data is just wrong&lt;br&gt;
about the body. Catching this is medicine's job, and medicine has been&lt;br&gt;
doing it for centuries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What HAVEN solves
&lt;/h2&gt;

&lt;p&gt;HAVEN's contribution at this layer is the &lt;strong&gt;3-Gate Quality Protocol&lt;/strong&gt;&lt;br&gt;
from §6.4. Reproducible, machine-verifiable quality grading at ingest.&lt;/p&gt;

&lt;p&gt;[Figure: a pipeline showing a record entering three gates (Provenance valid, Structure complete, Concepts mapped) and emerging with a Grade A, B, C, or D classification. Caption: The 3-Gate Quality Protocol. Three checks at ingest, one grade out.]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate 1: Provenance valid.&lt;/strong&gt; Cryptographic chain intact, signatures&lt;br&gt;
verify, hash hasn't moved. Catches custodian-level tampering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate 2: Structure complete.&lt;/strong&gt; Required OMOP fields populated. FHIR&lt;br&gt;
resources validate against schema. Required relationships resolve. No&lt;br&gt;
nulls in required positions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate 3: Concepts mapped.&lt;/strong&gt; Diagnosis codes resolve to standard&lt;br&gt;
vocabularies (SNOMED, RxNorm, LOINC&lt;sup id="fnref1"&gt;1&lt;/sup&gt;) rather than local custom&lt;br&gt;
strings. Measurement units standardized. Medications map to active&lt;br&gt;
ingredients.&lt;/p&gt;

&lt;p&gt;All three pass → A. Two → B. One → C. None → D. The grade rides on&lt;br&gt;
the record's metadata, visible to anyone who pulls it.&lt;/p&gt;
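
&lt;p&gt;The grading rule above can be sketched in a few lines. This is a minimal illustration, assuming each gate reduces to a pass/fail flag; the class, field, and record names are hypothetical and are not taken from the HAVEN whitepaper.&lt;/p&gt;

```python
from dataclasses import dataclass

# Grade by number of gates passed, per the rule in the post:
# three passes is A, two is B, one is C, none is D.
GRADE_BY_PASS_COUNT = {3: "A", 2: "B", 1: "C", 0: "D"}


@dataclass(frozen=True)
class GateResults:
    """Outcome of the three ingest checks (field names are illustrative)."""
    provenance_valid: bool
    structure_complete: bool
    concepts_mapped: bool


def grade(results: GateResults) -> str:
    """Map gate outcomes to a letter grade by counting passes."""
    passed = sum(
        [results.provenance_valid, results.structure_complete, results.concepts_mapped]
    )
    return GRADE_BY_PASS_COUNT[passed]


# Hypothetical records, used only to show the grade travelling with each one.
records = {
    "rec-1": GateResults(True, True, True),
    "rec-2": GateResults(True, True, False),
    "rec-3": GateResults(False, False, False),
}
grades = {record_id: grade(result) for record_id, result in records.items()}
grade_a_cohort = [record_id for record_id, letter in grades.items() if letter == "A"]
print(grades)
print(grade_a_cohort)
```

&lt;p&gt;Because the grade is a pure function of the three checks, any party who reruns the gates on the same record should arrive at the same letter.&lt;/p&gt;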

&lt;h2&gt;
  
  
  What the grade buys you
&lt;/h2&gt;

&lt;p&gt;Before quality grading, a researcher pulling a cohort had two options:&lt;br&gt;
trust the source, or audit every record by hand. A reproducible grade&lt;br&gt;
gives them a third — filter to grade A and know exactly what was&lt;br&gt;
checked.&lt;/p&gt;

&lt;p&gt;An AI vendor training on a grade-A cohort gets a cleaner training&lt;br&gt;
signal than one training on raw mixed-grade data. Models can be&lt;br&gt;
validated against the grade.&lt;/p&gt;

&lt;p&gt;A patient who contributed records sees their contributions weighted by&lt;br&gt;
grade. HAVEN's 3-Tier Value Model ties the grade to the attribution&lt;br&gt;
score&lt;sup id="fnref2"&gt;2&lt;/sup&gt;. Quality matters for compensation.&lt;/p&gt;

&lt;p&gt;The grade isn't a clinical-truth guarantee. It is the strongest claim&lt;br&gt;
the protocol layer can make on its own — and it already changes how&lt;br&gt;
research-grade data gets compiled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where clinical truth lives
&lt;/h2&gt;

&lt;p&gt;[Figure: a four-layer verification stack with the Clinical truth layer highlighted. Caption: Clinical truth sits two layers above crypto. Different regime, different evidence.]&lt;/p&gt;

&lt;p&gt;Clinical truth — whether the record matches the body — lives in&lt;br&gt;
medical empiricism. Repeated observation, independent measurement,&lt;br&gt;
longitudinal follow-up. The protocols are mature: Good Clinical&lt;br&gt;
Practice guidelines for trial data&lt;sup id="fnref3"&gt;3&lt;/sup&gt;, data monitoring committees,&lt;br&gt;
multi-source validation, adjudication panels.&lt;/p&gt;

&lt;p&gt;These mechanisms have been doing the work for decades, run by people who do&lt;br&gt;
nothing else. The protocol layer connects to them. It doesn't try to be them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decomposition is the design
&lt;/h2&gt;

&lt;p&gt;A protocol that tried to verify clinical truth on its own would have&lt;br&gt;
to run adjudication panels. It would have to be a mortality registry.&lt;br&gt;
It would need credentialed physicians on staff. That's not a protocol&lt;br&gt;
— that's a research institute.&lt;/p&gt;

&lt;p&gt;HAVEN decomposes the work. Quality grading runs at the protocol layer,&lt;br&gt;
where it scales across institutions. Clinical-truth verification runs&lt;br&gt;
in the medical regime, where it already happens. The two meet at the&lt;br&gt;
attribution layer — research outcomes flow back, tied to graded&lt;br&gt;
contributions, validated against medical-empirical evidence&lt;sup id="fnref4"&gt;4&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The longer-term arc — paying patients when their data contributes to&lt;br&gt;
outcomes, paying or penalizing AI vendors when predictions match or&lt;br&gt;
miss reality — depends on this decomposition holding. Quality is the&lt;br&gt;
protocol's job. Truth is medicine's. Both are necessary. Neither&lt;br&gt;
substitutes for the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;Posts 2 through 5 have argued what &lt;em&gt;would&lt;/em&gt; happen if the protocol&lt;br&gt;
works. The next post commits to what would prove the whole argument&lt;br&gt;
wrong.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;SNOMED CT (Systematized Nomenclature of Medicine, Clinical Terms), RxNorm (NIH unified medication nomenclature), and LOINC (Logical Observation Identifiers Names and Codes) — the OHDSI/OMOP standard vocabularies for diagnoses, medications, and laboratory results respectively. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;HAVEN whitepaper v2.0, §6.4: Quality Assessment and the 3-Tier Value Model. DOI: &lt;a href="https://doi.org/10.5281/zenodo.18701303" rel="noopener noreferrer"&gt;10.5281/zenodo.18701303&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;International Council for Harmonisation. &lt;em&gt;ICH Harmonised Guideline: Good Clinical Practice E6(R3).&lt;/em&gt; ICH, January 2025. Normative standard for the conduct of clinical trials, including source-document verification and endpoint adjudication procedures. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;U.S. FDA Center for Devices and Radiological Health, &lt;em&gt;Software as a Medical Device (SaMD) — Clinical Evaluation,&lt;/em&gt; and the IMDRF SaMD framework. Validation of clinical AI/ML is empirical and ongoing, separate from data-integrity verification. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>thesis</category>
      <category>haven</category>
      <category>protocol</category>
      <category>codequality</category>
    </item>
    <item>
      <title>The identity gap</title>
      <dc:creator>Chester Guan （Ziyuan Guan）</dc:creator>
      <pubDate>Thu, 14 May 2026 17:00:04 +0000</pubDate>
      <link>https://forem.com/chesterguan/the-identity-gap-1o4h</link>
      <guid>https://forem.com/chesterguan/the-identity-gap-1o4h</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://prometheno.org/blog/2026-01-29-the-identity-gap" rel="noopener noreferrer"&gt;prometheno.org&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let's think together. In &lt;a href="https://dev.to/blog/2026-01-22-four-primitives"&gt;&lt;em&gt;The four primitives&lt;/em&gt;&lt;/a&gt;&lt;br&gt;
I said identity sits outside the protocol's scope. This post says what&lt;br&gt;
"outside" means.&lt;/p&gt;

&lt;p&gt;The cryptographic primitives from post 3 can prove a signature came&lt;br&gt;
from a specific key. They can't prove the key belongs to a specific&lt;br&gt;
human. That second question is older than crypto, and the field that&lt;br&gt;
answers it has been working at it for forty years. Building a&lt;br&gt;
patient-sovereign protocol that tries to redo that work is a category error.&lt;/p&gt;

&lt;h2&gt;
  
  
  A regime, not a function
&lt;/h2&gt;

&lt;p&gt;Every claim in a healthcare system gets verified inside some regime.&lt;br&gt;
Signatures get checked with math. Audit chains get checked with hash&lt;br&gt;
replay. Both live in the cryptographic regime — the truth-conditions&lt;br&gt;
are mathematical, the evidence is computable, any honest party with&lt;br&gt;
the artifacts reaches the same answer.&lt;/p&gt;
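
&lt;p&gt;As a rough illustration of "hash replay," the sketch below builds a tiny hash-chained log and then re-derives every hash from the start. It assumes SHA-256 over canonical JSON and an all-zero genesis value; the entry layout and event names are hypothetical, not HAVEN's audit format.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add an entry that commits to everything written before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev": prev_hash, "hash": entry_hash(prev_hash, payload)}
    )


def replay(log: list) -> bool:
    """Recompute every hash from the start; any edit to history breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        expected = entry_hash(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"event": "consent_granted", "grantee": "lab-7"})
append(log, {"event": "record_read", "grantee": "lab-7"})
intact = replay(log)

log[0]["payload"]["grantee"] = "lab-9"  # tamper with an earlier entry
tamper_detected = not replay(log)
print(intact, tamper_detected)
```

&lt;p&gt;The check is deterministic, which is the point of the cryptographic regime: two parties holding the same log cannot disagree about whether it replays.&lt;/p&gt;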

&lt;p&gt;"Alice Chen holds this key" isn't that kind of claim. It's a claim&lt;br&gt;
about the world. The evidence is a passport, a biometric, a notary's&lt;br&gt;
seal. You can audit the procedure. You can't compute it.&lt;/p&gt;

&lt;p&gt;Not harder math. A different question.&lt;/p&gt;

&lt;p&gt;[Figure: a four-layer stack labeled Outcome verification, Clinical truth, Identity proofing, and Cryptographic verification. The Identity proofing layer is highlighted. Caption: Four regimes the protocol rides on. Crypto sits at the bottom. Identity proofing is the next layer up, and the subject of this post.]&lt;/p&gt;

&lt;p&gt;The protocol can use a regime's &lt;em&gt;output&lt;/em&gt; — an identity assertion, a&lt;br&gt;
quality grade, an outcome label. It can't manufacture that output from&lt;br&gt;
below.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a signature doesn't say
&lt;/h2&gt;

&lt;p&gt;A signature confirms whoever holds the key signed the message. It says&lt;br&gt;
nothing about who holds the key. Three ways that gap bites:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stolen keys.&lt;/strong&gt; A signature from Alice's key, after the key has been&lt;br&gt;
exfiltrated to Bob, is indistinguishable from a genuine Alice&lt;br&gt;
signature. The math doesn't care who's typing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared keys.&lt;/strong&gt; Alice gives her key to her daughter to manage Alice's&lt;br&gt;
records. Every consent grant looks like Alice. The daughter could&lt;br&gt;
grant access Alice would refuse, and the protocol has no way to know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sybil accounts.&lt;/strong&gt; One person creates ten patient identities, each&lt;br&gt;
with a different key. Signatures all verify. Contributions all look&lt;br&gt;
distinct. Cryptography is structurally blind to this&lt;sup id="fnref1"&gt;1&lt;/sup&gt;.&lt;/p&gt;
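
&lt;p&gt;A small sketch of why verification alone cannot see these cases. It uses HMAC from the Python standard library as a stand-in for a real asymmetric signature scheme, which is an assumption made only to keep the example dependency-free; the message and key names are hypothetical.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> str:
    """Tag a message with a key (HMAC stands in for a real signature scheme)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag in constant time; this says nothing about who held the key."""
    return hmac.compare_digest(sign(key, message), tag)


message = b"grant: read access to lab results"

# Stolen or shared key: a copy of Alice's key produces the identical tag.
alice_key = secrets.token_bytes(32)
copied_key = bytes(alice_key)
same_tag = sign(alice_key, message) == sign(copied_key, message)

# Sybil accounts: one person mints ten keys, and every "identity" verifies.
sybil_keys = [secrets.token_bytes(32) for _ in range(10)]
all_verify = all(verify(key, message, sign(key, message)) for key in sybil_keys)
distinct_identities = len(set(sybil_keys))
print(same_tag, all_verify, distinct_identities)
```

&lt;p&gt;Nothing in the verification step takes a person as input, so no amount of extra verification can recover who was holding the key.&lt;/p&gt;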

&lt;p&gt;Identity proofing is what makes those expensive. Crypto is a&lt;br&gt;
coordination primitive — it lets parties who don't trust each other&lt;br&gt;
agree on what was signed. It deliberately doesn't try to settle who&lt;br&gt;
the parties are. That settlement happens upstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  What proofing actually means
&lt;/h2&gt;

&lt;p&gt;NIST 800-63-4&lt;sup id="fnref2"&gt;2&lt;/sup&gt; is the U.S. federal standard for identity proofing,&lt;br&gt;
last revised in 2024. It defines three Identity Assurance Levels —&lt;br&gt;
IAL1, IAL2, IAL3 — each naming what evidence binds a credential to a&lt;br&gt;
real human. It predates healthcare-specific concerns by years. The&lt;br&gt;
federal government, the financial sector, and most regulated&lt;br&gt;
industries already use it.&lt;/p&gt;

&lt;p&gt;[Figure: three steps ascending from IAL1 (self-asserted, no proofing) to IAL2 (remote or in-person proofing with strong evidence) to IAL3 (in-person plus supervised biometric capture). Caption: NIST 800-63-4 Identity Assurance Levels. The protocol's assurance is bounded by the IAL the implementing system delivers.]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAL1 is self-asserted.&lt;/strong&gt; You type your name; the system believes you.&lt;br&gt;
Fine for newsletter signups. Not fine for an asset binding that gates&lt;br&gt;
clinical data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAL2 is the working floor for healthcare.&lt;/strong&gt; Government-issued ID plus&lt;br&gt;
a live biometric, remote or in-person. The Cures Act patient-access&lt;br&gt;
rule&lt;sup id="fnref3"&gt;3&lt;/sup&gt; effectively assumes IAL2. ONC certified-health-IT requirements&lt;br&gt;
align&lt;sup id="fnref4"&gt;4&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAL3 adds a supervised in-person step.&lt;/strong&gt; An authorized agent&lt;br&gt;
inspects the evidence and the person presenting it. Federal benefits,&lt;br&gt;
defense clearances, some clinical research.&lt;/p&gt;

&lt;p&gt;A protocol that demands IAL3 everywhere prices itself out of&lt;br&gt;
population scale. A protocol that accepts IAL1 and pretends otherwise&lt;br&gt;
prices itself out of credibility. HAVEN doesn't pick — the&lt;br&gt;
implementing system picks, and the consent grant inherits whatever&lt;br&gt;
assurance the system can deliver.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Prometheno consumes
&lt;/h2&gt;

&lt;p&gt;A Consent Attestation names a grantor and a grantee. The protocol&lt;br&gt;
doesn't say how either gets resolved to a human. The implementing&lt;br&gt;
system mounts an existing substrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenID Connect&lt;/strong&gt;&lt;sup id="fnref5"&gt;5&lt;/sup&gt; — federated identity over OAuth 2.0. The&lt;br&gt;
standard for "log in with your hospital portal." Token in, identity&lt;br&gt;
out, at whatever IAL the provider supports.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decentralized Identifiers&lt;/strong&gt;&lt;sup id="fnref6"&gt;6&lt;/sup&gt; — W3C standard for self-sovereign&lt;br&gt;
identities backed by verifiable credentials. Useful when patients&lt;br&gt;
carry identity across institutions. Doesn't itself produce IAL; relies&lt;br&gt;
on the credential issuer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EHR-issued identity&lt;/strong&gt; — the provider already proofed the patient at&lt;br&gt;
intake. SMART on FHIR&lt;sup id="fnref7"&gt;7&lt;/sup&gt; surfaces that identity to apps via the&lt;br&gt;
Patient resource. Most US silent-pilot work starts here, because the&lt;br&gt;
proofing already happened in-clinic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;eIDAS in the EU&lt;/strong&gt;&lt;sup id="fnref8"&gt;8&lt;/sup&gt; — national-level electronic identity with&lt;br&gt;
Assurance Levels (Low / Substantial / High) that map cleanly onto&lt;br&gt;
IAL1/2/3. Relevant when EU patient pools come into scope.&lt;/p&gt;

&lt;p&gt;The whitepaper §9 is explicit: &lt;em&gt;"How you verify patients are who they&lt;br&gt;
say they are is up to you."&lt;/em&gt;&lt;sup id="fnref9"&gt;9&lt;/sup&gt; That's not a punt. It's the same&lt;br&gt;
scoping move SMTP made for institutional directories, OAuth made for&lt;br&gt;
existing accounts, OIDC made for OAuth. Building identity proofing&lt;br&gt;
into HAVEN would be like asking HTTP to run a passport office.&lt;/p&gt;

&lt;h2&gt;
  
  
  What HAVEN inherits
&lt;/h2&gt;

&lt;p&gt;Every weakness in the substrate propagates into the protocol.&lt;/p&gt;

&lt;p&gt;If the substrate is IAL1, every signed consent is IAL1 consent. The&lt;br&gt;
chain is unbroken; the signatures verify; the audit log holds. And&lt;br&gt;
"the patient" is whoever clicked through. Crypto can make the fiction&lt;br&gt;
tamper-evident. It can't make it true.&lt;/p&gt;

&lt;p&gt;If credential recovery is a security question, an attacker who guesses&lt;br&gt;
the answer takes over the credential and signs whatever they like.&lt;br&gt;
The protocol records it as a legitimate session. The fix isn't more&lt;br&gt;
crypto. The fix is in the identity layer.&lt;/p&gt;

&lt;p&gt;If the substrate is high-assurance for the patient but low-assurance&lt;br&gt;
for the grantee — the researcher, the AI vendor, the lab — the&lt;br&gt;
asymmetry hides. A chain is only as strong as its identity links.&lt;/p&gt;

&lt;p&gt;The honest posture: document what the protocol assumes (IAL on&lt;br&gt;
grantor, IAL on grantee), make those assumptions explicit in the&lt;br&gt;
consent record, reject grants where the substrate can't deliver them.&lt;/p&gt;
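
&lt;p&gt;One way to read that posture in code: carry the assurance level for both parties on the consent record and refuse grants that fall below a stated floor. This is a sketch under assumptions; the record fields, the exception, and the function names are hypothetical and do not come from the HAVEN specification.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import IntEnum


class IAL(IntEnum):
    """NIST Identity Assurance Levels, ordered so they can be compared."""
    IAL1 = 1
    IAL2 = 2
    IAL3 = 3


@dataclass(frozen=True)
class ConsentGrant:
    """Illustrative consent record that states its identity assumptions explicitly."""
    grantor: str
    grantee: str
    grantor_ial: IAL
    grantee_ial: IAL


class InsufficientAssurance(ValueError):
    """Raised when the identity substrate cannot deliver the required level."""


def weakest_link(grant: ConsentGrant) -> IAL:
    """The grant is only as assured as its less-assured party."""
    return min(grant.grantor_ial, grant.grantee_ial)


def accept_grant(grant: ConsentGrant, required: IAL) -> ConsentGrant:
    """Return the grant unchanged, or reject it if either party is below the floor."""
    if required > weakest_link(grant):
        raise InsufficientAssurance(
            f"grant needs {required.name}, substrate delivers {weakest_link(grant).name}"
        )
    return grant


symmetric = ConsentGrant("patient-1", "lab-7", IAL.IAL2, IAL.IAL2)
asymmetric = ConsentGrant("patient-1", "vendor-3", IAL.IAL2, IAL.IAL1)

accepted = accept_grant(symmetric, required=IAL.IAL2)
try:
    accept_grant(asymmetric, required=IAL.IAL2)
    rejected = False
except InsufficientAssurance:
    rejected = True
print(accepted.grantee, rejected)
```

&lt;p&gt;Taking the minimum across grantor and grantee keeps the asymmetric case from hiding: a well-proofed patient does not raise the assurance of a weakly proofed vendor.&lt;/p&gt;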

&lt;h2&gt;
  
  
  Why this is a separate post
&lt;/h2&gt;

&lt;p&gt;Consider the alternative: build identity proofing into consent, pick an&lt;br&gt;
IAL, mandate biometric capture, ship. That path ends one of two ways:&lt;br&gt;
rejected by every system that already has an identity substrate, or&lt;br&gt;
drifting into a quasi-identity provider that competes with the substrates&lt;br&gt;
it was supposed to consume.&lt;/p&gt;

&lt;p&gt;So the boundary stays where it is. Prometheno consumes whatever the&lt;br&gt;
implementing system mounts. The cost is real — the protocol's&lt;br&gt;
sovereignty claim is contingent on identity-proofing upstream — but&lt;br&gt;
it's the same cost SMTP pays for not running mail servers and OAuth&lt;br&gt;
pays for not running user databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;The second gap is harder, because it lives in a different&lt;br&gt;
epistemology. Crypto can prove a value hasn't been altered. Identity&lt;br&gt;
proofing can prove a key belongs to a person. Neither can prove the&lt;br&gt;
value reflects clinical reality. That sits in medical empiricism. The&lt;br&gt;
next post takes it up.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Douceur, J.R. "The Sybil Attack." &lt;em&gt;International Workshop on Peer-to-Peer Systems&lt;/em&gt; (IPTPS), 2002. Sybil resistance requires an out-of-band binding to a costly real-world artifact. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;NIST Special Publication 800-63-4: &lt;em&gt;Digital Identity Guidelines.&lt;/em&gt; National Institute of Standards and Technology, 2024. Defines IAL (Identity Assurance Level), AAL (Authenticator Assurance Level), FAL (Federation Assurance Level) as orthogonal axes. This post discusses IAL only. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;21st Century Cures Act, Public Law 114-255 (2016); ONC interoperability rules 85 FR 25642 (May 2020), 89 FR 1437 (January 2024). Patient-access APIs operate under identity proofing equivalent to NIST IAL2. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;ONC Health IT Certification Program §170.315(d) — identity-proofing requirements for credentialed access to certified health IT. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;Sakimura, N., Bradley, J., Jones, M., de Medeiros, B., Mortimore, C. &lt;em&gt;OpenID Connect Core 1.0&lt;/em&gt;, OpenID Foundation, incorporating errata set 2 (2014–present). Federated authentication on top of OAuth 2.0 (RFC 6749). ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;W3C &lt;em&gt;Decentralized Identifiers (DIDs) v1.0&lt;/em&gt;, W3C Recommendation, July 2022. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;SMART App Launch Framework v2.0. Mandel, J.C., et al. &lt;em&gt;SMART on FHIR: A standards-based, interoperable apps platform for electronic health records.&lt;/em&gt; JAMIA 23(5):899-908, 2016. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;eIDAS Regulation EU 910/2014, effective 2016; eIDAS 2.0 (Regulation EU 2024/1183) extending the framework with the European Digital Identity Wallet. Assurance Levels (Low / Substantial / High) align with NIST IAL1/2/3. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn9"&gt;
&lt;p&gt;HAVEN whitepaper v2.0, §9, "What We're Not Trying to Do." DOI: &lt;a href="https://doi.org/10.5281/zenodo.18701303" rel="noopener noreferrer"&gt;10.5281/zenodo.18701303&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>thesis</category>
      <category>haven</category>
      <category>protocol</category>
      <category>identity</category>
    </item>
    <item>
      <title>The four primitives</title>
      <dc:creator>Chester Guan （Ziyuan Guan）</dc:creator>
      <pubDate>Wed, 13 May 2026 15:42:55 +0000</pubDate>
      <link>https://forem.com/chesterguan/the-four-primitives-4e7g</link>
      <guid>https://forem.com/chesterguan/the-four-primitives-4e7g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://prometheno.org/blog/2026-05-26-four-primitives" rel="noopener noreferrer"&gt;prometheno.org&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let's think together. In &lt;a href="https://dev.to/blog/2026-05-12-three-failures-one-layer"&gt;&lt;em&gt;Three failures, one missing layer&lt;/em&gt;&lt;/a&gt; I&lt;br&gt;
argued that healthcare's three persistent AI failures share one shape:&lt;br&gt;
each requires a governance protocol layer that doesn't yet exist&lt;br&gt;
anywhere — not in applications, not in regulations, not in platforms,&lt;br&gt;
not even in the standards layer that handles data shape. This post&lt;br&gt;
specifies what such a layer must provide.&lt;/p&gt;

&lt;p&gt;Four primitives carry the load.&lt;/p&gt;

&lt;p&gt;Content-addressable Health Assets. Programmable Consent. Hash-chained&lt;br&gt;
Provenance. Quality-weighted Contribution. Each one addresses a&lt;br&gt;
specific failure named in the previous post. Each one earns its place&lt;br&gt;
against an alternative that doesn't work.&lt;/p&gt;

&lt;p&gt;The claim isn't that these four are provably the smallest possible&lt;br&gt;
set. Design spaces resist that kind of proof. The claim is that each&lt;br&gt;
one earns its place, that they cluster naturally as a working set,&lt;br&gt;
and that any honest governance protocol has to answer all four&lt;br&gt;
questions they answer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Specifying, instead of waving at it
&lt;/h2&gt;

&lt;p&gt;"Consent and audit" is what every patient-data pitch already says.&lt;br&gt;
The phrase is correct and inert.&lt;br&gt;
Anything strict enough to actually rule out the failure modes post 2&lt;br&gt;
named has to be specified at the level of data structures and&lt;br&gt;
algorithms, not slogans.&lt;/p&gt;

&lt;p&gt;A primitive, to count here, has to do three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Address a specific failure mode that doesn't dissolve if you
refuse to define the primitive. (If "consent" can be replaced by
"the patient signed a form," the form was the primitive, not the
word.)&lt;/li&gt;
&lt;li&gt;Rule out alternatives that look similar but don't carry the same
guarantee. (Signed consent records aren't the same as
hash-chained consent records, even though both involve signing.)&lt;/li&gt;
&lt;li&gt;Compose with the other primitives without circular dependency.
(Provenance can't be the thing that verifies its own integrity.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The four primitives below each pass this bar. Each section names the&lt;br&gt;
failure it addresses, the obvious alternative that doesn't work, and&lt;br&gt;
what breaks if you remove it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Health Assets
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Failure addressed: fragmentation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Post 2 named the fragmentation: EHRs hold parts, apps hold other&lt;br&gt;
parts, research datasets are fixed snapshots. For a governance&lt;br&gt;
protocol to mean anything across all of these, it needs a way to&lt;br&gt;
refer to a specific piece of clinical data that everyone agrees is&lt;br&gt;
the same piece — verifiably, across systems that don't trust each&lt;br&gt;
other.&lt;/p&gt;

&lt;p&gt;That's what a Health Asset is. From HAVEN whitepaper §6.1&lt;sup id="fnref1"&gt;1&lt;/sup&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HealthAsset := {
    asset_id        : ContentHash      // Derived from content
    data_ref        : SecureReference  // Pointer to clinical data
    substrate       : Identifier       // Data format (FHIR, OMOP, etc.)
    consent_ref     : ConsentID        // Active consent policy
    quality_class   : {A, B, C, D}     // Data quality grade
    provenance_ref  : ProvenanceID     // Audit chain reference
    patient_ref     : PatientID        // Owner of this data
    created_at      : Timestamp
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;asset_id&lt;/code&gt; is a SHA-256 hash of the content. Change one byte of&lt;br&gt;
the underlying data, the hash changes, the &lt;code&gt;asset_id&lt;/code&gt; no longer&lt;br&gt;
matches. The pointer carries its own integrity check. The same trick&lt;br&gt;
Git uses for commits&lt;sup id="fnref2"&gt;2&lt;/sup&gt;.&lt;/p&gt;
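
&lt;p&gt;A minimal sketch of that integrity check, in Python. The canonical JSON encoding, the helper names &lt;code&gt;canonical_bytes&lt;/code&gt;, &lt;code&gt;asset_id&lt;/code&gt; and &lt;code&gt;matches&lt;/code&gt;, and the sample lab record are assumptions made for illustration; the excerpt above does not say how content is serialized before hashing.&lt;/p&gt;

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    # Assumption: canonical form is JSON with sorted keys and no extra whitespace.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def asset_id(record: dict) -> str:
    # The identifier is the SHA-256 digest of the canonical content.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def matches(record: dict, claimed_id: str) -> bool:
    # Anyone holding the content can recheck the pointer; no registry is consulted.
    return asset_id(record) == claimed_id


# Illustrative record only; the field names are not taken from FHIR or OMOP.
lab = {"code": "718-7", "value": 13.2, "unit": "g/dL"}
original_id = asset_id(lab)
reordered = {"unit": "g/dL", "value": 13.2, "code": "718-7"}
tampered = dict(lab, value=13.3)
```

&lt;p&gt;Reordering the keys leaves the identifier unchanged, while changing a single value produces a different one. A real deployment would need a canonicalization rule agreed per substrate, which this sketch does not attempt.&lt;/p&gt;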

&lt;p&gt;&lt;strong&gt;The obvious alternative: just give every record a UUID.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;UUIDs work fine inside one system. They fail at the boundary. Two&lt;br&gt;
custodians will mint different UUIDs for what should be the same&lt;br&gt;
record, and a UUID carries no check that the content behind it is&lt;br&gt;
unchanged. Reconciliation becomes a coordination problem that has&lt;br&gt;
to be solved custodian by&lt;br&gt;
custodian. Content addressing dissolves it: same content, same hash,&lt;br&gt;
anywhere. No registry needed. No reconciliation needed&lt;sup id="fnref3"&gt;3&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What breaks if you remove this primitive:&lt;/strong&gt; the protocol loses any&lt;br&gt;
basis for saying two systems are referring to the same record. Every&lt;br&gt;
audit becomes "trust me, this is the same record." Every consent&lt;br&gt;
becomes ambiguous about what it covers. The fragmentation failure&lt;br&gt;
named in post 2 stays unfixed.&lt;/p&gt;

&lt;p&gt;Content addressing isn't new. Git has used it since 2005. IPFS&lt;br&gt;
implements it for general data. RFC 6920 specifies it for URIs&lt;sup id="fnref4"&gt;4&lt;/sup&gt;.&lt;br&gt;
The choice in HAVEN is to apply it to healthcare records specifically,&lt;br&gt;
in a substrate-neutral way — the same Health Asset can wrap a FHIR&lt;br&gt;
resource, an OMOP measurement, or a raw document reference.&lt;/p&gt;
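
&lt;p&gt;For a sense of what RFC 6920 naming looks like, here is a small Python sketch that builds an &lt;code&gt;ni:&lt;/code&gt; URI from raw bytes. The helper name &lt;code&gt;ni_uri&lt;/code&gt; is hypothetical, and whether HAVEN serializes an &lt;code&gt;asset_id&lt;/code&gt; this way is not stated in this post.&lt;/p&gt;

```python
import base64
import hashlib


def ni_uri(content: bytes) -> str:
    digest = hashlib.sha256(content).digest()
    # RFC 6920 encodes the digest as base64url with the "=" padding removed.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # The authority part is left empty, so three slashes follow the scheme.
    return "ni:///sha-256;" + value


example = ni_uri(b"Hello World!")
```

&lt;p&gt;The same bytes give the same name on any machine, which is the property the Health Asset relies on.&lt;/p&gt;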
&lt;h2&gt;
  
  
  Consent
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Failure addressed: no role for the patient.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three of the four primitives map directly to one of the failures&lt;br&gt;
named in post 2. This one doesn't. Consent is the precondition for&lt;br&gt;
the other three to mean anything. It's what turns the patient from a&lt;br&gt;
data source into an actor in the coordination protocol. Without it,&lt;br&gt;
governance has nothing to bite on.&lt;/p&gt;

&lt;p&gt;A patient's record is one of dozens. Each custodian decides what's&lt;br&gt;
shared, with whom, for what purpose. The patient signs a form, often&lt;br&gt;
under duress, and the form is then interpreted application by&lt;br&gt;
application. Revoking is a phone call to records. Auditing is a FOIA&lt;br&gt;
request. "Consent" in this regime is a paper artifact, not a&lt;br&gt;
machine-verifiable proposition.&lt;/p&gt;

&lt;p&gt;HAVEN's Consent Protocol turns it into one. From whitepaper §6.2&lt;sup id="fnref1"&gt;1&lt;/sup&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ConsentAttestation := {
    consent_id      : UUID
    grantor         : PatientIdentity   // Who grants
    grantee         : AccessorIdentity  // Who receives
    scope           : DataScope         // What data
    purpose         : PurposeType       // Why
    conditions      : Conditions[]      // Under what rules
    ...
    status          : {active, revoked, expired}
    signature       : CryptoSignature
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three properties make this primitive different from existing consent&lt;br&gt;
practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Closed-world semantics.&lt;/strong&gt; If you didn't explicitly grant access to&lt;br&gt;
a resource type, the answer is no. Silence is denial. Existing&lt;br&gt;
consent regimes default to permission for anything not explicitly&lt;br&gt;
forbidden; HAVEN inverts that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic verification.&lt;/strong&gt; Same inputs, same answer, every&lt;br&gt;
time. No randomness, no "it depends." That's what makes the consent&lt;br&gt;
machine-verifiable rather than interpretable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Immediate revocation.&lt;/strong&gt; The next &lt;code&gt;verify()&lt;/code&gt; call after a &lt;code&gt;revoke()&lt;/code&gt;&lt;br&gt;
returns denied. Not "after the next sync." Not "within 24 hours."&lt;br&gt;
Immediately.&lt;/p&gt;
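
&lt;p&gt;The three properties can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions: the &lt;code&gt;ConsentStore&lt;/code&gt; class and its method signatures are hypothetical, signatures and conditions are left out, and a distributed deployment would need more than a status flag to make revocation immediate.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ConsentAttestation:
    grantor: str
    grantee: str
    scope: frozenset  # resource types that were explicitly granted
    purpose: str
    status: str = "active"  # one of: active, revoked, expired


class ConsentStore:
    def __init__(self):
        self._grants = []

    def grant(self, attestation):
        self._grants.append(attestation)
        return attestation

    def revoke(self, attestation):
        # The status flips in place, so the very next verify() call sees it.
        attestation.status = "revoked"

    def verify(self, grantor, grantee, resource_type, purpose):
        # Closed world: allow only if an active grant names every field.
        # The result depends only on stored grants and the arguments.
        return any(
            g.status == "active"
            and g.grantor == grantor
            and g.grantee == grantee
            and g.purpose == purpose
            and resource_type in g.scope
            for g in self._grants
        )


store = ConsentStore()
consent = store.grant(
    ConsentAttestation("alice", "lab-x", frozenset({"Observation"}), "research")
)
```

&lt;p&gt;A request for a resource type or purpose that was never granted falls through to denial, and calling &lt;code&gt;store.revoke(consent)&lt;/code&gt; makes the following &lt;code&gt;verify()&lt;/code&gt; return &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;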

&lt;p&gt;The ethical foundation isn't new. The Nuremberg Code (1947)&lt;sup id="fnref5"&gt;5&lt;/sup&gt;&lt;br&gt;
established that voluntary consent is the floor for medical research.&lt;br&gt;
The Belmont Report (1979)&lt;sup id="fnref6"&gt;6&lt;/sup&gt; codified the principle for modern&lt;br&gt;
practice. What's new is making the principle executable — turning a&lt;br&gt;
40-page form into a function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The obvious alternative: signed consent forms (digital or paper).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A signed form attests that consent happened. It doesn't attest to&lt;br&gt;
what the consent permits, doesn't compose with audit trails, and&lt;br&gt;
doesn't carry revocation state. Two systems sharing the same signed&lt;br&gt;
form will interpret its scope differently. The form is evidence; the&lt;br&gt;
primitive needs to be a function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A note on Identity.&lt;/strong&gt; Consent grants reference two parties —&lt;br&gt;
grantor and grantee. Both have to be verifiable identities for the&lt;br&gt;
consent to mean anything. HAVEN deliberately doesn't define how&lt;br&gt;
identity is established: &lt;em&gt;"How you verify patients are who they say&lt;br&gt;
they are is up to you"&lt;/em&gt;&lt;sup id="fnref7"&gt;7&lt;/sup&gt;. The protocol consumes identity from&lt;br&gt;
established systems (OIDC, DIDs, EHR identity proofing&lt;sup id="fnref8"&gt;8&lt;/sup&gt;) and&lt;br&gt;
operates over those. Identity-proofing is its own deep field;&lt;br&gt;
reinventing it inside the governance protocol would be a bad bet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What breaks if you remove this primitive:&lt;/strong&gt; data flow without&lt;br&gt;
governance. The sovereignty failure stays unfixed regardless of how&lt;br&gt;
clean the data layer is.&lt;/p&gt;
&lt;h2&gt;
  
  
  Provenance
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Failure addressed: missing audit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An audit log inside the system being audited is auditable by the&lt;br&gt;
system's custodian. Nobody else. If MyChart logs your record access,&lt;br&gt;
you have to ask MyChart for the log. If the log is wrong, you have&lt;br&gt;
to ask MyChart to prove it isn't. That's not audit. That's a&lt;br&gt;
custodian's self-attestation, served on a printout.&lt;/p&gt;

&lt;p&gt;The Provenance Record fixes this by making the log structurally&lt;br&gt;
tamper-evident. From whitepaper §6.3&lt;sup id="fnref1"&gt;1&lt;/sup&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ProvenanceEntry := {
    entry_id        : UUID
    timestamp       : Timestamp
    event_type      : EventType
    actor           : Identity
    subject         : AssetRef | ConsentRef
    details         : EventData
    previous_hash   : Hash          // Chain linkage
    signature       : CryptoSignature
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each entry includes the hash of the previous one. Tampering with&lt;br&gt;
history breaks the chain — the change cascades forward, every entry&lt;br&gt;
after the tampered one becomes invalid. Each entry is signed with&lt;br&gt;
Ed25519 or ECDSA, binding it to a specific actor. Verification is&lt;br&gt;
O(log n) via Merkle proofs&lt;sup id="fnref9"&gt;9&lt;/sup&gt;: you don't need to replay the whole&lt;br&gt;
chain to check a single entry.&lt;/p&gt;

&lt;p&gt;This is the same construction Certificate Transparency uses for the&lt;br&gt;
public web's certificate logs&lt;sup id="fnref10"&gt;10&lt;/sup&gt;. And before CT, it's the same&lt;br&gt;
construction Haber and Stornetta proposed in 1991&lt;sup id="fnref11"&gt;11&lt;/sup&gt; — seventeen&lt;br&gt;
years before Bitcoin. The technique is well-understood. The novelty is&lt;br&gt;
applying it to clinical data access.&lt;/p&gt;
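
&lt;p&gt;The O(log n) claim is easier to see in code. The Python sketch below builds a simplified Merkle tree and checks one entry against the root using only its sibling hashes. It is not the exact tree layout of RFC 9162: duplicating the last node on odd levels is a simplification, and the function names are hypothetical.&lt;/p&gt;

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    # Level 0 holds leaf hashes; each higher level hashes adjacent pairs.
    # The prefix bytes keep leaf hashes and interior hashes distinct.
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # simplification: duplicate the last node
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    # Collect one sibling hash per level below the root.
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling_is_right = index % 2 == 0
        proof.append((level[index ^ 1], sibling_is_right))
        index //= 2
    return proof


def verify_inclusion(leaf, proof, root):
    # Recompute the path to the root from the leaf and its siblings only.
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_right in proof:
        pair = node + sibling if sibling_is_right else sibling + node
        node = h(b"\x01" + pair)
    return node == root


entries = [("entry-%d" % i).encode("ascii") for i in range(5)]
levels = build_levels(entries)
root = levels[-1][0]
proof_for_last = inclusion_proof(levels, 4)
```

&lt;p&gt;With five entries the proof carries three hashes, and the count grows with the logarithm of the log size. The verifier never sees the other entries.&lt;/p&gt;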

&lt;p&gt;&lt;strong&gt;The obvious alternative: signed but mutable audit logs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Signatures alone aren't enough. The custodian who owns the log can&lt;br&gt;
re-sign a modified version with the same key, and the substitution&lt;br&gt;
is undetectable to anyone who doesn't have the original. The&lt;br&gt;
chaining is what makes substitution detectable. Without it, "audit"&lt;br&gt;
remains a courtesy.&lt;/p&gt;
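
&lt;p&gt;A sketch of why the chaining matters, in Python. Signatures are omitted, the all-zero &lt;code&gt;GENESIS&lt;/code&gt; value and the function names are assumptions, and the verifier here replays the chain linearly instead of using Merkle proofs.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's previous_hash


def entry_hash(entry: dict) -> str:
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, event_type: str, actor: str, subject: str) -> None:
    previous = entry_hash(chain[-1]) if chain else GENESIS
    chain.append({
        "event_type": event_type,
        "actor": actor,
        "subject": subject,
        "previous_hash": previous,
    })


def first_broken_link(chain: list):
    # Index of the first entry whose previous_hash no longer matches its
    # predecessor, or None when every link checks out.
    expected = GENESIS
    for index, entry in enumerate(chain):
        if entry["previous_hash"] != expected:
            return index
        expected = entry_hash(entry)
    return None


def relink(chain: list) -> None:
    # What a dishonest custodian might try: recompute every link after an edit.
    previous = GENESIS
    for entry in chain:
        entry["previous_hash"] = previous
        previous = entry_hash(entry)


def head(chain: list) -> str:
    return entry_hash(chain[-1])


log = []
append(log, "access", "dr-lee", "asset-1")
append(log, "access", "lab-x", "asset-1")
append(log, "revoke", "alice", "consent-7")
witness = head(log)  # a copy of the head hash held outside the custodian
```

&lt;p&gt;Editing an early entry breaks the link that follows it. If the custodian rebuilds every link to hide the edit, the head hash no longer equals &lt;code&gt;witness&lt;/code&gt;, so a party holding one short hash can detect the substitution without holding the original log.&lt;/p&gt;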

&lt;p&gt;&lt;strong&gt;What breaks if you remove this primitive:&lt;/strong&gt; the patient has no&lt;br&gt;
basis for verifying any claim about what happened to their record.&lt;br&gt;
Consent becomes unenforceable in the wild, because revocation can't&lt;br&gt;
be verified after the fact. The missing-audit failure stays unfixed.&lt;/p&gt;
&lt;h2&gt;
  
  
  Contribution
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Failure addressed: misaligned incentives.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Patients contribute data; researchers use it; outcomes flow to&lt;br&gt;
neither directly. To realign, the protocol needs an accounting&lt;br&gt;
primitive — something that turns "Alice contributed records to study&lt;br&gt;
X" into a value-weighted quantity that can be tracked, attributed,&lt;br&gt;
and eventually paid.&lt;/p&gt;

&lt;p&gt;From whitepaper §6.4&lt;sup id="fnref1"&gt;1&lt;/sup&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Contribution := {
    patient_id      : PatientIdentity
    asset_refs      : AssetRef[]
    quality_score   : Float[0, 1]
    tier            : ContributionTier
    context         : UsageContext
    timestamp       : Timestamp
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The score follows a transparent formula: &lt;code&gt;Value = TierWeight ×&lt;br&gt;
QualityScore × VolumeNorm&lt;/code&gt;. Tiers run from PROFILE (demographics)&lt;br&gt;
through STRUCTURED (labs, meds, conditions) and LONGITUDINAL&lt;br&gt;
(multi-year records) to COMPLEX (notes, imaging, genomics). Quality&lt;br&gt;
is determined by a three-gate protocol — provenance valid, structure&lt;br&gt;
complete, concepts mapped — producing a score from 0 to 1 and a&lt;br&gt;
class from A to D.&lt;/p&gt;
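
&lt;p&gt;As a rough illustration of the formula in Python. The tier weights, the class cutoffs, and the assumption that &lt;code&gt;VolumeNorm&lt;/code&gt; lies in [0, 1] are all hypothetical; this post gives the shape of the formula but not the numbers.&lt;/p&gt;

```python
# Hypothetical weights; only the tier names come from the post.
TIER_WEIGHT = {
    "PROFILE": 0.25,
    "STRUCTURED": 0.5,
    "LONGITUDINAL": 0.75,
    "COMPLEX": 1.0,
}


def quality_class(score: float) -> str:
    # Hypothetical cutoffs mapping a 0 to 1 score onto classes A to D.
    if score >= 0.9:
        return "A"
    if score >= 0.7:
        return "B"
    if score >= 0.5:
        return "C"
    return "D"


def contribution_value(tier: str, quality_score: float, volume_norm: float) -> float:
    # Value = TierWeight * QualityScore * VolumeNorm
    for name, factor in (("quality_score", quality_score), ("volume_norm", volume_norm)):
        if factor > 1.0 or 0.0 > factor:
            raise ValueError(name + " must lie in [0, 1]")
    return TIER_WEIGHT[tier] * quality_score * volume_norm


value = contribution_value("LONGITUDINAL", 0.5, 0.5)
```

&lt;p&gt;Because the three factors multiply, a high tier cannot compensate for a quality score near zero, which is the point of gating on quality.&lt;/p&gt;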

&lt;p&gt;The score isn't dollars. It's a relative weight. If Alice scores&lt;br&gt;
0.83 and Bob scores 0.41, Alice contributed roughly twice as much to&lt;br&gt;
that study. What that translates to in money is between the&lt;br&gt;
implementing system, the patients, and the business model. HAVEN&lt;br&gt;
provides the accounting, not the payment rails.&lt;/p&gt;
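
&lt;p&gt;The Alice and Bob arithmetic, worked in Python. The &lt;code&gt;relative_shares&lt;/code&gt; helper is hypothetical and assumes a study with only these two contributors.&lt;/p&gt;

```python
def relative_shares(scores: dict) -> dict:
    # Normalize raw scores so they sum to 1 across the study.
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: value / total for name, value in scores.items()}


shares = relative_shares({"alice": 0.83, "bob": 0.41})
ratio = shares["alice"] / shares["bob"]
```

&lt;p&gt;The ratio comes out just above 2, and the shares are roughly two thirds and one third. Nothing here says what a share is worth in money.&lt;/p&gt;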

&lt;p&gt;&lt;strong&gt;The obvious alternative #1: equal-share data dividends.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Datacoup / Datawallet / LunaDNA model — every contributor gets&lt;br&gt;
the same share. This collapses on contact with reality. A patient&lt;br&gt;
contributing a single demographic record is treated identically to&lt;br&gt;
one contributing ten years of multi-system labs. Researchers won't&lt;br&gt;
trust the cohort because it can't be quality-weighted. Patients who&lt;br&gt;
contribute heavily get the same as those who contribute thinly. The&lt;br&gt;
system fails on both ends&lt;sup id="fnref12"&gt;12&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The obvious alternative #2: pure clinical-weight, no quality&lt;br&gt;
gating.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Skip the quality gates, weight by clinical content alone. Works in&lt;br&gt;
theory, breaks in practice — clinical content quality varies wildly.&lt;br&gt;
A LONGITUDINAL record with 95% concept-mapping coverage is different&lt;br&gt;
research material from a LONGITUDINAL record with 30%. Without&lt;br&gt;
quality gating, "value" becomes garbage-in garbage-out.&lt;/p&gt;
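
&lt;p&gt;One way to picture the concept-mapping gate, sketched in Python. The 0.8 cutoff, the function names, and the placeholder codes are assumptions for illustration; the post does not specify how coverage is measured or where the threshold sits.&lt;/p&gt;

```python
def concept_coverage(codes: list, known_concepts: set) -> float:
    # Fraction of a record's codes that map to a known concept.
    if not codes:
        return 0.0
    mapped = sum(1 for code in codes if code in known_concepts)
    return mapped / len(codes)


def passes_mapping_gate(codes: list, known_concepts: set, minimum: float = 0.8) -> bool:
    # minimum is a hypothetical cutoff, not a value defined by HAVEN.
    return concept_coverage(codes, known_concepts) >= minimum


vocabulary = {"c%d" % i for i in range(100)}  # placeholder concept identifiers
well_mapped = ["c%d" % i for i in range(19)] + ["local-1"]
poorly_mapped = ["c1", "c2", "c3"] + ["local-%d" % i for i in range(7)]
```

&lt;p&gt;The first record maps 19 of 20 codes (95%) and passes; the second maps 3 of 10 (30%) and does not, even though both could sit in the same tier.&lt;/p&gt;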

&lt;p&gt;The three-gate quality protocol exists because each previous attempt&lt;br&gt;
at patient data marketplaces collapsed in one of these two ways. The&lt;br&gt;
historical evidence is on the page already.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What breaks if you remove this primitive:&lt;/strong&gt; value pools at the&lt;br&gt;
custodian, not the patient. The misaligned-incentives failure stays&lt;br&gt;
unfixed. The protocol becomes another consent-and-audit layer with&lt;br&gt;
no honest accounting of where research value goes.&lt;/p&gt;

&lt;p&gt;Data Shapley and related attribution methods&lt;sup id="fnref13"&gt;13&lt;/sup&gt; suggest more&lt;br&gt;
refined math is possible. The four-tier, quality-weighted formula is&lt;br&gt;
HAVEN's deliberate floor: easy enough to compute, simple enough to&lt;br&gt;
defend, and intentionally open to richer attribution schemes layered&lt;br&gt;
on top.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why these four cluster
&lt;/h2&gt;

&lt;p&gt;Each primitive answers a different question that any governance&lt;br&gt;
protocol has to answer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;Question it answers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Health Asset&lt;/td&gt;
&lt;td&gt;What is the record?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consent&lt;/td&gt;
&lt;td&gt;Who may use it, and for what?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provenance&lt;/td&gt;
&lt;td&gt;What happened to it?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contribution&lt;/td&gt;
&lt;td&gt;What was it worth?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Remove any one and the protocol stops being a protocol. Remove&lt;br&gt;
Health Assets and Consent has no stable thing to authorize. Remove&lt;br&gt;
Consent and Provenance has nothing to audit against. Remove&lt;br&gt;
Provenance and the whole system runs on trust. Remove Contribution&lt;br&gt;
and the system has no honest reason for patients to participate.&lt;/p&gt;

&lt;p&gt;The four cluster naturally because each answers a category of&lt;br&gt;
question that the others can't. They aren't variations on a theme.&lt;br&gt;
They aren't aspects of a single underlying concept. They're four&lt;br&gt;
distinct functions a governance protocol has to provide if it's&lt;br&gt;
going to be the missing layer post 2 named.&lt;/p&gt;

&lt;p&gt;That's the working set. A reader who can show that one of them is&lt;br&gt;
reducible to another, or that a fifth answers a question I haven't&lt;br&gt;
named, should write back. The series is better for the pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this argument can't show
&lt;/h2&gt;

&lt;p&gt;Three limits worth naming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity sits outside the protocol's scope.&lt;/strong&gt; HAVEN's position is that&lt;br&gt;
identity-proofing happens outside the protocol, in established&lt;br&gt;
systems. That's a deliberate boundary. It also&lt;br&gt;
means the protocol inherits whatever weakness exists in the identity&lt;br&gt;
layer it rides on. A weak identity binding produces weak consents;&lt;br&gt;
HAVEN doesn't fix that upstream problem. It just declines to make it&lt;br&gt;
worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Settlement is downstream.&lt;/strong&gt; Turning attribution scores into actual&lt;br&gt;
payments — to patients, to research funds, to whatever model the&lt;br&gt;
implementing system chooses — is an application concern, not a&lt;br&gt;
protocol primitive. HAVEN gives you the accounting. What's done with&lt;br&gt;
the accounting is yours to design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Cluster naturally" is a judgment call.&lt;/strong&gt; The&lt;br&gt;
argument that each of these four primitives is necessary doesn't&lt;br&gt;
rule out the possibility that some other set of four (or five) could&lt;br&gt;
do the same work via different decompositions. Design spaces resist&lt;br&gt;
that kind of proof, as the opening of this post conceded. The defensible claim is&lt;br&gt;
necessity against the failures we named. Universal minimality is a&lt;br&gt;
separate question.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;Specifying four primitives didn't dissolve every problem post 2&lt;br&gt;
named. Two of them surfaced as separate work during specification —&lt;br&gt;
not because the primitives are wrong, but because each ran into a&lt;br&gt;
verification regime the cryptographic primitives don't cover. One of&lt;br&gt;
them lives in a different epistemology entirely.&lt;/p&gt;

&lt;p&gt;The next post takes up the first of those two gaps.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;HAVEN whitepaper v2.0 (February 2026). DOI: &lt;a href="https://doi.org/10.5281/zenodo.18701303" rel="noopener noreferrer"&gt;10.5281/zenodo.18701303&lt;/a&gt;. Source: &lt;a href="https://github.com/Chesterguan/HAVEN" rel="noopener noreferrer"&gt;github.com/Chesterguan/HAVEN&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Git uses SHA-1 today, with migration to SHA-256 in progress. The construction is identical to HAVEN's: hash the content, use the hash as the identifier. Original design: Linus Torvalds, 2005. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;IPFS (InterPlanetary File System) implements the same model for general data storage. Content Identifiers (CIDs) are the operational form. The pattern long predates blockchain. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keränen, A., and Hallam-Baker, P. RFC 6920: &lt;em&gt;Naming Things with Hashes&lt;/em&gt;. April 2013. Defines the &lt;code&gt;ni:&lt;/code&gt; URI scheme for content-addressable resources, including hash-algorithm parameterization. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;"The Nuremberg Code." &lt;em&gt;Trials of War Criminals before the Nuremberg Military Tribunals.&lt;/em&gt; U.S. Government Printing Office, 1949 (originally issued 1947). ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. &lt;em&gt;The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research.&lt;/em&gt; U.S. Department of Health, Education, and Welfare, 1979. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;HAVEN whitepaper §9, "What We're Not Trying to Do." ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;NIST Special Publication 800-63-4 (2024): &lt;em&gt;Digital Identity Guidelines&lt;/em&gt; — identity-proofing assurance levels. W3C &lt;em&gt;Decentralized Identifiers (DIDs) v1.0&lt;/em&gt;, W3C Recommendation, July 2022. OpenID Connect Core 1.0 (federated identity). eIDAS Regulation EU 910/2014 and eIDAS 2.0 (2024) for the EU's electronic identification framework. HAVEN is compatible with any of these as the underlying identity substrate. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn9"&gt;
&lt;p&gt;Merkle, R.C. "A digital signature based on a conventional encryption function." &lt;em&gt;Advances in Cryptology — CRYPTO '87.&lt;/em&gt; The hash-tree construction enabling O(log n) inclusion proofs. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn10"&gt;
&lt;p&gt;Laurie, B., Messeri, E., and Stradling, R. RFC 9162: &lt;em&gt;Certificate Transparency Version 2.0.&lt;/em&gt; December 2021. The current normative standard for the public web's certificate transparency logs. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn11"&gt;
&lt;p&gt;Haber, S., and Stornetta, W.S. "How to time-stamp a digital document." &lt;em&gt;Journal of Cryptology&lt;/em&gt; 3.2 (1991): 99-111. First presented at CRYPTO '90. The original hash-linked timestamp construction. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn12"&gt;
&lt;p&gt;Datacoup (founded 2012, NYC; shut down November 2019; later acquired by ODE July 2021), Datawallet (founded 2014; pivoted to crypto with a $40M DXT token sale in February 2018; functionally dormant by 2026), LunaDNA (founded December 2017 by Bob Kain et al.; closed January 31, 2024 citing capital shortage). Each attempted a patient-side data marketplace with various dividend models; each failed to attract either the patient volume or the research-buyer trust necessary to sustain a market. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn13"&gt;
&lt;p&gt;Ghorbani, A., and Zou, J. "Data Shapley: Equitable Valuation of Data for Machine Learning." &lt;em&gt;Proceedings of the 36th International Conference on Machine Learning (ICML),&lt;/em&gt; 2019. Shapley-value-based attribution for individual data points in machine-learning training sets. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>thesis</category>
      <category>haven</category>
      <category>protocol</category>
      <category>primitives</category>
    </item>
    <item>
      <title>Three failures, one missing layer</title>
      <dc:creator>Chester Guan （Ziyuan Guan）</dc:creator>
      <pubDate>Wed, 13 May 2026 15:42:54 +0000</pubDate>
      <link>https://forem.com/chesterguan/three-failures-one-missing-layer-1g53</link>
      <guid>https://forem.com/chesterguan/three-failures-one-missing-layer-1g53</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://prometheno.org/blog/2026-05-12-three-failures-one-layer" rel="noopener noreferrer"&gt;prometheno.org&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now let's think together. In &lt;a href="https://dev.to/blog/2026-05-11-hello-from-here"&gt;&lt;em&gt;Hello from here&lt;/em&gt;&lt;/a&gt; I said I'd revisit what&lt;br&gt;
I got wrong in the Medium pieces last summer. This is that revisit.&lt;/p&gt;

&lt;p&gt;Last summer I wrote that healthcare AI keeps stalling for three reasons:&lt;br&gt;
fragmented data, missing audit, misaligned incentives. Ten months later,&lt;br&gt;
I still think those three failures are real. I no longer think they're&lt;br&gt;
three failures. They're symptoms of one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I said last summer, restated
&lt;/h2&gt;

&lt;p&gt;The Medium pieces were diagnostic. They listed the failures, gave each a&lt;br&gt;
name, and proposed each could be addressed separately. Fragmented data —&lt;br&gt;
fix it with better integration. Missing audit — fix it with better&lt;br&gt;
logging. Misaligned incentives — fix it with better economics. Three&lt;br&gt;
problems, three fixes, three workstreams.&lt;/p&gt;

&lt;p&gt;I still believe each of those failures is real. I have spent ten months&lt;br&gt;
trying to address them, primarily through the protocol I called HAVEN&lt;br&gt;
and the reference implementation that runs against MIMIC-IV today. What&lt;br&gt;
I have learned in those ten months is that I named them wrong. Not&lt;br&gt;
because the symptoms are wrong, but because the cause is one.&lt;/p&gt;

&lt;p&gt;Each of those three failures, when you look at what would actually fix&lt;br&gt;
it, requires the same thing: a layer of infrastructure that currently&lt;br&gt;
does not exist anywhere in healthcare. Not in a specific app. Not in&lt;br&gt;
any single regulation. Not in any platform. A layer that lives beneath&lt;br&gt;
the application layer, between the data and the things that use it,&lt;br&gt;
and is jointly governed rather than custodially owned.&lt;/p&gt;

&lt;p&gt;Healthcare hasn't built that layer. The reason the three failures are&lt;br&gt;
so persistent is that all of the actors who could build it are working&lt;br&gt;
at the wrong layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the three failures share
&lt;/h2&gt;

&lt;p&gt;Consider what each failure actually requires.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fragmented data&lt;/strong&gt; is a coordination problem. Each EHR holds part of a&lt;br&gt;
patient's record. Each direct-to-consumer health app holds another&lt;br&gt;
part. Each research dataset is a fixed snapshot of one institution.&lt;br&gt;
Fixing this requires not better storage but a way for the parts to&lt;br&gt;
refer to each other — to be the same record, verifiably, across systems&lt;br&gt;
that don't trust each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing audit&lt;/strong&gt; is a coordination problem. An audit log that lives&lt;br&gt;
inside the system being audited is auditable by the system's custodian&lt;br&gt;
only. To be trustworthy, the audit has to be visible from outside the&lt;br&gt;
custodian's reach. That means coordinating audit across actors who&lt;br&gt;
otherwise have no reason to cooperate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Misaligned incentives&lt;/strong&gt; is a coordination problem. Patients contribute&lt;br&gt;
data; researchers use it; outcomes flow to neither directly. To realign&lt;br&gt;
requires value-tracking across that chain. No actor in the chain has the&lt;br&gt;
standing to track it on behalf of everyone. Value attribution at scale&lt;br&gt;
is shared accounting across systems that do not share a custodian. That&lt;br&gt;
is coordination, just at a different layer than data shape or audit&lt;br&gt;
trails.&lt;/p&gt;

&lt;p&gt;Three failures, one shape: each is a problem of coordination across&lt;br&gt;
actors who don't share a custodian. And there are four layers in&lt;br&gt;
current healthcare infrastructure where such coordination has been&lt;br&gt;
attempted: the application layer, the regulatory layer, the platform&lt;br&gt;
layer, and the standards layer. Each has tried to host the fix. Each&lt;br&gt;
has produced a layer-specific limitation worth examining in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why each existing layer can't host the fix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Application layer fails at coordination
&lt;/h3&gt;

&lt;p&gt;Most patient-data infrastructure today is application-layer. MyChart&lt;br&gt;
manages access to one health system's records. Pillpack manages&lt;br&gt;
medications. Apple Health stores a phone's sensor data. Each has&lt;br&gt;
consent UI. Each has logs. Each has some value model — even if "free"&lt;br&gt;
is the model. None of them coordinate with the others. Consent given&lt;br&gt;
in one is not visible to another. Audit logs in one are not auditable&lt;br&gt;
from another. Value accrued in one cannot be paid across them.&lt;/p&gt;

&lt;p&gt;You can build the best possible consent flow inside one application&lt;br&gt;
and still have failed at the actual problem, because the patient does&lt;br&gt;
not have one application. The patient has dozens. The data exists in&lt;br&gt;
dozens of systems. The application layer cannot, by its structural&lt;br&gt;
definition, coordinate across applications it does not contain.&lt;/p&gt;

&lt;p&gt;This is not a problem that better applications will solve. It is a&lt;br&gt;
problem that requires a layer applications can rest on, in the way that&lt;br&gt;
an HTTP server doesn't have to reimplement TCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Regulatory layer fails at latency
&lt;/h3&gt;

&lt;p&gt;HIPAA&lt;sup id="fnref1"&gt;1&lt;/sup&gt; defined privacy boundaries in 1996, before patient data was&lt;br&gt;
an AI training resource. It maps poorly onto questions like "who is&lt;br&gt;
allowed to train a model on this record," because the act of training&lt;br&gt;
does not look like the disclosure events HIPAA was designed to govern.&lt;/p&gt;

&lt;p&gt;GDPR&lt;sup id="fnref2"&gt;2&lt;/sup&gt; added the right to erasure in 2018. The right to erasure is&lt;br&gt;
a coherent demand for records held in databases. It is much less&lt;br&gt;
coherent for records encoded in the weights of a deployed model.&lt;br&gt;
The right exists in statute; the mechanism for enforcing it for&lt;br&gt;
training data simply doesn't.&lt;/p&gt;

&lt;p&gt;The 21st Century Cures Act&lt;sup id="fnref3"&gt;3&lt;/sup&gt; and the subsequent ONC interoperability&lt;br&gt;
rules (2020–2022) mandated that patients receive access to their&lt;br&gt;
records via standardized APIs. Access is a precondition for&lt;br&gt;
sovereignty, not a substitute for it. Receiving the data is not the&lt;br&gt;
same as having rights about what is done with the data once it's&lt;br&gt;
received.&lt;/p&gt;

&lt;p&gt;What these regulations have in common is that they responded to&lt;br&gt;
whatever problem was visible at the time of drafting. By the time the&lt;br&gt;
regulation is in force, the technology has produced new problems.&lt;br&gt;
Regulation has structurally lower bandwidth than technology, which&lt;br&gt;
means whatever is built before regulation catches up will continue to&lt;br&gt;
operate, will continue to extract value, and will not be unwound by&lt;br&gt;
the eventual regulatory response. The fix has to exist before&lt;br&gt;
regulation, or it cannot exist at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform layer fails at consolidation
&lt;/h3&gt;

&lt;p&gt;The platform attempt is the most recent. Apple Health Records launched&lt;br&gt;
in 2018&lt;sup id="fnref4"&gt;4&lt;/sup&gt; with twelve partner health systems and now integrates with&lt;br&gt;
hundreds of US health systems. Google has had four separate goes at&lt;br&gt;
healthcare data (Google Health 2008–2011, Google Fit, DeepMind Streams,&lt;br&gt;
and Cloud Healthcare API)&lt;sup id="fnref5"&gt;5&lt;/sup&gt;, most of them since closed or&lt;br&gt;
refocused. EHR vendors operate patient-facing portals that are&lt;br&gt;
platform-like at health-system scope.&lt;/p&gt;

&lt;p&gt;These platforms work, in the narrow sense that data does flow through&lt;br&gt;
them. They do not solve the sovereignty problem. They consolidate it.&lt;br&gt;
When Apple is the custodian of a unified patient-data layer, the&lt;br&gt;
patient is no longer the sovereign — Apple is, with the patient as&lt;br&gt;
user. When the EHR vendor is the custodian, the health system is.&lt;br&gt;
Sovereignty becomes mediated, which is the opposite of sovereignty.&lt;/p&gt;

&lt;p&gt;Platforms aren't bad. They're just not where the fix lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Standards layer fails at scope
&lt;/h3&gt;

&lt;p&gt;Healthcare has serious protocol-layer attempts. HL7 v2 standardized&lt;br&gt;
clinical message exchange in 1989&lt;sup id="fnref6"&gt;6&lt;/sup&gt;. HL7 FHIR has standardized&lt;br&gt;
RESTful access to clinical data since 2014&lt;sup id="fnref7"&gt;7&lt;/sup&gt;. The OMOP Common Data&lt;br&gt;
Model&lt;sup id="fnref8"&gt;8&lt;/sup&gt; codified the shape of observational research data&lt;br&gt;
across hundreds of institutions. SMART on FHIR&lt;sup id="fnref9"&gt;9&lt;/sup&gt; standardized&lt;br&gt;
authorization for clinical apps.&lt;/p&gt;

&lt;p&gt;These are real protocol-layer wins. They are not the wins the missing&lt;br&gt;
fix needs.&lt;/p&gt;

&lt;p&gt;Each of these standards governs the wire. FHIR specifies how to&lt;br&gt;
retrieve a record; it does not specify whether the retrieving party&lt;br&gt;
may train a model on it. OMOP specifies how a diagnosis is encoded;&lt;br&gt;
it does not specify who may access the cohort or what they owe the&lt;br&gt;
patients in it. SMART on FHIR specifies how an app is authorized; it&lt;br&gt;
does not specify what the patient should receive when the app's&lt;br&gt;
output is used in care.&lt;/p&gt;

&lt;p&gt;The standards layer scopes to data shape. The missing fix has to&lt;br&gt;
scope to data use. The two are complementary: a governance protocol&lt;br&gt;
operates over FHIR-shaped data and OMOP-modeled cohorts, and supplies&lt;br&gt;
what those standards don't provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "protocol layer" means in this context
&lt;/h2&gt;

&lt;p&gt;A protocol is a set of rules that participants follow voluntarily,&lt;br&gt;
without any of them owning the rules or storing the data the rules&lt;br&gt;
govern. SMTP made email possible across institutions in 1981&lt;sup id="fnref10"&gt;10&lt;/sup&gt; — not&lt;br&gt;
because Bell Labs hosted email, but because everyone agreed on how to&lt;br&gt;
address it. HTTP made the web possible across servers in 1991&lt;sup id="fnref11"&gt;11&lt;/sup&gt; —&lt;br&gt;
not because Tim Berners-Lee hosted the web. DNS made naming possible&lt;br&gt;
without a single registrar&lt;sup id="fnref12"&gt;12&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;In each case, the protocol layer succeeded by enabling cross-system&lt;br&gt;
behavior that no single custodian could have provided. Each protocol&lt;br&gt;
was published, ratified by use, and operated without any party having&lt;br&gt;
permission to revoke it. Email has survived four decades of vendor&lt;br&gt;
consolidation because the protocol is older than the vendors.&lt;/p&gt;

&lt;p&gt;Healthcare data does not have such a layer. It has applications that&lt;br&gt;
consolidate. It has regulations that constrain disclosure. It has&lt;br&gt;
platforms that mediate. It has no shared rules for what a record is,&lt;br&gt;
what consent means, what audit consists of, or how value gets&lt;br&gt;
attributed. Each of those questions is currently answered application&lt;br&gt;
by application, regulation by regulation, platform by platform.&lt;/p&gt;

&lt;p&gt;The bet is that a protocol layer for patient-sovereign healthcare data&lt;br&gt;
could behave the way SMTP and HTTP did. Not because it solves any&lt;br&gt;
specific application problem better than that application would, but&lt;br&gt;
because it enables a class of cooperation that cannot happen without&lt;br&gt;
it. There is a second part to the bet: this layer is buildable now,&lt;br&gt;
before regulation forces a worse version of it, and before any single&lt;br&gt;
platform consolidates the territory.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for the next four posts
&lt;/h2&gt;

&lt;p&gt;If the missing layer is protocol, then specifying what such a protocol&lt;br&gt;
must provide is the next step. Not "consent and audit" as generic&lt;br&gt;
abstractions. Specific primitives, each with a job.&lt;/p&gt;

&lt;p&gt;The next post argues that four primitives carry the load:&lt;br&gt;
content-addressable Health Assets, programmable Consent, hash-chained&lt;br&gt;
Provenance, quality-weighted Contribution. Each maps to one of the&lt;br&gt;
failures named here. The claim is not that these four are provably&lt;br&gt;
the smallest possible set. Design spaces resist that kind of proof.&lt;br&gt;
The claim is that each one earns its place against a specific failure&lt;br&gt;
mode, and that the four cluster naturally rather than arbitrarily.&lt;/p&gt;

&lt;p&gt;That's a softer commitment than "minimum sufficient." It's the one I&lt;br&gt;
can defend. A reader who sees a natural fifth primitive should write&lt;br&gt;
back. The series is better for the pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I underestimated
&lt;/h2&gt;

&lt;p&gt;When I wrote the Medium pieces last summer, I thought the field needed&lt;br&gt;
better tools. I now think it needs a layer the field hasn't built.&lt;br&gt;
That's a harder problem than the one I named.&lt;/p&gt;

&lt;p&gt;Building better tools on top of a missing layer is a treadmill.&lt;/p&gt;

&lt;p&gt;The next post specifies. The two after that examine the gaps that&lt;br&gt;
surfaced during the specification — gaps that became separate work&lt;br&gt;
because they live in different verification regimes. The fifth post&lt;br&gt;
commits to what would prove the whole argument wrong.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;HIPAA, Public Law 104-191 (1996); Privacy Rule effective 2003. The statute governs disclosure of protected health information by covered entities. It is structurally about who may share what with whom, not about what may be inferred from what has been shared. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;GDPR, Regulation (EU) 2016/679, effective May 2018. Art. 17 (right to erasure) and Art. 20 (right to data portability). Art. 17 is binding on data controllers; the mechanism for applying it to data already encoded in trained model weights remains an open legal question. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;21st Century Cures Act, Public Law 114-255 (2016). Subsequent ONC interoperability rules: 85 FR 25642 (May 2020) and 89 FR 1437 (January 2024). FHIR R4 patient-access APIs mandated for certified health IT. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;Apple Health Records launched March 28, 2018. Initial 12 partner health systems; FHIR-based; now integrated with hundreds of US health systems. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;Google Health (consumer): 2008–2011. Google Fit: launched 2014. Google DeepMind Streams: piloted at Royal Free London 2016, criticized by UK ICO 2017, folded into Google Health 2018. Google Cloud Healthcare API: launched 2018, operational. None operate at protocol layer; all are platform plays. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;HL7 v2 (originally HL7 v2.1, 1989). Maintained by HL7 International; versions 2.3–2.7 in widespread clinical deployment. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;HL7 FHIR (Fast Healthcare Interoperability Resources). DSTU 1 published 2014; FHIR R4 became normative in 2019. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;OMOP Common Data Model, maintained by the OHDSI consortium. v5.x widely deployed across hundreds of research sites; v6.0 current. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn9"&gt;
&lt;p&gt;Mandel, J.C., Kreda, D.A., Mandl, K.D., Kohane, I.S., and Ramoni, R.B. "SMART on FHIR: A standards-based, interoperable apps platform for electronic health records." &lt;em&gt;Journal of the American Medical Informatics Association&lt;/em&gt; 23(5) (2016): 899-908. Initial profile published 2014; SMART App Launch Framework v2.0 in current use. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn10"&gt;
&lt;p&gt;Postel, J. (1981). RFC 788: Simple Mail Transfer Protocol; superseded by RFC 821 (1982). ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn11"&gt;
&lt;p&gt;Berners-Lee, T. (1991). HTTP/0.9 first proposal; HTTP/1.0 documented in RFC 1945 (1996). ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn12"&gt;
&lt;p&gt;Mockapetris, P. (1983). RFC 882, RFC 883: DNS specifications. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>thesis</category>
      <category>haven</category>
      <category>protocol</category>
    </item>
    <item>
      <title>Portfolio Post — What I'm Building: HAVEN, Prometheno</title>
      <dc:creator>Chester Guan （Ziyuan Guan）</dc:creator>
      <pubDate>Wed, 08 Apr 2026 15:24:05 +0000</pubDate>
      <link>https://forem.com/chesterguan/portfolio-post-what-im-building-haven-prometheno-2ihg</link>
      <guid>https://forem.com/chesterguan/portfolio-post-what-im-building-haven-prometheno-2ihg</guid>
      <description>&lt;p&gt;Been heads-down building the future of data governance, and I wanted to share a glimpse of what I'm working on.&lt;/p&gt;

&lt;p&gt;First, I'm developing HAVEN (Health Asset Value &amp;amp; Exchange Network) – a protocol for patient-controlled health data. It focuses on how health data is referenced, consented to, audited, and valued. Think of it as the foundational layer for a more equitable health data ecosystem. Check out the spec on GitHub: &lt;a href="https://github.com/Chesterguan/HAVEN" rel="noopener noreferrer"&gt;https://github.com/Chesterguan/HAVEN&lt;/a&gt;. Recent updates include refining the documentation and adding a logo to improve community contributions.&lt;/p&gt;

&lt;p&gt;I was also working on Prometheno, a patient-centered health data platform. The goal was to empower individuals to own, control, and benefit from their medical information while contributing to medical research on their own terms. You can find the project here: &lt;a href="https://github.com/Chesterguan/Prometheno" rel="noopener noreferrer"&gt;https://github.com/Chesterguan/Prometheno&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These projects are all about empowering individuals with control over their data and how it's used. I'm excited to see where this journey takes me!&lt;/p&gt;

&lt;p&gt;#datagovernance #healthtech #opensource&lt;/p&gt;

</description>
      <category>linkedin</category>
      <category>projectscribe</category>
    </item>
  </channel>
</rss>
