<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: kodomonocch1</title>
    <description>The latest articles on Forem by kodomonocch1 (@kodomonocch1).</description>
    <link>https://forem.com/kodomonocch1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3560540%2F476a7481-8c3b-48ba-87d1-c5d7ce69b5b5.png</url>
      <title>Forem: kodomonocch1</title>
      <link>https://forem.com/kodomonocch1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kodomonocch1"/>
    <language>en</language>
    <item>
      <title>Why I stopped measuring AI workflow validation by replies and started measuring it by real payloads</title>
      <dc:creator>kodomonocch1</dc:creator>
      <pubDate>Sun, 29 Mar 2026 13:39:13 +0000</pubDate>
      <link>https://forem.com/kodomonocch1/why-i-stopped-measuring-ai-workflow-validation-by-replies-and-started-measuring-it-by-real-payloads-4dep</link>
      <guid>https://forem.com/kodomonocch1/why-i-stopped-measuring-ai-workflow-validation-by-replies-and-started-measuring-it-by-real-payloads-4dep</guid>
      <description>&lt;p&gt;Most AI workflow demos still optimize for “looks structured.”&lt;/p&gt;

&lt;p&gt;That is not the same as “won’t break downstream.”&lt;/p&gt;

&lt;p&gt;A response can look clean, JSON-shaped, and convincing — and still be the exact thing that causes manual rework, routing mistakes, compliance issues, or downstream breakage.&lt;/p&gt;

&lt;p&gt;That’s the gap I’m trying to pressure-test.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m testing
&lt;/h2&gt;

&lt;p&gt;I’m building a narrow evaluator surface for high-stakes AI workflows.&lt;/p&gt;

&lt;p&gt;The terminal outcomes are intentionally constrained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accepted&lt;/li&gt;
&lt;li&gt;succeeded&lt;/li&gt;
&lt;li&gt;failed_safe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;p&gt;Either return something safe to use, or fail safely with a classification and a trust artifact.&lt;/p&gt;

&lt;p&gt;This is not a generic model wrapper.&lt;br&gt;
It is not broad “AI automation.”&lt;br&gt;
It is a narrow reliability layer for workflows where silent failure is expensive.&lt;/p&gt;
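To make the contract concrete, here is a minimal sketch of what "succeed or fail safely with a classification and a trust artifact" can look like. The names (`evaluate`, the example schema, the receipt format) are illustrative assumptions, not the actual dlx-kernel API.

```python
# Hypothetical sketch of the terminal-outcome contract: either return a
# value that is safe to use, or fail safely with a classification and a
# public-safe trust artifact. Not the real dlx-kernel implementation.
import hashlib
import json

# Example target schema: field name -> expected type (illustrative only)
REQUIRED = {"invoice_id": str, "amount": (int, float)}

def evaluate(raw: str) -> dict:
    """Return a terminal outcome: 'succeeded' or 'failed_safe'."""
    # Trust artifact: a content hash that can be published without
    # leaking the payload itself.
    receipt = hashlib.sha256(raw.encode()).hexdigest()[:16]
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {"outcome": "failed_safe",
                "classification": "malformed_json",
                "receipt": receipt}
    for field, typ in REQUIRED.items():
        if field not in payload:
            return {"outcome": "failed_safe",
                    "classification": f"missing_field:{field}",
                    "receipt": receipt}
        if not isinstance(payload[field], typ):
            return {"outcome": "failed_safe",
                    "classification": f"type_mismatch:{field}",
                    "receipt": receipt}
    return {"outcome": "succeeded", "value": payload, "receipt": receipt}
```

The point is that a malformed response never reaches downstream code as if it were valid: every path out of the evaluator is one of the constrained terminal outcomes.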

&lt;p&gt;Public evaluator kit:&lt;br&gt;&lt;br&gt;
&lt;a href="https://kodomonocch1.github.io/dlx-kernel/" rel="noopener noreferrer"&gt;https://kodomonocch1.github.io/dlx-kernel/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I changed my mind
&lt;/h2&gt;

&lt;p&gt;At first, it’s easy to measure interest by replies, comments, or general reactions.&lt;/p&gt;

&lt;p&gt;But that’s not the real test.&lt;/p&gt;

&lt;p&gt;The real test is whether someone is willing to submit an actual workflow payload where bad output has a real downstream cost.&lt;/p&gt;

&lt;p&gt;That is a much better signal than opinions.&lt;/p&gt;

&lt;p&gt;If a system claims to improve reliability, it should be tested against real failure-sensitive payloads — not only polished demos.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I need now
&lt;/h2&gt;

&lt;p&gt;I need one real payload from a workflow where silent failure is expensive.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;document extraction&lt;/li&gt;
&lt;li&gt;invoice / AP automation&lt;/li&gt;
&lt;li&gt;procurement workflows&lt;/li&gt;
&lt;li&gt;ticket routing&lt;/li&gt;
&lt;li&gt;compliance classification&lt;/li&gt;
&lt;li&gt;any workflow where malformed structured output causes breakage or costly review work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Submit here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://kodomonocch1.github.io/dlx-kernel/submit-payload.html" rel="noopener noreferrer"&gt;https://kodomonocch1.github.io/dlx-kernel/submit-payload.html&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I need from you
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;one sample payload&lt;/li&gt;
&lt;li&gt;one target schema&lt;/li&gt;
&lt;li&gt;one short note on downstream risk&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I return
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;succeeded or failed_safe&lt;/li&gt;
&lt;li&gt;failure classification&lt;/li&gt;
&lt;li&gt;public-safe receipt / trust artifact&lt;/li&gt;
&lt;li&gt;initial evaluator review within 24 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m not looking for broad onboarding, marketplace-style submissions, or generic support requests.&lt;/p&gt;

&lt;p&gt;I’m looking for one real payload that is worth testing against.&lt;/p&gt;

&lt;p&gt;If you have one, I’d really appreciate it.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>automation</category>
      <category>api</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Searchable JSON compression: page-level random access + ms lookups (and smaller than Zstd on our dataset)</title>
      <dc:creator>kodomonocch1</dc:creator>
      <pubDate>Thu, 19 Feb 2026 19:12:38 +0000</pubDate>
      <link>https://forem.com/kodomonocch1/searchable-json-compression-page-level-random-access-ms-lookups-and-smaller-than-zstd-on-our-3k1h</link>
      <guid>https://forem.com/kodomonocch1/searchable-json-compression-page-level-random-access-ms-lookups-and-smaller-than-zstd-on-our-3k1h</guid>
      <description>&lt;h1&gt;
  
  
  Searchable JSON compression with page-level random access (and smaller than Zstd on our dataset)
&lt;/h1&gt;

&lt;p&gt;Most JSON compression stories end at “make it smaller.”&lt;br&gt;&lt;br&gt;
But in real systems, the bigger cost is often &lt;strong&gt;decompress + parse + scan&lt;/strong&gt; — repeatedly.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;SEE (Semantic Entropy Encoding)&lt;/strong&gt;: a &lt;strong&gt;searchable compression format for JSON/NDJSON&lt;/strong&gt; that keeps data &lt;strong&gt;queryable while compressed&lt;/strong&gt;, with &lt;strong&gt;page-level random access&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On our dataset, SEE is &lt;strong&gt;smaller than Zstd&lt;/strong&gt; &lt;em&gt;and&lt;/em&gt; supports fast lookups (details + proof below).&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters: the hidden “decompress+parse tax”
&lt;/h2&gt;

&lt;p&gt;If you store NDJSON as &lt;code&gt;zstd&lt;/code&gt;, most queries still pay:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read large chunks&lt;/li&gt;
&lt;li&gt;decompress everything&lt;/li&gt;
&lt;li&gt;parse JSON&lt;/li&gt;
&lt;li&gt;scan for the field/value you need&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if the data is small, the &lt;strong&gt;CPU + I/O pattern&lt;/strong&gt; is brutal at scale.&lt;/p&gt;

&lt;p&gt;SEE targets workloads where you repeatedly need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;exists / pos / eq&lt;/strong&gt;-style queries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;random access&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;low latency&lt;/strong&gt; without full decompression&lt;/li&gt;
&lt;/ul&gt;
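The page-skipping idea above can be sketched in a few lines: keep a small Bloom filter per page so that an `exists`-style query can discard most pages without decompressing or parsing them. This is an illustration of the mechanism under assumed names, not SEE's actual on-disk format.

```python
# Minimal sketch of page-level skipping with a per-page Bloom filter,
# assuming exists-style key queries. Illustrative only; SEE's real
# layout, hash choices, and bit sizes differ.
import hashlib

class Page:
    def __init__(self, records, bits=256, hashes=3):
        self.records = records  # would be compressed bytes in the real format
        self.bits, self.hashes = bits, hashes
        self.bloom = 0
        for rec in records:
            for key in rec:
                self.bloom |= self._mask(key)

    def _mask(self, key):
        # Derive `hashes` bit positions from salted blake2b digests.
        m = 0
        for i in range(self.hashes):
            h = int.from_bytes(
                hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest(),
                "big")
            m |= 1 << (h % self.bits)
        return m

    def might_contain(self, key):
        mask = self._mask(key)
        # False means the page can be skipped with certainty.
        return self.bloom & mask == mask

def exists(pages, key):
    for page in pages:
        if not page.might_contain(key):
            continue  # skipped: no decompress, no parse, no scan
        if any(key in rec for rec in page.records):
            return True
    return False
```

With a high skip rate, most pages never get past `might_contain`, which is where the I/O and CPU savings come from.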




&lt;h2&gt;
  
  
  What SEE is (in 60 seconds)
&lt;/h2&gt;

&lt;p&gt;SEE is a &lt;strong&gt;page-based&lt;/strong&gt;, &lt;strong&gt;schema-aware&lt;/strong&gt; format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;page-level&lt;/strong&gt; layout for random access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bloom + skip&lt;/strong&gt; to avoid touching irrelevant pages (high skip rate)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;schema-aware encoding&lt;/strong&gt; (structure + deltas + dictionary where useful)&lt;/li&gt;
&lt;li&gt;designed to reduce both:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;data tax&lt;/strong&gt; (storage/egress)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU tax&lt;/strong&gt; (decompress/parse)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Trade-off: SEE optimizes for &lt;strong&gt;low I/O and low latency&lt;/strong&gt;, not always absolute minimum size (though it can win on size too, depending on the dataset).&lt;/p&gt;




&lt;h2&gt;
  
  
  KPI snapshot (public demo)
&lt;/h2&gt;

&lt;p&gt;These are the numbers we publish from the demo pack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Combined size ratio:&lt;/strong&gt; ≈ &lt;strong&gt;19.5% of raw&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lookup latency (present):&lt;/strong&gt; p50 ≈ &lt;strong&gt;0.18 ms&lt;/strong&gt; / p95 ≈ &lt;strong&gt;0.28 ms&lt;/strong&gt; / p99 ≈ &lt;strong&gt;0.34 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip ratio:&lt;/strong&gt; present ≈ &lt;strong&gt;0.99&lt;/strong&gt; / absent ≈ &lt;strong&gt;0.992&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bloom density:&lt;/strong&gt; ≈ &lt;strong&gt;0.30&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;“Combined” is the total footprint for the SEE artifact on the dataset we benchmarked.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fly7m0w7bcr8cjo58mrnb.png" alt=" " width="594" height="835"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Proof-first distribution (so you can verify without meetings)
&lt;/h2&gt;

&lt;p&gt;I intentionally ship &lt;strong&gt;reproducible packs&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Demo ZIP (10 minutes)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;prebuilt wheel + sample &lt;code&gt;.see&lt;/code&gt; artifacts&lt;/li&gt;
&lt;li&gt;demo scripts that print KPIs (ratio/skip/bloom/p50–p99)&lt;/li&gt;
&lt;li&gt;OnePager PDF&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) DD Pack (audit / repro artifacts)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;run summaries + &lt;code&gt;run_metrics.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;verification checklist (&lt;code&gt;pack_verify.txt&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;designed for technical diligence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent robustness milestone: &lt;strong&gt;strict decode mismatch checks across multiple datasets = 0&lt;/strong&gt;&lt;br&gt;
(&lt;code&gt;decode_mismatch_count=0&lt;/code&gt;, &lt;code&gt;decode_extended_mismatch_count=0&lt;/code&gt;, audit PASS).&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick start (demo)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;see_proto
python samples/quick_demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This prints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compression ratio&lt;/li&gt;
&lt;li&gt;skip/bloom&lt;/li&gt;
&lt;li&gt;lookup p50/p95/p99&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/kodomonocch1/see_proto" rel="noopener noreferrer"&gt;https://github.com/kodomonocch1/see_proto&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Release (v0.1.1): &lt;a href="https://github.com/kodomonocch1/see_proto/releases/tag/v0.1.1" rel="noopener noreferrer"&gt;https://github.com/kodomonocch1/see_proto/releases/tag/v0.1.1&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want &lt;strong&gt;formal evaluation under NDA&lt;/strong&gt; (DD pack / deeper materials):&lt;br&gt;
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLUd0E6xSvCEVnlEOxYd6OGgbpJm0ADlg/viewform?usp=header" rel="noopener noreferrer"&gt;https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLUd0E6xSvCEVnlEOxYd6OGgbpJm0ADlg/viewform?usp=header&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; company email is preferred, but DMs are welcome too (no confidential data needed at first contact).&lt;/p&gt;




&lt;h2&gt;
  
  
  What I’m looking for
&lt;/h2&gt;

&lt;p&gt;SEE is not a SaaS product.&lt;br&gt;
I’m exploring &lt;strong&gt;strategic acquisition&lt;/strong&gt; or an &lt;strong&gt;exclusive license&lt;/strong&gt; with teams that have a clear integration path.&lt;/p&gt;

&lt;p&gt;To keep evaluation high-signal, I run &lt;strong&gt;up to a small number of NDA evals per month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’re on a data platform / infra / storage team and you can point to where this fits, I’d love to hear from you.&lt;/p&gt;

</description>
      <category>compression</category>
      <category>json</category>
      <category>performance</category>
      <category>rust</category>
    </item>
    <item>
      <title>Making JSON Compression Searchable — SEE (Schema-Aware Encoding)</title>
      <dc:creator>kodomonocch1</dc:creator>
      <pubDate>Sun, 12 Oct 2025 13:11:42 +0000</pubDate>
      <link>https://forem.com/kodomonocch1/making-json-compression-searchable-see-schema-aware-encoding-4ojk</link>
      <guid>https://forem.com/kodomonocch1/making-json-compression-searchable-see-schema-aware-encoding-4ojk</guid>
      <description>&lt;p&gt;The Problem: Cloud Cost Isn’t Just Storage&lt;/p&gt;

&lt;p&gt;Compression is easy — until you need it searchable.&lt;/p&gt;

&lt;p&gt;Traditional codecs like gzip and Zstd reduce storage size,&lt;br&gt;
but they do nothing for I/O and CPU cost.&lt;/p&gt;

&lt;p&gt;Every query still triggers:&lt;/p&gt;

&lt;p&gt;→ decompress → parse → filter → aggregate&lt;/p&gt;

&lt;p&gt;If your data is JSON or NDJSON, that pipeline dominates your bill.&lt;br&gt;
That’s what we call the hidden cloud tax — the cost of moving and re-reading your own data.&lt;/p&gt;

&lt;h2&gt;The Breakthrough: Schema-Aware Compression&lt;/h2&gt;

&lt;p&gt;SEE (Semantic Entropy Encoding) is a new type of codec that keeps JSON searchable while compressed.&lt;/p&gt;

&lt;p&gt;It doesn’t just shrink bytes — it understands the structure.&lt;/p&gt;

&lt;p&gt;Core idea:&lt;br&gt;
Structure × Δ (delta) × Zstd + Bloom filters + PageDir mini-index&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can skip 99% of irrelevant data&lt;/li&gt;
&lt;li&gt;Lookup latency ≈ 0.18 ms (p50)&lt;/li&gt;
&lt;li&gt;Combined size ≈ 19.5% of raw&lt;/li&gt;
&lt;li&gt;100% reproducible from the demo&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architecture in One Picture&lt;/h2&gt;

&lt;p&gt;👉 &lt;a href="https://speakerdeck.com/tetsu05/see-the-hidden-cloud-tax-breaker-schema-aware-compression-beyond-zstd" rel="noopener noreferrer"&gt;SpeakerDeck Slides&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SEE vs Zstd:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;SEE&lt;/th&gt;&lt;th&gt;Zstd&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Combined ratio&lt;/td&gt;&lt;td&gt;0.194&lt;/td&gt;&lt;td&gt;0.137&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Lookup p50 (ms)&lt;/td&gt;&lt;td&gt;0.18&lt;/td&gt;&lt;td&gt;n/a&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Skip rate&lt;/td&gt;&lt;td&gt;0.99&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;SEE trades 5–10% of size for 90% fewer I/O ops.&lt;br&gt;
At cloud scale, that’s not optimization — that’s an economic correction.&lt;/p&gt;

&lt;h2&gt;Quick Demo (10 minutes)&lt;/h2&gt;

&lt;p&gt;No build needed. Works on Windows, macOS, or Linux.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install see_proto
python samples/quick_demo.py
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Outputs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ratio_see[str] = 0.169
ratio_see[combined] = 0.194
skip_present = 0.99
skip_absent = 0.992
lookup_p50 = 0.18 ms
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7aa8f25vvzziq3io1rss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7aa8f25vvzziq3io1rss.png" alt=" " width="587" height="824"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You’ll get the same metrics as the public benchmark.&lt;/p&gt;

&lt;h2&gt;Economic Impact&lt;/h2&gt;

&lt;p&gt;At $0.05/GB egress and 100 EB/month traffic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Savings = $7.2 B/year&lt;/li&gt;
&lt;li&gt;Payback = &amp;lt; 4 days&lt;/li&gt;
&lt;li&gt;ROI ≈ 11,000%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whoever controls SEE, controls cloud economics.&lt;/p&gt;

&lt;h2&gt;How It’s Built&lt;/h2&gt;

&lt;p&gt;Core implementation in Rust with a Zstd dictionary backend.&lt;br&gt;
Python bindings (via maturin) make the demo fully reproducible.&lt;/p&gt;

&lt;p&gt;The schema-aware layer applies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delta + ZigZag integer encoding&lt;/li&gt;
&lt;li&gt;Shared dictionaries for string reuse&lt;/li&gt;
&lt;li&gt;PageDir and mini-index for random access&lt;/li&gt;
&lt;li&gt;Bloom filters for skip prediction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each &lt;code&gt;.see&lt;/code&gt; file includes a compact metadata header so partial decoding is possible.&lt;/p&gt;
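The delta + ZigZag step listed above is worth seeing concretely: deltas turn slowly changing integers into small signed values, and ZigZag maps those to small unsigned values that varint-style coders compress well. The real core is Rust; this is a hypothetical Python mirror of the transform, not SEE's actual encoder.

```python
# Illustrative sketch of delta + ZigZag integer encoding (the real SEE
# core is Rust; names and details here are assumptions for clarity).

def zigzag(n: int) -> int:
    # Interleave signs: 0,-1,1,-2,... -> 0,1,2,3,...
    # so small-magnitude deltas stay small after encoding.
    return (n << 1) ^ (n >> 63)

def unzigzag(z: int) -> int:
    return (z >> 1) ^ -(z & 1)

def delta_zigzag_encode(values):
    out, prev = [], 0
    for v in values:
        out.append(zigzag(v - prev))  # store the change, not the value
        prev = v
    return out

def delta_zigzag_decode(encoded):
    out, prev = [], 0
    for z in encoded:
        prev += unzigzag(z)  # reapply deltas to rebuild originals
        out.append(prev)
    return out
```

On monotonically or slowly varying fields (timestamps, counters, IDs), the encoded stream is dominated by small integers, which is exactly what the dictionary/entropy stage downstream exploits.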

&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;👉 GitHub: (&lt;a href="https://github.com/kodomonocch1/see_proto" rel="noopener noreferrer"&gt;https://github.com/kodomonocch1/see_proto&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;👉 Slides (SpeakerDeck): (&lt;a href="https://speakerdeck.com/tetsu05/see-the-hidden-cloud-tax-breaker-schema-aware-compression-beyond-zstd" rel="noopener noreferrer"&gt;https://speakerdeck.com/tetsu05/see-the-hidden-cloud-tax-breaker-schema-aware-compression-beyond-zstd&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;👉 Deep dive article (Medium): &lt;a href="https://medium.com/@tetsutetsu11/the-hidden-cloud-tax-and-the-schema-aware-revolution-46b5038c57b8" rel="noopener noreferrer"&gt;https://medium.com/@tetsutetsu11/the-hidden-cloud-tax-and-the-schema-aware-revolution-46b5038c57b8&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve used Parquet, Zstd, or Arrow — this fits right between them,&lt;br&gt;
but tuned for JSON-first workloads.&lt;/p&gt;

&lt;h2&gt;Closing Thoughts&lt;/h2&gt;

&lt;p&gt;SEE isn’t just a faster codec.&lt;br&gt;
It’s a new layer of data efficiency for the cloud economy —&lt;br&gt;
one that turns compression from a technical optimization into a financial advantage.&lt;/p&gt;

&lt;p&gt;From Bytes to Balance Sheets.&lt;/p&gt;

&lt;h2&gt;PS: Discussion&lt;/h2&gt;

&lt;p&gt;If you’ve tested SEE on your own dataset (logs, telemetry, NDJSON),&lt;br&gt;
share your results — we’re tracking performance across real workloads.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>finops</category>
      <category>cloud</category>
      <category>compression</category>
    </item>
  </channel>
</rss>
