<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: kent-tokyo</title>
    <description>The latest articles on Forem by kent-tokyo (@kent-tokyo).</description>
    <link>https://forem.com/kent-tokyo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936287%2Fbb5ec013-43fb-485c-b099-db72395640b5.png</url>
      <title>Forem: kent-tokyo</title>
      <link>https://forem.com/kent-tokyo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kent-tokyo"/>
    <language>en</language>
    <item>
      <title>Cheminformatics in Rust in 2025-2026: What Exists, What Doesn't, and Why</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Mon, 25 May 2026 13:07:26 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/cheminformatics-in-rust-in-2025-2026-what-exists-what-doesnt-and-why-e3h</link>
      <guid>https://forem.com/kent-tokyo/cheminformatics-in-rust-in-2025-2026-what-exists-what-doesnt-and-why-e3h</guid>
      <description>&lt;p&gt;RDKit has been the dominant cheminformatics library since its open-source release in 2006. It is written in C++, wrapped in Python, and has accumulated nearly two decades of validated chemistry: SMILES and SMARTS parsing, multiple fingerprint types, 2D coordinate generation, 3D conformer generation, MMFF94 and UFF force fields, a PostgreSQL cartridge. Most cheminformatics pipelines assume it is present.&lt;/p&gt;

&lt;p&gt;In mid-2026, Rust's answer is &lt;code&gt;rdkit-sys&lt;/code&gt; — bindings to RDKit's C++ CFFI interface — and a collection of pure-Rust crates that stalled in 2020-2021.&lt;/p&gt;

&lt;h2&gt;
  
  
  What exists in 2025-2026
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Crate&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Latest&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;rdkit-sys&lt;/td&gt;
&lt;td&gt;C++ FFI to RDKit&lt;/td&gt;
&lt;td&gt;0.4.12 (Oct 2024)&lt;/td&gt;
&lt;td&gt;Maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;openbabel&lt;/td&gt;
&lt;td&gt;C++ FFI to Open Babel&lt;/td&gt;
&lt;td&gt;0.5.4 (Jan 2025)&lt;/td&gt;
&lt;td&gt;Maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chemcore&lt;/td&gt;
&lt;td&gt;Pure Rust&lt;/td&gt;
&lt;td&gt;0.4.1 (Feb 2021)&lt;/td&gt;
&lt;td&gt;Unmaintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;purr&lt;/td&gt;
&lt;td&gt;Pure Rust (SMILES parser)&lt;/td&gt;
&lt;td&gt;0.9.0 (Mar 2021)&lt;/td&gt;
&lt;td&gt;Unmaintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;smiles-parser&lt;/td&gt;
&lt;td&gt;Pure Rust (SMILES parser)&lt;/td&gt;
&lt;td&gt;0.4.1 (Nov 2020)&lt;/td&gt;
&lt;td&gt;Unmaintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cosmolkit&lt;/td&gt;
&lt;td&gt;Pure Rust (new attempt)&lt;/td&gt;
&lt;td&gt;0.2.3 (May 2026)&lt;/td&gt;
&lt;td&gt;New, unproven&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern among the pure-Rust rows is consistent: implementations hit a wall around 2020-2021 and stopped. The active work is FFI bindings to existing C++ tools. A new attempt (&lt;code&gt;cosmolkit&lt;/code&gt;) appeared recently with an ambitious scope (SMILES, SDF, conformers, molecular graphs), but with under 800 downloads it is too early to evaluate.&lt;/p&gt;

&lt;h2&gt;
  
  
  SMILES parsing is solved. The rest is not.
&lt;/h2&gt;

&lt;p&gt;Parsing a SMILES string is a context-free grammar problem, and Rust handles those well. &lt;code&gt;purr&lt;/code&gt; implements the full OpenSMILES specification. &lt;code&gt;smiles-parser&lt;/code&gt; does the same. Both work. Neither has had a release since 2020-2021.&lt;/p&gt;

&lt;p&gt;The problem starts after parsing.&lt;/p&gt;

&lt;p&gt;A SMILES string like &lt;code&gt;c1ccccc1&lt;/code&gt; (benzene) uses lowercase atoms to indicate aromaticity. To do anything useful — calculate molecular weight, count implicit hydrogens, check valence — you need to convert it to a Kekulé structure: alternating single and double bonds. This is kekulization, and it is a constraint-satisfaction problem on the molecular graph.&lt;/p&gt;
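
&lt;p&gt;A minimal sketch of why the Kekulé form matters for hydrogen counting. It assumes a deliberately simplified valence table (only C, N and O, lowest normal valence, no charges). The function names &lt;code&gt;default_valence&lt;/code&gt;, &lt;code&gt;implicit_hydrogens&lt;/code&gt; and &lt;code&gt;ring_hydrogens&lt;/code&gt; are hypothetical and not taken from any crate mentioned here.&lt;/p&gt;

```rust
/// Default valence for a few organic-subset elements.
/// Simplified assumption: lowest normal valence only, charges ignored.
fn default_valence(symbol: char) -> u32 {
    match symbol {
        'C' => 4,
        'N' => 3,
        'O' => 2,
        _ => 0, // element not covered by this sketch
    }
}

/// Implicit hydrogens = default valence minus the sum of explicit bond orders.
fn implicit_hydrogens(symbol: char, bond_order_sum: u32) -> u32 {
    default_valence(symbol).saturating_sub(bond_order_sum)
}

/// Hydrogen count for a six-carbon ring, given the orders of the two
/// ring bonds at each atom.
fn ring_hydrogens(bond_orders: [(u32, u32); 6]) -> u32 {
    let mut total = 0;
    for (left, right) in bond_orders {
        total += implicit_hydrogens('C', left + right);
    }
    total
}
```

&lt;p&gt;With alternating single and double bonds, every benzene carbon has a bond order sum of 3 and gets one hydrogen, giving C6H6. Treating each aromatic bond as a plain single bond yields 12 hydrogens, which is cyclohexane, not benzene.&lt;/p&gt;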

&lt;p&gt;&lt;code&gt;chemcore&lt;/code&gt;, the most complete pure-Rust attempt, has supported kekulization since its initial release (v0.1.x, June 2020). A benchmark published alongside v0.3.1 in October 2020 showed it handling edge cases that RDKit cannot. But kekulization is one step. What chemcore does not have: fingerprints, 2D coordinate generation, SMARTS matching, or stereochemistry. The last release was February 2021. Getting past kekulization turned out not to be the finishing line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aromaticity: no agreed definition
&lt;/h2&gt;

&lt;p&gt;Even with kekulization in place, aromaticity perception is harder than it looks — partly because aromaticity itself has no single agreed-upon definition in cheminformatics.&lt;/p&gt;

&lt;p&gt;Hückel's rule — 4n+2 π electrons — works for monocyclic systems. For polycyclic aromatics and heteroaromatics, implementations diverge. Daylight's original SMILES aromatic model differs from RDKit's model, which differs from CDK's. An algorithm that kekulizes correctly under one model may fail under another.&lt;/p&gt;
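
&lt;p&gt;For the monocyclic case, the rule itself is a one-liner. The sketch below assumes the caller has already counted the ring's double bonds and lone-pair donors (such as the nitrogen in pyrrole); both function names are hypothetical, and this is not how RDKit, Daylight, or CDK implement their models.&lt;/p&gt;

```rust
/// Pi electrons in a single ring: two per ring double bond, plus two per
/// heteroatom that donates a lone pair into the ring.
fn pi_electron_count(ring_double_bonds: u32, lone_pair_donors: u32) -> u32 {
    2 * ring_double_bonds + 2 * lone_pair_donors
}

/// Hückel's rule for a planar monocycle: aromatic when the ring holds
/// 4n + 2 pi electrons for some integer n >= 0.
fn satisfies_huckel(pi_electrons: u32) -> bool {
    pi_electrons % 4 == 2
}
```

&lt;p&gt;Benzene (three double bonds) and pyrrole (two double bonds plus one donor) both come out at six electrons and pass. Cyclobutadiene (four electrons) and cyclooctatetraene (eight) fail. The disagreement between toolkits starts where this sketch stops: deciding which rings or ring unions to count over in fused systems.&lt;/p&gt;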

&lt;p&gt;Any pure-Rust toolkit that wants to produce output compatible with RDKit-generated SMILES needs to match RDKit's aromaticity behavior exactly, not implement some variant of Hückel. That requires reading RDKit's source code and testing against its outputs. It is months of work before any of it is visible to end users.&lt;/p&gt;

&lt;h2&gt;
  
  
  2D coordinate generation: not attempted
&lt;/h2&gt;

&lt;p&gt;Every cheminformatics toolkit ships 2D depiction — you cannot work with molecules you cannot see. The layout problem is harder than it looks.&lt;/p&gt;

&lt;p&gt;RDKit ships its own 2D depiction engine (&lt;code&gt;rdDepictor&lt;/code&gt;) and also integrates Schrödinger's &lt;code&gt;CoordGen&lt;/code&gt; library because &lt;code&gt;rdDepictor&lt;/code&gt; alone produces clashing depictions for complex ring systems. Two tools are needed because neither is sufficient alone. CoordGen works by matching known ring scaffold templates and running iterative geometry optimization for everything else.&lt;/p&gt;

&lt;p&gt;No pure-Rust crate has attempted 2D coordinate generation. Getting it right requires ring perception, a library of scaffold templates, and an optimization pass to resolve clashes. It is a multi-month project, and the output is still wrong until enough templates are added.&lt;/p&gt;

&lt;h2&gt;
  
  
  Substructure search: the graph is not the chemistry
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;petgraph&lt;/code&gt; (v0.8.3, 377M total downloads) provides VF2-based subgraph isomorphism and is actively maintained. VF2 is the standard algorithm for this — roughly an order of magnitude faster than Ullmann on typical molecule-sized graphs. The graph infrastructure exists in Rust.&lt;/p&gt;

&lt;p&gt;SMARTS matching, which is how substructure search works in cheminformatics, requires more than graph isomorphism. A SMARTS pattern &lt;code&gt;[#6;r6]&lt;/code&gt; means "a carbon atom in a 6-membered ring." Matching it requires: parsing SMARTS syntax, knowing which atoms belong to which rings, and matching node attributes with chemical semantics — atomic number, formal charge, aromaticity flag, implicit hydrogen count.&lt;/p&gt;
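
&lt;p&gt;To make the "node attributes with chemical semantics" point concrete, here is a hand-written predicate for the single pattern &lt;code&gt;[#6;r6]&lt;/code&gt;. The &lt;code&gt;AtomProps&lt;/code&gt; struct and its field names are assumptions for illustration, not the API of any published crate, and a real matcher would build such predicates from parsed SMARTS instead of hard-coding them.&lt;/p&gt;

```rust
/// Per-atom facts a SMARTS matcher needs beyond graph connectivity.
/// Hypothetical layout; ring membership must be computed by a separate
/// ring-perception step before matching.
#[derive(Clone, Copy)]
struct AtomProps {
    atomic_number: u8,
    /// Size of the smallest ring containing the atom, 0 if acyclic.
    smallest_ring: u8,
}

/// Predicate for the SMARTS atom `[#6;r6]`:
/// atomic number 6 AND smallest ring size 6.
fn matches_c_r6(atom: AtomProps) -> bool {
    matches!((atom.atomic_number, atom.smallest_ring), (6, 6))
}
```

&lt;p&gt;A benzene carbon passes, a pyridine nitrogen and an acyclic carbon do not. A predicate of this kind is what would be handed to a graph library as the node comparison; the graph algorithm itself knows nothing about rings or elements.&lt;/p&gt;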

&lt;p&gt;Connecting &lt;code&gt;petgraph&lt;/code&gt;'s isomorphism to a chemistry-aware molecular graph is exactly the glue code that no published Rust crate provides.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bindings are the rational choice
&lt;/h2&gt;

&lt;p&gt;RDKit's changelog goes back to 2006. The codebase contains 200+ molecular descriptors, MMFF94 and UFF force fields with their respective validation papers, an ETKDG 3D conformer generator that uses torsion angle statistics from the Cambridge Structural Database, and a PostgreSQL cartridge for large-scale screening. The Python ecosystem wraps all of this: &lt;code&gt;chembl_webresource_client&lt;/code&gt; for ChEMBL API access, PandasTools, scikit-learn integration for ML on fingerprints.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;rdkit-sys&lt;/code&gt; exposes a fraction of this via RDKit's CFFI interface. Choosing bindings over a rewrite is not a concession. It is what you do when you look at how much chemistry is embedded in that C++ code and how long it took to get there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in 2024-2025, and what 2026 adds so far
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;2024-2025:&lt;/strong&gt; &lt;code&gt;rdkit-sys&lt;/code&gt; had three releases in 2024, the last in October, and moved into the &lt;code&gt;rdkit-rs/rdkit&lt;/code&gt; monorepo. &lt;code&gt;openbabel&lt;/code&gt; (Rust bindings) released 0.5.4 in January 2025 — it exposes Open Babel's &lt;code&gt;OBSmartsPattern&lt;/code&gt;, which matters if you need substructure search without pulling in RDKit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2026:&lt;/strong&gt; The only 2026-specific addition is &lt;code&gt;cosmolkit&lt;/code&gt; (v0.2.3, May 2026, 778 downloads). It claims an ambitious scope — SMILES, SDF, conformers, molecular graphs, "AI-ready workflows" — but it is too new to evaluate. Whether it addresses aromaticity perception and 2D layout, the parts that stopped every earlier attempt, is not clear from the current documentation.&lt;/p&gt;

&lt;p&gt;As of this writing, nothing else has shipped in 2026. The structural gap between Rust and Python cheminformatics is the same as it was in 2025.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual hard part
&lt;/h2&gt;

&lt;p&gt;The challenging problems in cheminformatics are not Rust-specific. Ownership and lifetimes will slow you down on day one; aromaticity will block you in month three. The chemistry fundamentals (aromaticity perception, 2D layout, stereochemistry, substructure matching) require domain knowledge that does not come from a Rust tutorial.&lt;/p&gt;

&lt;p&gt;RDKit did not get where it is because C++ is better than Rust. It got there because a team of chemists and programmers spent two decades solving specific, hard chemistry problems. Whoever builds the Rust equivalent will need to solve the same problems.&lt;/p&gt;

&lt;p&gt;I have been working around these gaps while building &lt;a href="https://github.com/kent-tokyo/chem-wasm-lens" rel="noopener noreferrer"&gt;chem-wasm-lens&lt;/a&gt;, a pure-Rust molecular analysis library targeting the browser via WebAssembly. Restricting scope — no SMARTS, no full stereochemistry — made it possible to ship. But restricted scope is different from a general-purpose toolkit, and that distinction matters.&lt;/p&gt;

</description>
      <category>cheminformatics</category>
      <category>rust</category>
      <category>chemistry</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Rust in 2025-2026: From 'Most Loved Language' to Core Infrastructure</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Sun, 24 May 2026 04:38:04 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/rust-in-2025-2026-from-most-loved-language-to-core-infrastructure-4l5k</link>
      <guid>https://forem.com/kent-tokyo/rust-in-2025-2026-from-most-loved-language-to-core-infrastructure-4l5k</guid>
      <description>&lt;p&gt;Rust has held the top spot in Stack Overflow's "most admired language" survey since 2016, nearly without interruption. But what happened in 2025 is no longer just about popularity polls. Rust is quietly but steadily becoming the language that underpins critical infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Experimental" Status Removed from the Linux Kernel
&lt;/h2&gt;

&lt;p&gt;In December 2025, at the Kernel Maintainers Summit held in Tokyo, Rust's status in the Linux kernel was elevated from experimental to an officially recognized implementation language.&lt;/p&gt;

&lt;p&gt;As &lt;a href="https://lwn.net/Articles/1049831/" rel="noopener noreferrer"&gt;LWN.net reported&lt;/a&gt;, the outcome was unambiguous: "The consensus among the assembled developers is that Rust in the kernel is no longer experimental — it is now a core part of the kernel and is here to stay." Maintainer Steven Rostedt noted there was zero pushback in the room. About five years after Linus first suggested the possibility in 2020, the matter was settled.&lt;/p&gt;

&lt;p&gt;Rust code in the kernel currently stands at around 25,000 lines (compared to 34 million lines of C) — still a small share — but the subsystems adopting it are steadily expanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rust Code Running in Production
&lt;/h3&gt;

&lt;p&gt;Major components using Rust in the kernel today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PHY drivers&lt;/strong&gt; — network physical layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;null block driver&lt;/strong&gt; — test block device&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android Binder driver&lt;/strong&gt; — kernel-side implementation of Android's IPC mechanism&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple AGX GPU driver&lt;/strong&gt; — for Apple Silicon, via the Asahi Linux project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nova GPU driver&lt;/strong&gt; — for NVIDIA Turing-generation hardware (RTX 20 / GTX 16 series)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Nova driver is architecturally interesting. It is split into two crates: &lt;code&gt;nova-core&lt;/code&gt; (hardware initialization and communication) and &lt;code&gt;nova-drm&lt;/code&gt; (Linux DRM API implementation), using an adapter pattern that maps different bus types (PCI, platform, USB) to the same types. As of early 2026, full hardware enablement is still in progress.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Rust is Well-Suited for the Kernel
&lt;/h3&gt;

&lt;p&gt;The majority of kernel driver bugs stem from memory safety issues: NULL pointer dereferences, use-after-free, buffer overflows, data races. In C, the main mitigations are careful review and external tooling. Safe Rust rules out null dereferences, use-after-free, and data races at compile time, and turns an out-of-bounds access into a checked failure instead of undefined behavior.&lt;/p&gt;

&lt;p&gt;Greg Kroah-Hartman has stated that Rust drivers are safer than their C counterparts, and this is exactly why. The 25,000-line figure is small, but it also means Rust is replacing the parts where "bugs would have been guaranteed if written in C."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Story Behind &lt;code&gt;async closures&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Among the stabilizations in 2025, &lt;code&gt;async closures&lt;/code&gt; (Rust 1.85) were a deeper change than they appear.&lt;/p&gt;

&lt;h3&gt;
  
  
  What was wrong with the old workarounds?
&lt;/h3&gt;

&lt;p&gt;Previously, writing an "async closure" looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The closure couldn't borrow captured variables inside the Future.&lt;/span&gt;
&lt;span class="c1"&gt;// You had to either move ownership in, or wrap with Arc&amp;lt;Mutex&amp;lt;T&amp;gt;&amp;gt;.&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;vec!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// clone required&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem with &lt;code&gt;|x| async move { ... }&lt;/code&gt; was that the returned &lt;code&gt;Future&lt;/code&gt; could not borrow captures from the closure itself. Because the &lt;code&gt;Future&lt;/code&gt; has a different lifetime from the closure, you had no way to pass references — you had to either move ownership or clone.&lt;/p&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;AsyncFn&lt;/code&gt; Trait Hierarchy
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;async closures&lt;/code&gt; introduced in Rust 1.85 come with a new trait hierarchy internally: &lt;code&gt;AsyncFn&lt;/code&gt; / &lt;code&gt;AsyncFnMut&lt;/code&gt; / &lt;code&gt;AsyncFnOnce&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AsyncFnOnce
  └─ AsyncFnMut
       └─ AsyncFn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mirrors the existing &lt;code&gt;Fn*&lt;/code&gt; traits, but with a critical difference: the &lt;code&gt;Future&lt;/code&gt; returned by an &lt;code&gt;async closure&lt;/code&gt; &lt;strong&gt;can borrow from the closure itself (lending)&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;vec!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// Rust 1.85+: the Future can borrow data directly&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Usage at function boundaries&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;AsyncFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;i32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because &lt;code&gt;AsyncFnMut&lt;/code&gt;'s &lt;code&gt;CallRefFuture&lt;/code&gt; associated type is designed to propagate the &lt;code&gt;&amp;amp;mut self&lt;/code&gt; lifetime into the &lt;code&gt;Future&lt;/code&gt;. With &lt;code&gt;FnMut&lt;/code&gt;, the &lt;code&gt;&amp;amp;mut self&lt;/code&gt; borrow ends when the call returns; &lt;code&gt;AsyncFnMut&lt;/code&gt; lets the returned &lt;code&gt;Future&lt;/code&gt; hold that borrow until it completes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;let chains&lt;/code&gt; — Quietly Important
&lt;/h3&gt;

&lt;p&gt;Also stabilized in 2025, &lt;code&gt;let chains&lt;/code&gt; (Rust 1.88, Rust 2024 edition only) look simple but are significant: you can now freely combine &lt;code&gt;if let&lt;/code&gt; patterns and ordinary &lt;code&gt;bool&lt;/code&gt; conditions with &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;. Using this requires &lt;code&gt;edition = "2024"&lt;/code&gt; in &lt;code&gt;Cargo.toml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// before: forced to nest or introduce temp variables&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="nf"&gt;.is_active&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="nf"&gt;.role&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// after: flatten conditions and pattern matches on one line&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="nf"&gt;.is_active&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="nf"&gt;.role&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same syntax works in &lt;code&gt;while let&lt;/code&gt; and &lt;code&gt;match&lt;/code&gt; guards, visibly reducing nesting depth throughout a codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety Certification: A New Frontier
&lt;/h2&gt;

&lt;p&gt;In December 2025, Ferrous Systems obtained IEC 61508 SIL 2 certification from TÜV SÜD for a subset of Rust's &lt;code&gt;core&lt;/code&gt; library, under &lt;strong&gt;Ferrocene 25.11.0&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is IEC 61508?
&lt;/h3&gt;

&lt;p&gt;IEC 61508 is an international standard for functional safety of electrical, electronic, and programmable electronic safety-related systems. SIL (Safety Integrity Level) 2 corresponds to a "probability of dangerous failure of 10^-7 to 10^-6 per hour" — the level required in safety-critical domains such as aerospace, medical devices, industrial machinery, and automotive.&lt;/p&gt;

&lt;p&gt;Historically, the de facto standard for safety-critical embedded systems has been C/C++ with MISRA. Rust achieving this level of certification means it has officially stepped into the domain where functional safety is required — not just systems programming for general use.&lt;/p&gt;
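
&lt;p&gt;The SIL 2 band quoted above is one row of a four-row table. As a rough sketch, assuming the IEC 61508 bands for high-demand or continuous operation (dangerous failures per hour), the mapping looks like this. The &lt;code&gt;Sil&lt;/code&gt; enum and &lt;code&gt;sil_for_pfh&lt;/code&gt; are hypothetical names, and a real assessment involves far more than one number.&lt;/p&gt;

```rust
#[derive(Debug, PartialEq)]
enum Sil {
    One,
    Two,
    Three,
    Four,
    OutsideTable,
}

/// Map a dangerous-failure rate per hour to its IEC 61508 band
/// (high-demand or continuous mode). Each band spans one decade.
fn sil_for_pfh(pfh: f64) -> Sil {
    if pfh >= 1e-5 {
        Sil::OutsideTable // worse than the SIL 1 band
    } else if pfh >= 1e-6 {
        Sil::One
    } else if pfh >= 1e-7 {
        Sil::Two
    } else if pfh >= 1e-8 {
        Sil::Three
    } else if pfh >= 1e-9 {
        Sil::Four
    } else {
        Sil::OutsideTable // below the lowest band, negative, or NaN
    }
}
```

&lt;p&gt;A rate of 5e-7 per hour lands in SIL 2, and each step up in SIL demands a tenfold lower failure rate.&lt;/p&gt;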

&lt;h3&gt;
  
  
  MISRA C 2025 Addendum 6
&lt;/h3&gt;

&lt;p&gt;The same year, MISRA C 2025 Addendum 6 was published: an assessment of how MISRA C rules apply to Rust. The conclusion is that many existing C-specific rules are simply not applicable to Rust — the compiler enforces them by construction via the ownership model. This document also lays the groundwork for a future Rust-specific MISRA rule set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Adoption Approaching 50%
&lt;/h2&gt;

&lt;p&gt;According to JetBrains' &lt;a href="https://blog.jetbrains.com/rust/2026/02/11/state-of-rust-2025/" rel="noopener noreferrer"&gt;State of Rust Ecosystem 2025&lt;/a&gt;, roughly half of surveyed organizations are now using Rust in production in non-trivial ways — a significant jump from around 38-39% in 2023.&lt;/p&gt;

&lt;p&gt;Microsoft has started rewriting low-level Windows components in Rust, and Google, Amazon, and Meta have each introduced it into OS-level systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Unexpected Fit for the LLM Era
&lt;/h2&gt;

&lt;p&gt;In an era dominated by generative AI, Rust is being reassessed in an unexpected way.&lt;/p&gt;

&lt;p&gt;LLM-generated code contains mistakes. Rust's compiler returns those mistakes immediately and concretely as build errors. The type system and borrow checker enumerate the mistakes an AI made — before the code ever runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error[E0502]: cannot borrow `data` as mutable because it is also borrowed as immutable
  --&amp;gt; src/main.rs:7:5
   |
6  |     let r = &amp;data;
   |             ----- immutable borrow occurs here
7  |     data.push(4);
   |     ^^^^^^^^^^^^ mutable borrow occurs here
8  |     println!("{:?}", r);
   |                      - immutable borrow later used here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tight feedback loop ("if it compiles, safe Rust guarantees memory safety") is well-suited for pair programming with an LLM. A compile error reproduces deterministically on every build, which makes it easier to fix than a Python runtime error that only appears on certain inputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Biggest Concern: Adoption May Plateau
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://blog.rust-lang.org/2026/03/02/2025-State-Of-Rust-Survey-results/" rel="noopener noreferrer"&gt;State of Rust Survey 2025&lt;/a&gt; found that the top concern is not a technical one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Not enough adoption in the tech industry" came in first at 42.1%&lt;/strong&gt; (narrowly ahead of "the language is getting too complex" at 41.6%).&lt;/p&gt;

&lt;p&gt;The learning curve remains steep and unresolved. Some feel the language itself is growing more complex over time. In early-stage startups where velocity is critical, Rust's upfront cost is real.&lt;/p&gt;

&lt;p&gt;That said, this concern is also the voice of people who want to use Rust more but see adoption lagging. The fact that technical concerns did not dominate the top of the list suggests Rust has reached a meaningful level of maturity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In one sentence: as of 2026, Rust is in the early stages of a shift from "the language people love" to "the language people need."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linux kernel&lt;/strong&gt; — Experimental status removed. Real drivers like Nova and Apple AGX are running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;async closures&lt;/strong&gt; — The &lt;code&gt;AsyncFn&lt;/code&gt; trait hierarchy lets Futures borrow from their enclosing closure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;let chains&lt;/strong&gt; — Flatten &lt;code&gt;if let&lt;/code&gt; + bool conditions, reducing nesting without temp variables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ferrocene / IEC 61508&lt;/strong&gt; — Rust now has a formal foothold in safety-critical domains.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM compatibility&lt;/strong&gt; — The compiler becomes a clear feedback loop for AI-generated code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top concern is adoption speed&lt;/strong&gt; — Technical concerns have receded; ecosystem breadth is now the focus.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.jetbrains.com/rust/2026/02/11/state-of-rust-2025/" rel="noopener noreferrer"&gt;State of Rust Ecosystem 2025 | JetBrains&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.rust-lang.org/2026/03/02/2025-State-Of-Rust-Survey-results/" rel="noopener noreferrer"&gt;2025 State of Rust Survey Results | Rust Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lwn.net/Articles/1049831/" rel="noopener noreferrer"&gt;The (successful) end of the kernel Rust experiment | LWN.net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rust-for-linux.com/nova-gpu-driver" rel="noopener noreferrer"&gt;Nova GPU Driver | Rust for Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rust-for-linux.com/apple-agx-gpu-driver" rel="noopener noreferrer"&gt;Apple AGX GPU driver | Rust for Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rust-lang.github.io/rfcs/3668-async-closures.html" rel="noopener noreferrer"&gt;RFC 3668: async closures | Rust RFC Book&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoworld.com/article/3812600/async-closure-support-is-stable-for-rust-1-85.html" rel="noopener noreferrer"&gt;Async closure support is stable for Rust 1.85 | InfoWorld&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ferrous-systems.com/blog/ferrocene-libcore-news-release/" rel="noopener noreferrer"&gt;Ferrous Systems achieves IEC 61508 SIL 2 for Rust core | Ferrous Systems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rust</category>
      <category>linux</category>
      <category>programming</category>
      <category>systems</category>
    </item>
    <item>
      <title>Open-source SDS tooling for Japanese MHLW compliance: the gap nobody filled</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Sat, 23 May 2026 01:34:49 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/open-source-sds-tooling-for-japanese-mhlw-compliance-the-gap-nobody-filled-6o</link>
      <guid>https://forem.com/kent-tokyo/open-source-sds-tooling-for-japanese-mhlw-compliance-the-gap-nobody-filled-6o</guid>
      <description>&lt;p&gt;In March 2025, Japan's Ministry of Health, Labour and Welfare (MHLW) published a structured JSON schema for Safety Data Sheet data exchange. The schema covers roughly 200 deeply nested fields and is intended to standardize how SDS information moves between chemical management systems.&lt;/p&gt;

&lt;p&gt;Most SDS tooling was not built for this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes Japan's SDS requirements different
&lt;/h2&gt;

&lt;p&gt;Japan's SDS requirements come from two laws: the Industrial Safety and Health Act (ISHA, 労働安全衛生法) and the Chemical Substances Control Law (化審法). Both mandate SDS for regulated chemicals, with format requirements governed by JIS Z 7253, Japan's implementation of the UN Globally Harmonized System (GHS).&lt;/p&gt;

&lt;p&gt;JIS Z 7253 follows the standard 16-section GHS structure. In principle, any GHS-compliant SDS satisfies the content requirements. What makes Japanese compliance distinct is a digital layer: the MHLW schema specifies how SDS content should be structured as machine-readable data, with field-level granularity that PDF documents cannot capture.&lt;/p&gt;

&lt;h3&gt;
  
  
  How GHS looks different by country
&lt;/h3&gt;

&lt;p&gt;GHS uses a "building block" approach — each country adopts the elements it chooses. The result is that the same GHS-aligned document varies by jurisdiction:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Country/Region&lt;/th&gt;
&lt;th&gt;Standard&lt;/th&gt;
&lt;th&gt;GHS basis&lt;/th&gt;
&lt;th&gt;Notable difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Japan&lt;/td&gt;
&lt;td&gt;JIS Z 7253:2019&lt;/td&gt;
&lt;td&gt;GHS Rev. 6&lt;/td&gt;
&lt;td&gt;MHLW digital schema; revised to GHS Rev. 9 in Dec 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;United States&lt;/td&gt;
&lt;td&gt;OSHA HazCom 2012&lt;/td&gt;
&lt;td&gt;GHS Rev. 3&lt;/td&gt;
&lt;td&gt;Updated to GHS Rev. 7 in 2024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;European Union&lt;/td&gt;
&lt;td&gt;CLP Regulation&lt;/td&gt;
&lt;td&gt;GHS-aligned&lt;/td&gt;
&lt;td&gt;Stricter on environmental hazards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;China&lt;/td&gt;
&lt;td&gt;GB 13690-2009&lt;/td&gt;
&lt;td&gt;GHS Rev. 4 equivalent&lt;/td&gt;
&lt;td&gt;Moving to GB 30000.1-2024 (GHS Rev. 8), mandatory from August 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taiwan&lt;/td&gt;
&lt;td&gt;CNS 15030&lt;/td&gt;
&lt;td&gt;GHS-aligned&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Japan-specific regulatory fields
&lt;/h3&gt;

&lt;p&gt;The MHLW schema includes fields with no equivalent in EU REACH or US OSHA HazCom formats. These are the main reason international SDS tooling does not cover the schema out of the box:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Law&lt;/th&gt;
&lt;th&gt;Example fields&lt;/th&gt;
&lt;th&gt;What they capture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chemical Substances Control Law (化審法)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CaSCL.ClassificationStatus&lt;/code&gt;, &lt;code&gt;CaSCL.RegistrationNumber&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Regulatory classification and registration numbers under this law&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Industrial Safety and Health Act (安衛法)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ISHAct.PublicationOfName&lt;/code&gt;, &lt;code&gt;ISHAct.Notification&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Name disclosure and notification obligations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Poisonous and Deleterious Substances Control Law&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ControlledSubstancesAct.Applicability&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Whether the substance is classified as poison, deleterious, or specific poison&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PRTR Law&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Chemical release and transfer reporting obligations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Section 15 (Regulatory Information) is the most complex section in the schema — it contains separate subsections for each of these laws, each with its own field structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters now: the 2022 law revision
&lt;/h2&gt;

&lt;p&gt;The MHLW published the schema in 2025, but the driver was a 2022 amendment to the Industrial Safety and Health Act. The amendment shifted Japan's chemical substance regulation from a prescriptive model (government designates specific hazardous substances) to an autonomous management model (companies assess and manage risk themselves).&lt;/p&gt;

&lt;p&gt;The practical impact:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Enforcement date&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;April 2023&lt;/td&gt;
&lt;td&gt;Shift to autonomous management model — all substances with confirmed GHS hazard classifications brought progressively into scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;April 2024&lt;/td&gt;
&lt;td&gt;SDS must now specify concentration ranges numerically (not just qualitatively)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;April 2025&lt;/td&gt;
&lt;td&gt;Protective equipment mandatory for substances with skin/eye hazards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;April 2027&lt;/td&gt;
&lt;td&gt;Risk assessment obligations expand to all regulated substances&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With risk assessment coverage expanding significantly, companies need to process SDS data faster and more accurately. Manual PDF entry does not scale. The JSON schema is the infrastructure layer for automating this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where existing tools stop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Commercial SDS platforms
&lt;/h3&gt;

&lt;p&gt;The major SDS authoring platforms — Sphera, EcoOnline, Chemwatch, Verisk 3E — have broad international coverage. Japanese is typically a supported output language. What they do not provide, as far as I have found, is export to the MHLW JSON schema. They produce Word or PDF output in the correct section structure, which satisfies the document requirement but not the structured data exchange requirement.&lt;/p&gt;

&lt;p&gt;Japanese-market products like SDS Meister and SmartSDS support MHLW JSON output, but their PDF-to-JSON conversion coverage is limited — they are primarily SDS authoring tools, not bulk conversion tools for incoming supplier documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open-source options
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;MHLW JSON&lt;/th&gt;
&lt;th&gt;PDF → JSON&lt;/th&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sds_parser&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Regex, per-manufacturer rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tungsten&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Rule-based, English-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;sds-converter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Rust&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;LLM-based extraction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;sds_parser&lt;/code&gt; and &lt;code&gt;tungsten&lt;/code&gt; solve a different problem: extracting SDS data in English, for specific known manufacturer formats. Neither targets the MHLW schema.&lt;/p&gt;

&lt;h2&gt;
  
  
  The format inconsistency problem
&lt;/h2&gt;

&lt;p&gt;Even within JIS Z 7253-compliant documents, format varies by manufacturer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source of variation&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Section heading labels&lt;/td&gt;
&lt;td&gt;"2. 危険有害性の要約" (JIS Z 7253) vs "2. Hazard(s) identification" (OSHA HazCom) vs "第2部分 危险性概述" (GB/T 16483) — all mean the same thing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Section order&lt;/td&gt;
&lt;td&gt;The 16 sections can appear in any order the manufacturer chooses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concentration notation&lt;/td&gt;
&lt;td&gt;"≥95%", "1〜5%", "約100%", "企業秘密" (trade secret) all need different handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language mixing&lt;/td&gt;
&lt;td&gt;Japanese SDS documents regularly contain English chemical names and CAS numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A rule-based parser must enumerate every variant. In practice, manufacturer-specific headings add another layer of variation on top of the standard differences.&lt;/p&gt;
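
&lt;p&gt;To make the enumeration cost concrete, here is a minimal Rust sketch of a rule-based parser for just the four concentration notations in the table above. The &lt;code&gt;Concentration&lt;/code&gt; type and both function names are hypothetical and are not taken from any of the tools discussed here; treating "≥95%" as a 95 to 100 range is also an assumption.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Parsed concentration in percent (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Concentration {
    Range { from: f64, to: f64 },
    Approx(f64),
    TradeSecret,
}

/// Parse a number after trimming surrounding whitespace.
fn num(s: &amp;amp;str) -&amp;gt; Option&amp;lt;f64&amp;gt; {
    s.trim().parse().ok()
}

/// Parse one concentration cell. Handles only the four notations in the table.
fn parse_concentration(raw: &amp;amp;str) -&amp;gt; Option&amp;lt;Concentration&amp;gt; {
    let s = raw.trim();
    if s == "企業秘密" {
        return Some(Concentration::TradeSecret);
    }
    // Every other notation ends in a percent sign (ASCII or full-width).
    let body = s.strip_suffix('%').or_else(|| s.strip_suffix('％'))?;
    if let Some(v) = body.strip_prefix('≥') {
        // "≥95%" gives only a lower bound, so the upper bound is assumed to be 100.
        return Some(Concentration::Range { from: num(v)?, to: 100.0 });
    }
    if let Some(v) = body.strip_prefix('約') {
        return Some(Concentration::Approx(num(v)?));
    }
    // "1〜5%": accept a wave dash, a full-width tilde, or an ASCII tilde.
    let (lo, hi) = body.split_once(|c: char| matches!(c, '〜' | '～' | '~'))?;
    Some(Concentration::Range { from: num(lo)?, to: num(hi)? })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Anything outside these four shapes returns &lt;code&gt;None&lt;/code&gt;, so each new notation from a supplier means another branch.&lt;/p&gt;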

&lt;h2&gt;
  
  
  The schema itself
&lt;/h2&gt;

&lt;p&gt;Two properties of the MHLW schema are worth knowing before implementing against it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Section 3 (composition) is the hardest part
&lt;/h3&gt;

&lt;p&gt;Section 3 stores component information as a repeating array. Each component object has nested fields for chemical identity, concentration range, and hazard classification. The same data appears differently depending on whether the source document covers a pure substance, a mixture, or a trade secret formulation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"CompositionAndConcentration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ChemicalIdentity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"CASNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"64-17-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"ISHActNotificationNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2-396"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"ConcentrationRange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"ConcentrationRangeFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;95.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"ConcentrationRangeTo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;100.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"ConcentrationRangeUnit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"%"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"TradeSecretFlag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Typos locked into v1.0
&lt;/h3&gt;

&lt;p&gt;The schema contains field name errors that are now part of the specification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HumanExposureAndEmergencyMeasuress  ← trailing double-s
TestGuidline                        ← missing 'e' (not Guideline)
Desclaimer                          ← 'e' in place of 'i' (not Disclaimer)
gazetteNo                           ← lowercase first character
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correcting these would break all existing implementations, so they cannot be fixed in v1.0. An implementation that normalizes these to standard English spellings will fail schema validation.&lt;/p&gt;
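
&lt;p&gt;One way to keep the misspellings from being "corrected" by accident is to confine them to a single mapping and use conventional names everywhere else. The sketch below is illustrative only: the &lt;code&gt;Field&lt;/code&gt; enum and &lt;code&gt;schema_key&lt;/code&gt; function are hypothetical names, not part of any published library. In a serde-based project the same pinning would normally be done with &lt;code&gt;rename&lt;/code&gt; attributes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Rust-side names for the four misspelled schema keys (hypothetical enum).
#[derive(Clone, Copy)]
enum Field {
    HumanExposureAndEmergencyMeasures,
    TestGuideline,
    Disclaimer,
    GazetteNo,
}

/// Return the key exactly as the v1.0 schema spells it, typos included.
fn schema_key(field: Field) -&amp;gt; &amp;amp;'static str {
    match field {
        Field::HumanExposureAndEmergencyMeasures =&amp;gt; "HumanExposureAndEmergencyMeasuress",
        Field::TestGuideline =&amp;gt; "TestGuidline",
        Field::Disclaimer =&amp;gt; "Desclaimer",
        Field::GazetteNo =&amp;gt; "gazetteNo",
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;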

&lt;h2&gt;
  
  
  sds-converter
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/kent-tokyo/sds-converter" rel="noopener noreferrer"&gt;sds-converter&lt;/a&gt; to address the MHLW schema gap. It handles both directions: PDF/DOCX/XLSX to MHLW JSON, and MHLW JSON to a JIS Z 7253-compliant Word document.&lt;/p&gt;

&lt;p&gt;The core approach: rather than enumerating format variants with rules, the tool passes raw section text and the corresponding MHLW schema fields to an LLM and asks it to map values. The LLM handles heading label variation naturally. The output is validated against the schema before writing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;sds-converter

&lt;span class="c"&gt;# PDF → MHLW JSON&lt;/span&gt;
sds-converter to-json &lt;span class="nt"&gt;--input&lt;/span&gt; input.pdf &lt;span class="nt"&gt;--output&lt;/span&gt; output.json

&lt;span class="c"&gt;# MHLW JSON → JIS Z 7253 Word document&lt;/span&gt;
sds-converter to-docx &lt;span class="nt"&gt;--input&lt;/span&gt; output.json &lt;span class="nt"&gt;--output&lt;/span&gt; result.docx &lt;span class="nt"&gt;--lang&lt;/span&gt; ja
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM backend is pluggable — Claude, GPT, Gemini, Mistral, Groq, or local models via Ollama. A &lt;code&gt;--quality&lt;/code&gt; flag adjusts cost versus accuracy for batch workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Known limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scanned PDFs without a text layer&lt;/td&gt;
&lt;td&gt;Not supported — requires upstream OCR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Section 3 tables with merged cells&lt;/td&gt;
&lt;td&gt;Extraction sometimes fails on complex DOCX layouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Precision fields mixed with "not measured" entries&lt;/td&gt;
&lt;td&gt;Occasional type errors in Section 9 output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are open problems, not design decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The open gap
&lt;/h2&gt;

&lt;p&gt;The MHLW schema represents a real need for anyone handling chemical compliance in Japan at volume. Commercial tools cover the authoring side. For bulk conversion of incoming supplier PDFs to structured data, sds-converter, which I developed, is the only open-source implementation targeting this schema that I am aware of.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/kent-tokyo/sds-converter" rel="noopener noreferrer"&gt;repository&lt;/a&gt; is open. Contributions on the extraction side — particularly Section 3 table handling — are welcome. If you work in cheminformatics or chemical compliance and have approached the MHLW compliance problem differently, I would be interested to hear it.&lt;/p&gt;

</description>
      <category>chemistry</category>
      <category>rust</category>
      <category>opensource</category>
      <category>cheminformatics</category>
    </item>
    <item>
      <title>sds-converter: Converting Safety Data Sheets to MHLW Standard JSON with Rust and LLMs</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Fri, 22 May 2026 23:09:11 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/sds-converter-converting-safety-data-sheets-to-mhlw-standard-json-with-rust-and-llms-ihg</link>
      <guid>https://forem.com/kent-tokyo/sds-converter-converting-safety-data-sheets-to-mhlw-standard-json-with-rust-and-llms-ihg</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Safety Data Sheets (SDS) are mandatory documents for every chemical product — solvents, adhesives, industrial gases, cleaning agents. Every manufacturer that supplies a hazardous chemical must provide one. In Japan, the governing standard is JIS Z 7253, which defines 16 sections covering chemical identity, hazard classification, first aid, storage, transport information, and more.&lt;/p&gt;

&lt;p&gt;The Ministry of Health, Labour and Welfare (MHLW) published a standard JSON schema in March 2025 for electronic SDS data exchange between chemical management systems. The schema has roughly 200 deeply nested fields covering all 16 sections.&lt;/p&gt;

&lt;p&gt;The problem is that real SDS documents don't arrive structured to this schema.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why SDS documents are hard to parse
&lt;/h2&gt;

&lt;p&gt;Even two documents both compliant with JIS Z 7253 will differ in ways that break rule-based parsers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Section order&lt;/strong&gt; — manufacturers arrange the 16 sections freely within the standard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Field labeling&lt;/strong&gt; — the same data appears under different headings across JIS Z 7253, GHS/OSHA HazCom, GB/T 16483, CNS 15030, and company-specific layouts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value representation&lt;/strong&gt; — &lt;code&gt;"≥99.5%"&lt;/code&gt;, &lt;code&gt;"99.5% or higher"&lt;/code&gt;, &lt;code&gt;"approximately 100%"&lt;/code&gt; all mean the same thing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language mixing&lt;/strong&gt; — Japanese SDS regularly embed English chemical names and CAS numbers mid-sentence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implicit information&lt;/strong&gt; — section 9 (physical/chemical properties) often has half its fields missing because manufacturers only fill in what's relevant to their product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MHLW schema compounds this: it contains typos that are now part of the specification and must be reproduced exactly. &lt;code&gt;HumanExposureAndEmergencyMeasuress&lt;/code&gt; ends in double-&lt;code&gt;s&lt;/code&gt;. &lt;code&gt;TestGuidline&lt;/code&gt; is missing an &lt;code&gt;e&lt;/code&gt;. &lt;code&gt;Desclaimer&lt;/code&gt; has an &lt;code&gt;e&lt;/code&gt; where &lt;code&gt;Disclaimer&lt;/code&gt; has an &lt;code&gt;i&lt;/code&gt;. These are in the official spec, and validation fails if you "fix" them.&lt;/p&gt;

&lt;p&gt;To handle SDS from international manufacturers (GHS/OSHA format) or Chinese suppliers (GB/T 16483 format) in the same pipeline, you'd need separate parsers for each format. Writing and maintaining those is impractical. I built &lt;a href="https://github.com/kent-tokyo/sds-converter" rel="noopener noreferrer"&gt;sds-converter&lt;/a&gt; to handle this with an LLM instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 16 sections
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Schema key&lt;/th&gt;
&lt;th&gt;JIS Z 7253 section&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Identification&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chemical identity and company information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;HazardIdentification&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hazard identification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Composition&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Composition / information on ingredients&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FirstAidMeasures&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;First-aid measures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FireFightingMeasures&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fire-fighting measures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AccidentalReleaseMeasures&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Accidental release measures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;code&gt;HandlingAndStorage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Handling and storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ExposureControlPersonalProtection&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exposure controls / personal protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PhysicalChemicalProperties&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Physical and chemical properties&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;code&gt;StabilityReactivity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stability and reactivity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ToxicologicalInformation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toxicological information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;code&gt;EcologicalInformation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ecological information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DisposalConsiderations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disposal considerations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;&lt;code&gt;TransportInformation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Transport information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RegulatoryInformation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Regulatory information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OtherInformation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Other information&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Installation and quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;sds-converter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# PDF → MHLW standard JSON&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
sds-converter to-json &lt;span class="nt"&gt;--input&lt;/span&gt; input.pdf &lt;span class="nt"&gt;--output&lt;/span&gt; output.json

&lt;span class="c"&gt;# MHLW JSON → JIS Z 7253-compliant Word document&lt;/span&gt;
sds-converter to-docx &lt;span class="nt"&gt;--input&lt;/span&gt; output.json &lt;span class="nt"&gt;--output&lt;/span&gt; result.docx &lt;span class="nt"&gt;--lang&lt;/span&gt; ja

&lt;span class="c"&gt;# Schema validation&lt;/span&gt;
sds-converter validate &lt;span class="nt"&gt;--input&lt;/span&gt; output.json

&lt;span class="c"&gt;# Extract raw text (no LLM call — useful for debugging)&lt;/span&gt;
sds-converter extract-text &lt;span class="nt"&gt;--input&lt;/span&gt; input.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported input: PDF, DOCX, XLSX, TXT.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the conversion works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Text extraction
&lt;/h3&gt;

&lt;p&gt;Text is pulled from the PDF or DOCX file. Use &lt;code&gt;extract-text&lt;/code&gt; to inspect exactly what gets sent to the LLM — useful when extraction quality is lower than expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Encrypted PDFs and scan-only (image) PDFs are not supported — text extraction requires selectable text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Parallel LLM extraction
&lt;/h3&gt;

&lt;p&gt;The 16 sections are split into two groups and extracted with two parallel LLM calls, which roughly halves per-file latency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GROUP_A&lt;/strong&gt; (sections 1–9): identification, hazard, composition, first aid, fire fighting, accidental release, handling, exposure, physical properties&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GROUP_B&lt;/strong&gt; (sections 10–16): stability, toxicology, ecological, disposal, transport, regulatory, other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results from both calls are merged. Sections skipped in the first pass are automatically retried. HTTP rate-limit responses (429/529) trigger exponential backoff retries (2s → 4s → 8s, up to 3 attempts).&lt;/p&gt;
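
&lt;p&gt;The retry schedule described above can be sketched with the standard library alone. This is not the tool's actual code: &lt;code&gt;backoff_delay&lt;/code&gt; and &lt;code&gt;send_with_retry&lt;/code&gt; are hypothetical names, the request is modeled as a closure that returns an HTTP status code, and "up to 3 attempts" is read here as three retries after the initial call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::time::Duration;

/// Delay before retry number `attempt` (1-based): 2s, 4s, 8s.
fn backoff_delay(attempt: u32) -&amp;gt; Duration {
    Duration::from_secs(2u64.pow(attempt))
}

/// Call `request` until it returns a status other than 429/529 or retries run out.
/// `sleep` is injected so callers (and tests) decide how waiting happens.
fn send_with_retry(
    mut request: impl FnMut() -&amp;gt; u16,
    mut sleep: impl FnMut(Duration),
    max_retries: u32,
) -&amp;gt; u16 {
    let mut status = request();
    for attempt in 1..=max_retries {
        // Only rate-limit (429) and overloaded (529) responses are retried.
        if !matches!(status, 429 | 529) {
            break;
        }
        sleep(backoff_delay(attempt));
        status = request();
    }
    status
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In real use the &lt;code&gt;sleep&lt;/code&gt; argument could simply be &lt;code&gt;std::thread::sleep&lt;/code&gt;. If every retry is also rate-limited, the last status is returned to the caller unchanged.&lt;/p&gt;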

&lt;h3&gt;
  
  
  Step 3: JSON output
&lt;/h3&gt;

&lt;p&gt;The merged result is written as MHLW SDS data exchange format v1.0 JSON.&lt;/p&gt;




&lt;h2&gt;
  
  
  LLM backend and quality settings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choosing a provider
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# OpenAI GPT (gpt-4o-mini by default)&lt;/span&gt;
sds-converter to-json &lt;span class="nt"&gt;--input&lt;/span&gt; input.pdf &lt;span class="nt"&gt;--output&lt;/span&gt; output.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; openai &lt;span class="nt"&gt;--api-key&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt;

&lt;span class="c"&gt;# Google Gemini (gemini-2.0-flash by default)&lt;/span&gt;
sds-converter to-json &lt;span class="nt"&gt;--input&lt;/span&gt; input.pdf &lt;span class="nt"&gt;--output&lt;/span&gt; output.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; gemini &lt;span class="nt"&gt;--api-key&lt;/span&gt; &lt;span class="nv"&gt;$GEMINI_API_KEY&lt;/span&gt;

&lt;span class="c"&gt;# Local LLM via Ollama (any OpenAI-compatible endpoint)&lt;/span&gt;
sds-converter to-json &lt;span class="nt"&gt;--input&lt;/span&gt; input.pdf &lt;span class="nt"&gt;--output&lt;/span&gt; output.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt; &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://localhost:11434/v1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; llama3.2 &lt;span class="nt"&gt;--api-key&lt;/span&gt; dummy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;--provider&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Default model&lt;/th&gt;
&lt;th&gt;Environment variable&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;anthropic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;claude-haiku-4-5-20251001&lt;/code&gt; (low/medium) · &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; (high)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;openai&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gpt-4o-mini&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OPENAI_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-2.0-flash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GEMINI_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mistral&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mistral-small-latest&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;MISTRAL_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;groq&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llama-3.3-70b-versatile&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GROQ_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cohere&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;command-r-plus&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;COHERE_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;local&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llama3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;LOCAL_LLM_API_KEY&lt;/code&gt; (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Quality preset
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;--quality&lt;/code&gt; controls both the model and how much text is sent to the LLM per call:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;--quality&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Model (Anthropic)&lt;/th&gt;
&lt;th&gt;Max text fed to LLM&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;low&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;claude-haiku-4-5&lt;/td&gt;
&lt;td&gt;15,000 chars&lt;/td&gt;
&lt;td&gt;Speed/cost priority&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;medium&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;claude-haiku-4-5&lt;/td&gt;
&lt;td&gt;30,000 chars&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;claude-sonnet-4-6&lt;/td&gt;
&lt;td&gt;60,000 chars&lt;/td&gt;
&lt;td&gt;Accuracy priority&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At &lt;code&gt;high&lt;/code&gt;, the full document text is sent to the LLM, including the later sections (transport and regulatory information). Use &lt;code&gt;--quality high&lt;/code&gt; when complete 16-section coverage matters.&lt;/p&gt;
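
&lt;p&gt;The character limits in the table explain why later sections can go missing at lower presets. The sketch below assumes the limit counts Unicode characters and is applied by keeping the start of the document. Both are assumptions about behavior, and the &lt;code&gt;Quality&lt;/code&gt; enum and &lt;code&gt;truncate_for_llm&lt;/code&gt; function are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Quality presets from the table above (hypothetical Rust representation).
#[derive(Clone, Copy)]
enum Quality {
    Low,
    Medium,
    High,
}

impl Quality {
    /// Maximum number of characters passed to the LLM per call.
    fn max_chars(self) -&amp;gt; usize {
        match self {
            Quality::Low =&amp;gt; 15_000,
            Quality::Medium =&amp;gt; 30_000,
            Quality::High =&amp;gt; 60_000,
        }
    }
}

/// Keep at most `max_chars` characters without splitting a multi-byte character.
fn truncate_for_llm(text: &amp;amp;str, quality: Quality) -&amp;gt; &amp;amp;str {
    // The byte offset of character number `max_chars` marks where to cut.
    match text.char_indices().nth(quality.max_chars()) {
        Some((byte_index, _)) =&amp;gt; &amp;amp;text[..byte_index],
        None =&amp;gt; text,
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Counting characters rather than bytes matters for Japanese text, where most characters occupy three bytes in UTF-8.&lt;/p&gt;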

&lt;h3&gt;
  
  
  Batch mode
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sds-converter to-json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--input-dir&lt;/span&gt; ./pdfs/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-dir&lt;/span&gt; ./json/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lang&lt;/span&gt; ja &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Validation
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;validate&lt;/code&gt; checks structural completeness of the extracted JSON and returns warnings without hard-failing — partial results are still usable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sds-converter validate &lt;span class="nt"&gt;--input&lt;/span&gt; output.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Examples of what it checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Section 1: no product name (TradeNameJP or TradeNameEN)&lt;/li&gt;
&lt;li&gt;Section 1: SupplierInformation missing&lt;/li&gt;
&lt;li&gt;Section 2: neither Classification nor HazardLabelling extracted&lt;/li&gt;
&lt;li&gt;Section 3: CompositionAndConcentration list is empty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When using the library, &lt;code&gt;convert_to_json&lt;/code&gt; returns a &lt;code&gt;(SdsRoot, Vec&amp;lt;String&amp;gt;)&lt;/code&gt; tuple — the warnings are surfaced inline.&lt;/p&gt;
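
&lt;p&gt;The warn-instead-of-fail pattern can be illustrated as follows. This is a simplified sketch, not the library's implementation: &lt;code&gt;SdsSummary&lt;/code&gt; is a hypothetical stand-in for the much larger &lt;code&gt;SdsRoot&lt;/code&gt;, and &lt;code&gt;collect_warnings&lt;/code&gt; covers only the four checks listed above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Hypothetical, heavily reduced stand-in for the parsed document.
#[derive(Default)]
struct SdsSummary {
    trade_name_jp: Option&amp;lt;String&amp;gt;,
    trade_name_en: Option&amp;lt;String&amp;gt;,
    has_supplier_information: bool,
    has_classification: bool,
    has_hazard_labelling: bool,
    component_count: usize,
}

/// Collect warnings instead of returning an error, so partial results stay usable.
fn collect_warnings(sds: &amp;amp;SdsSummary) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    let mut warnings = Vec::new();
    if !(sds.trade_name_jp.is_some() || sds.trade_name_en.is_some()) {
        warnings.push("Section 1: no product name (TradeNameJP or TradeNameEN)".to_string());
    }
    if !sds.has_supplier_information {
        warnings.push("Section 1: SupplierInformation missing".to_string());
    }
    if !(sds.has_classification || sds.has_hazard_labelling) {
        warnings.push("Section 2: neither Classification nor HazardLabelling extracted".to_string());
    }
    if sds.component_count == 0 {
        warnings.push("Section 3: CompositionAndConcentration list is empty".to_string());
    }
    warnings
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty vector means none of the four checks fired. The caller decides whether a given warning is serious enough to reject the file.&lt;/p&gt;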




&lt;h2&gt;
  
  
  Output JSON structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Datasheet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"IssueDate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-03-31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"SDS-SchemaVersionNo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Identification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"TradeProductIdentity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"TradeNameJP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sample Product"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"SupplierInformation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"CompanyName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sample Corp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Phone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"03-0000-0000"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full schema covers all 16 JIS Z 7253 sections with ~200 fields. The official spec and developer manual are on the &lt;a href="https://www.mhlw.go.jp/stf/newpage_56484.html" rel="noopener noreferrer"&gt;MHLW website&lt;/a&gt; (Japanese).&lt;/p&gt;




&lt;h2&gt;
  
  
  Using as a library
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;sds-converter-core&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PDF → JSON
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;sds_converter_core&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;
    &lt;span class="nn"&gt;converter&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;AnthropicBackend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LlmConfig&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;convert_to_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConvertConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;anyhow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;AnthropicBackend&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ANTHROPIC_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nn"&gt;LlmConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConvertConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;source_language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Language&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Japanese&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;output_language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;Language&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Japanese&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;convert_to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;path&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"input.pdf"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;warnings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nd"&gt;eprintln!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"WARN: {w}"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"output.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;to_string_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JSON → Word document
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;sds_converter_core&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;convert_from_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConvertConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SdsRoot&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;anyhow&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;sds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SdsRoot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;read_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"output.json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConvertConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;output_language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;Language&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Japanese&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nf"&gt;convert_from_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;path&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"result.docx"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Custom LLM backend
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;sds_converter_core&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;LlmBackend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SdsError&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MyBackend&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;LlmBackend&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;MyBackend&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SdsError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Call your LLM API, return the raw JSON string response&lt;/span&gt;
        &lt;span class="nd"&gt;todo!&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Language support
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;&lt;code&gt;--lang&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Source standard&lt;/th&gt;
&lt;th&gt;Output DOCX headings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Japanese&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ja&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JIS Z 7253&lt;/td&gt;
&lt;td&gt;JIS Z 7253&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;&lt;code&gt;en&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GHS/OSHA HazCom&lt;/td&gt;
&lt;td&gt;GHS Rev.10 / ISO 11014&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simplified Chinese&lt;/td&gt;
&lt;td&gt;&lt;code&gt;zh-cn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GB/T 16483-2012&lt;/td&gt;
&lt;td&gt;GB/T 16483-2012&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Traditional Chinese&lt;/td&gt;
&lt;td&gt;&lt;code&gt;zh-tw&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CNS 15030&lt;/td&gt;
&lt;td&gt;CNS 15030&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Comparison with alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Open-source
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;sds-converter&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://github.com/astepe/sds_parser" rel="noopener noreferrer"&gt;sds_parser&lt;/a&gt;&lt;/th&gt;
&lt;th&gt;&lt;a href="https://github.com/CrucibleSDS/tungsten" rel="noopener noreferrer"&gt;tungsten&lt;/a&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI/LLM&lt;/td&gt;
&lt;td&gt;Yes (pluggable)&lt;/td&gt;
&lt;td&gt;No (regex)&lt;/td&gt;
&lt;td&gt;No (rule-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MHLW JSON&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bidirectional&lt;/td&gt;
&lt;td&gt;Yes (↔ DOCX)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual&lt;/td&gt;
&lt;td&gt;ja / en / zh-CN / zh-TW&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;English only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Commercial (Japan)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;sds-converter&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;SDS Meister&lt;/th&gt;
&lt;th&gt;SmartSDS&lt;/th&gt;
&lt;th&gt;Dr.EHS Chemical&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Yes (your API key)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (translation)&lt;/td&gt;
&lt;td&gt;AI-OCR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MHLW JSON&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF → JSON&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (authoring only)&lt;/td&gt;
&lt;td&gt;Partial (JP only)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open-source&lt;/td&gt;
&lt;td&gt;MIT/Apache-2.0&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

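&lt;p&gt;Among the tools compared here, sds-converter is the only open-source one that supports the MHLW schema and handles the full round-trip. The CLI and library run on your own machine; extraction calls whichever LLM backend you configure with your own API key.&lt;/p&gt;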




&lt;h2&gt;
  
  
  Crate structure
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sds-converter-core&lt;/code&gt;&lt;/strong&gt; — library. LLM extraction, DOCX generation, MHLW schema types.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sds-converter&lt;/code&gt;&lt;/strong&gt; — CLI binary. &lt;code&gt;to-json&lt;/code&gt;, &lt;code&gt;to-docx&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, &lt;code&gt;extract-text&lt;/code&gt; subcommands.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback welcome, especially on section 3 component table extraction and non-Japanese document accuracy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kent-tokyo/sds-converter" rel="noopener noreferrer"&gt;https://github.com/kent-tokyo/sds-converter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chemistry</category>
      <category>rust</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why Font Rendering in Rust Is Harder Than It Looks</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Fri, 22 May 2026 13:48:14 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/why-font-rendering-in-rust-is-harder-than-it-looks-9db</link>
      <guid>https://forem.com/kent-tokyo/why-font-rendering-in-rust-is-harder-than-it-looks-9db</guid>
      <description>&lt;p&gt;When you want to render text in a pure Rust project — a GUI app, a document generator, a terminal — you quickly find that "render some text" is actually four distinct problems stacked on top of each other. The C ecosystem has a library for each one. Rust has options too, but coverage is uneven, and the naming conventions obscure which crates actually depend on C.&lt;/p&gt;




&lt;h2&gt;
  
  
  The four layers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;C library&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Font discovery&lt;/td&gt;
&lt;td&gt;fontconfig&lt;/td&gt;
&lt;td&gt;Finds installed fonts by name, family, or script&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text shaping&lt;/td&gt;
&lt;td&gt;HarfBuzz&lt;/td&gt;
&lt;td&gt;Converts a Unicode string → glyph IDs with positions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rasterization&lt;/td&gt;
&lt;td&gt;FreeType&lt;/td&gt;
&lt;td&gt;Turns outline vectors → pixel bitmaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layout&lt;/td&gt;
&lt;td&gt;Pango&lt;/td&gt;
&lt;td&gt;Wraps all three, handles BiDi, line breaking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On macOS/iOS the shaping and rasterization layers are CoreText. On Windows they're DirectWrite. In both cases, the system library is written in C, C++, or Objective-C, not Rust.&lt;/p&gt;

&lt;p&gt;Each layer depends on the one below it: shaping needs the raw font bytes to read OpenType tables, rasterization needs the shaped glyph IDs, layout needs both. You can't swap in one layer without thinking about the others.&lt;/p&gt;

&lt;h3&gt;
  
  
  What text shaping actually does
&lt;/h3&gt;

&lt;p&gt;Shaping is the step between "here is a Unicode string" and "here are the glyph IDs and positions to draw." Most developers don't know it exists until something renders wrong.&lt;/p&gt;

&lt;p&gt;The string "fi" contains two Unicode codepoints (U+0066, U+0069), but many fonts replace that pair with a single ligature glyph, "ﬁ". In Arabic, the same letter takes a different shape depending on whether it appears at the start, middle, or end of a word. Even Latin fonts have kerning rules that adjust spacing between specific letter pairs.&lt;/p&gt;

&lt;p&gt;All of this is encoded in the font file's OpenType tables: GSUB (glyph substitution) and GPOS (glyph positioning). A shaping engine reads those tables and converts your input string into a list of (glyph ID, x, y) tuples. HarfBuzz is the de facto standard for this. Without a shaper, you get incorrect rendering for anything beyond the simplest Latin text.&lt;/p&gt;
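
&lt;p&gt;To make the positioning half of that concrete, here is a toy sketch of pair kerning. It does not read a real font: the &lt;code&gt;kerning&lt;/code&gt; table, the fixed &lt;code&gt;ADVANCE&lt;/code&gt; width, and the helper &lt;code&gt;second_glyph_x&lt;/code&gt; are all invented for illustration. A real shaper such as HarfBuzz derives these adjustments from the font's GPOS data instead.&lt;/p&gt;

```rust
// Toy pair-kerning lookup. All numbers are made up for illustration;
// a real shaper reads equivalent adjustments from the font's GPOS table.

/// Hypothetical advance width shared by every glyph, in font units.
const ADVANCE: i32 = 600;

/// Hypothetical kerning adjustment (font units) for a pair of characters.
fn kerning(left: char, right: char) -> i32 {
    match (left, right) {
        ('A', 'V') | ('V', 'A') => -80,
        ('T', 'o') => -60,
        _ => 0, // most pairs need no adjustment
    }
}

/// X position of the second glyph when the first one is drawn at x = 0.
fn second_glyph_x(left: char, right: char) -> i32 {
    ADVANCE + kerning(left, right)
}

fn main() {
    for (left, right) in [('A', 'V'), ('H', 'H'), ('T', 'o')] {
        let x = second_glyph_x(left, right);
        println!("{left}{right}: second glyph at x = {x}");
    }
}
```

&lt;p&gt;The only point of the sketch is that the x position of a glyph depends on its neighbour, which is why positions cannot be computed one character at a time.&lt;/p&gt;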




&lt;h2&gt;
  
  
  The naming trap
&lt;/h2&gt;

&lt;p&gt;Several font crates look like pure Rust but are C wrappers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;harfbuzz-rs&lt;/code&gt;&lt;/strong&gt; — a safe Rust wrapper around libharfbuzz. The API feels like Rust, but libharfbuzz is compiled from C++. Run &lt;code&gt;cargo tree&lt;/code&gt; and you'll see &lt;code&gt;harfbuzz-sys&lt;/code&gt; in the graph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;freetype-rs&lt;/code&gt;&lt;/strong&gt; — same pattern. Safe API, C underneath. Look for &lt;code&gt;freetype-sys&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;fontconfig&lt;/code&gt;&lt;/strong&gt; — bindings to the C fontconfig library.&lt;/p&gt;

&lt;p&gt;If you're targeting &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; or cross-compiling without a C toolchain, these fail at build time. A &lt;code&gt;-sys&lt;/code&gt; suffix in &lt;code&gt;cargo tree&lt;/code&gt; output is the quickest signal, although it is a naming convention rather than a guarantee: a crate can compile C from its build script without carrying the suffix.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pure Rust, layer by layer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Font discovery: &lt;code&gt;fontdb&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://crates.io/crates/fontdb" rel="noopener noreferrer"&gt;fontdb&lt;/a&gt; reads font directories directly, parses font metadata from binary headers, and supports querying by family name, weight, style, and script coverage. No C. Works on Linux, macOS, Windows, and WASM.&lt;/p&gt;

&lt;p&gt;It doesn't use fontconfig under the hood — it scans directories directly. For most use cases this is fine. If you need fontconfig's alias resolution ("sans-serif" mapping to a specific installed font), fontdb covers part of this logic but not all of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Text shaping: &lt;code&gt;rustybuzz&lt;/code&gt; or &lt;code&gt;harfrust&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://crates.io/crates/rustybuzz" rel="noopener noreferrer"&gt;rustybuzz&lt;/a&gt;&lt;/strong&gt; is a complete port of HarfBuzz v10.1.0 to Rust. It passes 2,221 out of 2,252 HarfBuzz shaping tests. Performance is 1.5–2x slower than C HarfBuzz. Maintained under the harfbuzz org. Used in &lt;code&gt;cosmic-text&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://crates.io/crates/harfrust" rel="noopener noreferrer"&gt;harfrust&lt;/a&gt;&lt;/strong&gt; is a newer port, matching HarfBuzz v13.0.0. It started as a fork of rustybuzz, migrating the font parser from &lt;code&gt;ttf-parser&lt;/code&gt; to &lt;code&gt;read-fonts&lt;/code&gt; (Google's fontations project). Less than 25% slower than HarfBuzz on common fonts.&lt;/p&gt;

&lt;p&gt;For new projects, harfrust tracks HarfBuzz more closely (3 major versions ahead of rustybuzz). rustybuzz has more production field time — it's what cosmic-text ships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Known limitations of both:&lt;/strong&gt; Arabic rendering works for common patterns, but unusual ligature constructions that require building lookup rules at runtime aren't supported. Some variable font interpolation features (avar2) are also not implemented.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rasterization: &lt;code&gt;swash&lt;/code&gt; or &lt;code&gt;fontdue&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://crates.io/crates/swash" rel="noopener noreferrer"&gt;swash&lt;/a&gt;&lt;/strong&gt; handles both glyph outline extraction and rasterization. Supports ligatures and color emoji (CBDT, COLRv1). Used in &lt;code&gt;cosmic-text&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://crates.io/crates/fontdue" rel="noopener noreferrer"&gt;fontdue&lt;/a&gt;&lt;/strong&gt; focuses on rasterization only — no shaping, no color emoji. Simpler API and lighter weight if you're only rasterizing pre-shaped glyphs for basic Latin text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layout: &lt;code&gt;cosmic-text&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://crates.io/crates/cosmic-text" rel="noopener noreferrer"&gt;cosmic-text&lt;/a&gt; assembles the full stack: fontdb for discovery, rustybuzz for shaping, swash for rasterization, and its own layout engine for line breaking and BiDi text. Version 0.14.2 was released April 2025.&lt;/p&gt;

&lt;p&gt;If you're building a GUI, a terminal emulator, or anything that needs correct multi-line Unicode text, this is the fastest path to a working stack. It's used in COSMIC (the Pop!_OS desktop environment), Iced, and Floem.&lt;/p&gt;




&lt;h2&gt;
  
  
  When the stack breaks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complex scripts.&lt;/strong&gt; rustybuzz handles most scripts well. Arabic works for common patterns but fails on ligature constructions that require building font lookup rules on the fly. Indic scripts (Devanagari, Bengali, Tamil) handle standard shaping but haven't had the production volume that C HarfBuzz has accumulated over 15+ years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Font fallback.&lt;/strong&gt; If a character isn't covered by your selected font, neither fontdb nor cosmic-text handles the fallback automatically. &lt;code&gt;cosmic-text&lt;/code&gt;'s &lt;code&gt;FontSystem&lt;/code&gt; gives you font-matching utilities and codepoint coverage queries, but you write the chain logic.&lt;/p&gt;
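
&lt;p&gt;A hand-written fallback chain can be sketched as a function from codepoint to font. The &lt;code&gt;FontChoice&lt;/code&gt; enum, the &lt;code&gt;pick_font&lt;/code&gt; function, and the hard-coded Unicode ranges below are hypothetical simplifications. A real implementation would ask each loaded font whether it covers the codepoint rather than assuming coverage from ranges.&lt;/p&gt;

```rust
// Sketch of a fallback chain. The enum and the hard-coded ranges are
// assumptions for illustration; real code would query each font's coverage.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FontChoice {
    Primary, // e.g. a Latin text face
    Cjk,     // e.g. a Japanese fallback face
    Emoji,   // e.g. a color emoji face
    Missing, // nothing in the chain covers this codepoint
}

/// Returns which font in the chain is assumed to cover `c`.
fn pick_font(c: char) -> FontChoice {
    match c {
        // Basic Latin through Latin Extended-B
        '\u{0000}'..='\u{024F}' => FontChoice::Primary,
        // Hiragana, Katakana, CJK Unified Ideographs
        '\u{3040}'..='\u{30FF}' | '\u{4E00}'..='\u{9FFF}' => FontChoice::Cjk,
        // Pictograph and emoji blocks up to Symbols and Pictographs Extended-A
        '\u{1F300}'..='\u{1FAFF}' => FontChoice::Emoji,
        _ => FontChoice::Missing,
    }
}

fn main() {
    for c in "Rust は 🦀".chars() {
        println!("{c:?} -> {:?}", pick_font(c));
    }
}
```

&lt;p&gt;Returning an explicit &lt;code&gt;Missing&lt;/code&gt; variant keeps the "no font covers this" case visible to the caller, which can then draw a placeholder glyph or log the codepoint.&lt;/p&gt;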

&lt;p&gt;&lt;strong&gt;Variable fonts.&lt;/strong&gt; Basic variation instance selection works. Complex interpolation between instances (avar2) is not yet fully supported.&lt;/p&gt;
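
&lt;p&gt;For context on what instance selection involves: before any interpolation, a user-facing axis value such as a weight of 700 is mapped onto a normalized scale from -1 to 1 around the axis default. The sketch below shows that default normalization step as the OpenType specification describes it. The function name &lt;code&gt;normalize_axis&lt;/code&gt; and the example weight axis (100, 400, 900) are assumptions, and the avar remapping that can follow this step in a real font is deliberately left out.&lt;/p&gt;

```rust
// Default axis normalization for variable fonts (before any `avar` remapping).
// Function name and example axis values are illustrative assumptions.

/// Maps a user-space axis value onto [-1.0, 1.0], with the axis default at 0.0.
fn normalize_axis(value: f32, min: f32, default: f32, max: f32) -> f32 {
    // Out-of-range requests are pinned to the ends of the axis.
    let v = value.clamp(min, max);
    if v > default {
        (v - default) / (max - default)
    } else if default > v {
        (v - default) / (default - min)
    } else {
        0.0
    }
}

fn main() {
    // Hypothetical weight axis: min 100, default 400, max 900.
    for weight in [100.0, 250.0, 400.0, 700.0, 900.0] {
        let n = normalize_axis(weight, 100.0, 400.0, 900.0);
        println!("wght {weight} -> {n:.2}");
    }
}
```

&lt;p&gt;The two sides of the default are scaled independently, so an axis whose default sits off-centre still maps its minimum to -1 and its maximum to 1.&lt;/p&gt;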




&lt;h2&gt;
  
  
  Practical summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Crate(s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full text rendering pipeline&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cosmic-text&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shaping only, modern (v13)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;harfrust&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shaping only, battle-tested&lt;/td&gt;
&lt;td&gt;&lt;code&gt;rustybuzz&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rasterization with color emoji&lt;/td&gt;
&lt;td&gt;&lt;code&gt;swash&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lightweight rasterization (Latin only)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fontdue&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Font metadata / discovery&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fontdb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Checklist before shipping:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;cargo tree | grep -- -sys&lt;/code&gt; — verify no C dependency slipped in&lt;/li&gt;
&lt;li&gt;[ ] Test on all target platforms (font path scanning differs between Linux, macOS, Windows)&lt;/li&gt;
&lt;li&gt;[ ] Check your target script against the known limitations for rustybuzz or harfrust&lt;/li&gt;
&lt;li&gt;[ ] If targeting WASM, fontdb won't find system fonts — bundle font bytes directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're doing font subsetting rather than rendering (extracting just the glyphs you need to embed in a PDF or WASM module), the relevant crates are different. Still, knowing that the shaping layer reads the GSUB and GPOS tables tells you which tables must survive subsetting. I've been building that tooling in &lt;a href="https://github.com/kent-tokyo/harumi" rel="noopener noreferrer"&gt;harumi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions welcome in the comments.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>opensource</category>
      <category>webassembly</category>
    </item>
    <item>
      <title>Why Pure Rust WASM Is Harder Than It Looks</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Thu, 21 May 2026 11:24:28 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/why-pure-rust-wasm-is-harder-than-it-looks-4p48</link>
      <guid>https://forem.com/kent-tokyo/why-pure-rust-wasm-is-harder-than-it-looks-4p48</guid>
      <description>&lt;p&gt;Most Rust developers expect the pitch to hold: compile to WebAssembly, ship to the browser, no C required. Add the &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; target, run &lt;code&gt;wasm-pack build&lt;/code&gt;, and you're done.&lt;/p&gt;

&lt;p&gt;Then you try it on a real project. Four traps, none of them in the getting-started docs: a deprecated allocator with a memory leak, hidden C dependencies scattered through the ecosystem, a random number crate that silently misbehaves, and bundle sizes that balloon from Emscripten glue.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with "Just Use Rust"
&lt;/h2&gt;

&lt;p&gt;The core appeal of Rust for WASM is real: no garbage collector, predictable memory, and — in theory — no need for the C/C++ toolchain that browser WASM has historically required.&lt;/p&gt;

&lt;p&gt;But the Rust crate ecosystem has C dependencies scattered throughout it, often invisible until compile time. A project that builds fine on Linux will fail on &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; because three levels down in the dependency tree, something links to &lt;code&gt;libz&lt;/code&gt;, &lt;code&gt;openssl&lt;/code&gt;, or a system RNG.&lt;/p&gt;

&lt;p&gt;When I built &lt;a href="https://github.com/kent-tokyo/chem-wasm-lens" rel="noopener noreferrer"&gt;chem-wasm-lens&lt;/a&gt; — a molecular analysis library for the browser — I needed to stay completely C-free. The alternative, RDKit (the standard C++ cheminformatics toolkit), compiles to a ~40MB WASM bundle via Emscripten. My target was under 200KB gzipped. Getting there meant systematically hunting down every C dependency.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Allocator Story
&lt;/h2&gt;

&lt;p&gt;Every WASM tutorial from a few years ago recommended &lt;code&gt;wee_alloc&lt;/code&gt;: a tiny allocator that shrank bundle size and was the default in &lt;code&gt;wasm-pack new&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;wee_alloc&lt;/code&gt; was archived in August 2025.&lt;/strong&gt; It has a known memory leak and is no longer maintained. If you have it in an existing project, remove it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cargo.toml — remove this line&lt;/span&gt;
&lt;span class="py"&gt;wee_alloc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.4"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib.rs — remove these lines&lt;/span&gt;
&lt;span class="nd"&gt;#[global_allocator]&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;ALLOC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;wee_alloc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WeeAlloc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;wee_alloc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;WeeAlloc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;INIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The current state of WASM allocators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default (&lt;code&gt;dlmalloc&lt;/code&gt;)&lt;/strong&gt;: Rust's &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; target uses &lt;code&gt;dlmalloc&lt;/code&gt; as the default global allocator. It's a pure Rust implementation — no C, no system calls. For most browser WASM projects, this is fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;lol_alloc&lt;/code&gt;&lt;/strong&gt;: Written as a &lt;code&gt;wee_alloc&lt;/code&gt; replacement. Smaller than &lt;code&gt;dlmalloc&lt;/code&gt;, but the author documents it as not production-ready.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;talc&lt;/code&gt;&lt;/strong&gt;: A newer allocator, benchmarked as smaller and faster than &lt;code&gt;dlmalloc&lt;/code&gt;. Worth watching for size-critical projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most projects: just use the default and move on. The allocator is no longer the interesting problem. The interesting problem is transitive C deps.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hidden C Dependency Tree
&lt;/h2&gt;

&lt;p&gt;The most common &lt;code&gt;wasm-pack build&lt;/code&gt; failure isn't your code — it's a crate three levels deep that silently links to C. The build error typically looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: failed to run custom build command for `openssl-sys v0.9.x`
  ...
  Could not find directory of OpenSSL installation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or for &lt;code&gt;ring&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: failed to run custom build command for `ring v0.17.x`
  ...
  the target `wasm32-unknown-unknown` is not supported
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;-sys&lt;/code&gt; crates are the signal.&lt;/strong&gt; Any crate with a &lt;code&gt;-sys&lt;/code&gt; suffix wraps a native library:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;openssl-sys&lt;/code&gt; — OpenSSL&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;libz-sys&lt;/code&gt; — zlib&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bzip2-sys&lt;/code&gt; — libbz2&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;libsqlite3-sys&lt;/code&gt; — SQLite&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ring&lt;/code&gt; — cryptography; no &lt;code&gt;-sys&lt;/code&gt; suffix, but it still ships C and assembly for performance-critical paths, so look for it by name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably didn't add any of these directly. &lt;code&gt;reqwest&lt;/code&gt; pulls in &lt;code&gt;openssl-sys&lt;/code&gt; through its default &lt;code&gt;native-tls&lt;/code&gt; backend on Linux. &lt;code&gt;flate2&lt;/code&gt; links &lt;code&gt;libz-sys&lt;/code&gt; as soon as any crate in the tree enables one of its zlib features. &lt;code&gt;sqlx&lt;/code&gt; brings in &lt;code&gt;libsqlite3-sys&lt;/code&gt; when its SQLite driver is enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to detect them:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo tree &lt;span class="nt"&gt;--target&lt;/span&gt; wasm32-unknown-unknown | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="s2"&gt;sys"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output from a project using &lt;code&gt;reqwest&lt;/code&gt; and &lt;code&gt;flate2&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── flate2 v1.0.x
│   └── libz-sys v1.1.x (*)       ← C dependency
├── reqwest v0.12.x
│   └── openssl-sys v0.9.x (*)    ← C dependency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any &lt;code&gt;-sys&lt;/code&gt; crate in the output is a potential blocker.&lt;/p&gt;
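
&lt;p&gt;The same check can run inside a test or a CI script. The sketch below is an assumption about how you might wire it up, not code from &lt;code&gt;chem-wasm-lens&lt;/code&gt;: &lt;code&gt;count_sys_crates&lt;/code&gt; is a hypothetical helper that takes captured &lt;code&gt;cargo tree&lt;/code&gt; output and counts the lines naming a &lt;code&gt;-sys&lt;/code&gt; crate. It only knows the naming convention, so it misses crates like &lt;code&gt;ring&lt;/code&gt; that bundle C without the suffix.&lt;/p&gt;

```rust
/// Hypothetical helper: counts the lines of `cargo tree` output that name a `-sys` crate.
/// Repeated entries (the ones marked `(*)`) are counted each time they appear.
fn count_sys_crates(tree_output: String) -> usize {
    tree_output
        .lines()
        // Strip the box-drawing prefix that `cargo tree` prints before each crate name.
        .map(|line| line.trim_start_matches(|c: char| !c.is_ascii_alphanumeric()))
        // The first whitespace-separated token on each line is the crate name.
        .filter_map(|entry| entry.split_whitespace().next())
        .filter(|name| name.ends_with("-sys"))
        .count()
}
```

&lt;p&gt;Failing the build when the count is non-zero keeps a new C dependency from slipping in through a routine &lt;code&gt;cargo update&lt;/code&gt;.&lt;/p&gt;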

&lt;p&gt;&lt;strong&gt;Common replacements:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;C-dependent crate&lt;/th&gt;
&lt;th&gt;Pure Rust alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;openssl&lt;/code&gt; / &lt;code&gt;openssl-sys&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;rustls&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;flate2&lt;/code&gt; with a zlib feature enabled&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;flate2&lt;/code&gt; with &lt;code&gt;default-features = false, features = ["rust_backend"]&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;image&lt;/code&gt; (old versions)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;image&lt;/code&gt; 0.25+ (mostly pure Rust)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;reqwest&lt;/code&gt; with system TLS&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;reqwest&lt;/code&gt; with &lt;code&gt;default-features = false, features = ["rustls-tls"]&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's what a typical fix looks like in &lt;code&gt;Cargo.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="c"&gt;# Before (pulls in openssl-sys):&lt;/span&gt;
&lt;span class="c"&gt;# reqwest = "0.12"&lt;/span&gt;

&lt;span class="c"&gt;# After (pure Rust TLS):&lt;/span&gt;
&lt;span class="py"&gt;reqwest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.12"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;default-features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"rustls-tls"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Before (zlib feature enabled, links libz-sys):&lt;/span&gt;
&lt;span class="c"&gt;# flate2 = { version = "1.0", features = ["zlib"] }&lt;/span&gt;

&lt;span class="c"&gt;# After (pure Rust miniz_oxide backend):&lt;/span&gt;
&lt;span class="py"&gt;flate2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;default-features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"rust_backend"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The &lt;code&gt;ring&lt;/code&gt; situation&lt;/strong&gt; deserves a specific note. &lt;code&gt;ring&lt;/code&gt; powers most TLS stacks in the Rust ecosystem, and it contains C code for performance-critical paths. On &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt;, it fails to compile.&lt;/p&gt;

&lt;p&gt;For browser WASM, this is usually not a problem — the browser's &lt;code&gt;fetch&lt;/code&gt; API handles TLS transparently, so Rust code never touches it. For non-browser WASM runtimes (WASI, Wasmtime), look at &lt;a href="https://github.com/RustCrypto" rel="noopener noreferrer"&gt;RustCrypto&lt;/a&gt; crates, which provide pure Rust cryptographic primitives, or &lt;code&gt;aws-lc-rs&lt;/code&gt;, which has better WASM support than &lt;code&gt;ring&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;getrandom&lt;/code&gt; Footgun
&lt;/h2&gt;

&lt;p&gt;Many crates need random numbers — UUID generation, HashMap initialization, cryptographic primitives. They all end up depending on &lt;code&gt;getrandom&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;On &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt;, &lt;code&gt;getrandom&lt;/code&gt; doesn't know where it is. The target name alone says nothing about whether you're in a browser, a WASI runtime, or a bare-metal environment. Without explicit configuration, the build fails or panics at runtime, depending on the version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, find which version is in your tree:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo tree | &lt;span class="nb"&gt;grep &lt;/span&gt;getrandom
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix depends on the version:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;getrandom&lt;/code&gt; 0.2.x — add the &lt;code&gt;js&lt;/code&gt; feature:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cargo.toml&lt;/span&gt;
&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;getrandom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;getrandom&lt;/code&gt; 0.3.x — feature flag plus a backend declaration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In 0.3.x, getrandom introduced a "backend" model that separates feature flags from backend selection. Both are required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cargo.toml&lt;/span&gt;
&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;getrandom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"wasm_js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# .cargo/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[target.wasm32-unknown-unknown]&lt;/span&gt;
&lt;span class="py"&gt;rustflags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"--cfg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;"getrandom_backend&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="err"&gt;wasm_js&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;"]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The transitive dependency trap.&lt;/strong&gt; You can't set feature flags for indirect dependencies directly. If &lt;code&gt;uuid&lt;/code&gt; depends on &lt;code&gt;getrandom&lt;/code&gt; and you don't use &lt;code&gt;getrandom&lt;/code&gt; yourself, you still need to declare it explicitly, at the same version line that is already in your tree, so Cargo's feature unification propagates the flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"v4"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c"&gt;# Explicitly declare getrandom to force the js feature through the tree&lt;/span&gt;
&lt;span class="py"&gt;getrandom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;getrandom&lt;/code&gt; docs also recommend against enabling &lt;code&gt;js&lt;/code&gt;/&lt;code&gt;wasm_js&lt;/code&gt; in &lt;em&gt;libraries&lt;/em&gt; — it breaks non-browser WASM builds. For a browser-only library like &lt;code&gt;chem-wasm-lens&lt;/code&gt;, enabling it unconditionally is the right call, but the documentation buries this distinction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bundle Size
&lt;/h2&gt;

&lt;p&gt;This is where C dependencies have their most visible cost.&lt;/p&gt;

&lt;p&gt;C libraries compiled to WASM via Emscripten carry significant overhead: libc, libc++, a malloc implementation, and Emscripten runtime glue — regardless of how much of it you actually use. Tree shaking doesn't cross the FFI boundary. The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RDKit.js: ~40MB (Emscripten-compiled C++ cheminformatics)&lt;/li&gt;
&lt;li&gt;OpenSSL compiled to WASM: ~1–2MB just for the crypto primitives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pure Rust compiles lean. &lt;code&gt;chem-wasm-lens&lt;/code&gt; ships at 411KB uncompressed, ~200KB gzipped — and that includes SMILES parsing, 2D coordinate generation, SVG rendering, ECFP4 fingerprint similarity, and PDB parsing. The size difference isn't magic; it's the absence of Emscripten glue, Rust's dead code elimination working cleanly across &lt;code&gt;#[wasm_bindgen]&lt;/code&gt; exports, and &lt;code&gt;wasm-opt&lt;/code&gt; running automatically on release builds.&lt;/p&gt;

&lt;p&gt;Check your own binary size after building:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wasm-pack build &lt;span class="nt"&gt;--release&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-lh&lt;/span&gt; pkg/&lt;span class="k"&gt;*&lt;/span&gt;.wasm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the &lt;code&gt;.wasm&lt;/code&gt; is unexpectedly large, profile it with &lt;a href="https://rustwasm.github.io/twiggy/" rel="noopener noreferrer"&gt;Twiggy&lt;/a&gt; to see what's taking up space:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;twiggy
twiggy top pkg/your_crate_bg.wasm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Current Ecosystem State
&lt;/h2&gt;

&lt;p&gt;Where things stand, mid-2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Pure Rust option&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory allocation&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;dlmalloc&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;flate2&lt;/code&gt; with &lt;code&gt;default-features = false, features = ["rust_backend"]&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serialization&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;serde&lt;/code&gt; + &lt;code&gt;serde-wasm-bindgen&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP (browser)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gloo-net&lt;/code&gt; or JS &lt;code&gt;fetch&lt;/code&gt; via &lt;code&gt;web-sys&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random numbers&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;getrandom&lt;/code&gt; with &lt;code&gt;js&lt;/code&gt;/&lt;code&gt;wasm_js&lt;/code&gt; feature&lt;/td&gt;
&lt;td&gt;Works, requires explicit config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cryptographic primitives&lt;/td&gt;
&lt;td&gt;RustCrypto crates&lt;/td&gt;
&lt;td&gt;Solid for most algorithms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TLS (non-browser WASM)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Gap — &lt;code&gt;ring&lt;/code&gt; doesn't build cleanly; &lt;code&gt;aws-lc-rs&lt;/code&gt; has better WASM support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image processing&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;image&lt;/code&gt; 0.25+&lt;/td&gt;
&lt;td&gt;Mostly pure Rust; some format decoders still use C&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date/time&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;web-time&lt;/code&gt; or &lt;code&gt;js_sys::Date&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Practical Checklist
&lt;/h2&gt;

&lt;p&gt;Before shipping a Rust WASM project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit with &lt;code&gt;cargo tree&lt;/code&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo tree &lt;span class="nt"&gt;--target&lt;/span&gt; wasm32-unknown-unknown | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="s2"&gt;sys"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any &lt;code&gt;-sys&lt;/code&gt; crate is a potential blocker.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Remove &lt;code&gt;wee_alloc&lt;/code&gt;&lt;/strong&gt; — archived in August 2025, has a memory leak. The default &lt;code&gt;dlmalloc&lt;/code&gt; is fine.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handle &lt;code&gt;getrandom&lt;/code&gt; explicitly&lt;/strong&gt; — run &lt;code&gt;cargo tree | grep getrandom&lt;/code&gt; to find the version, then add it explicitly to &lt;code&gt;[dependencies]&lt;/code&gt; with the right feature flag, even if you don't use it directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Replace system TLS&lt;/strong&gt; — swap &lt;code&gt;openssl&lt;/code&gt;/&lt;code&gt;openssl-sys&lt;/code&gt; for &lt;code&gt;rustls&lt;/code&gt;, or use &lt;code&gt;reqwest&lt;/code&gt; with &lt;code&gt;default-features = false, features = ["rustls-tls"]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fix &lt;code&gt;flate2&lt;/code&gt;&lt;/strong&gt; — use &lt;code&gt;default-features = false, features = ["rust_backend"]&lt;/code&gt;, and check that no other crate in the tree enables a zlib feature that pulls in &lt;code&gt;libz-sys&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build with &lt;code&gt;--release&lt;/code&gt;&lt;/strong&gt; — &lt;code&gt;wasm-pack build --release&lt;/code&gt; runs &lt;code&gt;wasm-opt&lt;/code&gt; automatically. Debug builds skip optimization and are typically 3–5x larger.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;The Rust-to-WASM path is genuinely good, and it's improving. The C ABI for &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; is being standardized (Rust blog, April 2025), &lt;code&gt;wee_alloc&lt;/code&gt;'s archival simplified the allocator story, and &lt;code&gt;wasm-pack&lt;/code&gt; keeps getting better.&lt;/p&gt;

&lt;p&gt;Most popular Rust crates predate WASM as a real target, and they picked up C dependencies along the way. The audit is tedious but a one-time fix — once you know which &lt;code&gt;-sys&lt;/code&gt; crates to watch for and what replaces them, it doesn't keep coming back.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I hit all four of these problems building &lt;a href="https://github.com/kent-tokyo/chem-wasm-lens" rel="noopener noreferrer"&gt;chem-wasm-lens&lt;/a&gt;, a pure Rust + WASM molecular analysis library for the browser. The table above is what I found trying to go from a ~40MB Emscripten-compiled baseline down to ~200KB gzipped.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>webassembly</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why CJK Support in Rust Is Hard</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Wed, 20 May 2026 11:30:23 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/why-cjk-support-in-rust-is-hard-5bcf</link>
      <guid>https://forem.com/kent-tokyo/why-cjk-support-in-rust-is-hard-5bcf</guid>
      <description>&lt;p&gt;Most Rust developers don't think about CJK until they need it. Then they discover that embedding Japanese text in a PDF, building a search index over Chinese content, or normalizing Korean input involves a stack of interlocking problems that Latin-script tooling simply never had to solve.&lt;/p&gt;

&lt;p&gt;This post breaks down why CJK is genuinely hard — not just "different" — and where the Rust ecosystem still has gaps.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Scale Problem
&lt;/h2&gt;

&lt;p&gt;The first thing that surprises developers: a full CJK font file is enormous.&lt;/p&gt;

&lt;p&gt;A Latin font like Inter Regular is around 300 KB. A full Japanese font (say, Noto Sans CJK JP) is over 15 MB. That's because Unicode defines over 92,000 CJK unified ideographs across the main block and its extensions, and a production font has to cover tens of thousands of them (a single OpenType font tops out at 65,535 glyphs).&lt;/p&gt;

&lt;p&gt;For most use cases you need only a tiny fraction of those glyphs. If you're generating a PDF invoice with a customer name and address, you might use 50 distinct CJK characters. But a naive approach embeds the entire font, making a simple document balloon to 15 MB.&lt;/p&gt;

&lt;p&gt;The solution is &lt;strong&gt;font subsetting&lt;/strong&gt;: extract only the glyphs actually used, rebuild a minimal font binary, and embed that. It sounds straightforward. It isn't.&lt;/p&gt;
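
&lt;p&gt;The first step, at least, is simple: collect the set of characters the document actually uses. A minimal sketch, assuming you only care about the main CJK Unified Ideographs block; &lt;code&gt;is_cjk_unified&lt;/code&gt; and &lt;code&gt;distinct_ideographs&lt;/code&gt; are illustrative names, not functions from any crate:&lt;/p&gt;

```rust
use std::collections::BTreeSet;

/// True only for the main CJK Unified Ideographs block (U+4E00..=U+9FFF).
/// Kana, hangul and the extension blocks are deliberately ignored in this sketch.
fn is_cjk_unified(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}')
}

/// Hypothetical helper: how many distinct ideographs a subsetter would have to keep.
fn distinct_ideographs(text: String) -> usize {
    let mut used = BTreeSet::new();
    for c in text.chars().filter(|c| is_cjk_unified(*c)) {
        // A set ignores repeats, so each ideograph is counted once.
        used.insert(c);
    }
    used.len()
}
```

&lt;p&gt;Everything that follows, from remapping glyph IDs to rebuilding the font tables around that set, is where the difficulty lives.&lt;/p&gt;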




&lt;h2&gt;
  
  
  The Three Hard Problems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Font Subsetting for CJK
&lt;/h3&gt;

&lt;p&gt;Subsetting a Latin font is well-understood. Subsetting a CJK font involves:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Glyph ID remapping.&lt;/strong&gt; A font maps Unicode code points to internal Glyph IDs (GIDs). After subsetting, the GID space is compacted: the 50 glyphs you kept, plus the mandatory &lt;code&gt;.notdef&lt;/code&gt; glyph at GID 0, now have new GIDs from 0 to 50. Every reference to the old GIDs in the font binary and in your document needs to be updated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CMap table reconstruction.&lt;/strong&gt; The font's &lt;code&gt;cmap&lt;/code&gt; table maps Unicode → GID. After subsetting, this table must be rebuilt to reflect the new GID assignments. Get this wrong and the font renders garbage or fails to load entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advance width recalculation.&lt;/strong&gt; Fonts store per-glyph advance widths (how far the cursor moves after each character). After GID remapping, the width table must be reindexed. In PDF specifically, the &lt;code&gt;/W&lt;/code&gt; array in the CIDFont object (the CID-keyed counterpart of the &lt;code&gt;/Widths&lt;/code&gt; array used by simple fonts) must match the new GIDs exactly; a mismatch causes text spacing to break in subtle, hard-to-debug ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type0/CIDFont object graph.&lt;/strong&gt; PDF represents CJK fonts as a two-level structure: a Type0 (composite) font wrapping a CIDFont. The CIDFont points, through its font descriptor, at the embedded font stream, and the Type0 font carries the ToUnicode CMap. Building this object graph correctly requires understanding the PDF spec at a level most developers would rather avoid.&lt;/p&gt;

&lt;p&gt;In pure Rust, the &lt;a href="https://github.com/yeslogic/allsorts" rel="noopener noreferrer"&gt;allsorts&lt;/a&gt; crate handles TTF subsetting. It works well for TrueType fonts. OpenType CFF fonts (&lt;code&gt;.otf&lt;/code&gt; files with PostScript outlines) are more complex and allsorts coverage is incomplete, which is a known gap in the Rust ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. ToUnicode CMap Generation
&lt;/h3&gt;

&lt;p&gt;PDF separates &lt;strong&gt;rendering&lt;/strong&gt; (which glyph to draw) from &lt;strong&gt;semantics&lt;/strong&gt; (what Unicode character that glyph represents). Rendering uses GIDs. Semantics are stored in a separate stream called the ToUnicode CMap.&lt;/p&gt;

&lt;p&gt;Without a ToUnicode CMap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy-pasting text from a PDF produces garbage characters or empty strings&lt;/li&gt;
&lt;li&gt;Search within the PDF doesn't find CJK text&lt;/li&gt;
&lt;li&gt;Screen readers can't read the document&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CMap is a PostScript-like stream that maps GID ranges to Unicode code points. For CJK fonts with thousands of glyphs, generating this stream correctly — with proper range compression for consecutive code points — requires care. A naive one-entry-per-glyph approach technically works but produces unnecessarily large streams.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Normalization and Variant Characters
&lt;/h3&gt;

&lt;p&gt;CJK text has an encoding problem that Latin scripts largely don't: the same logical character can have multiple valid representations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unicode normalization forms (NFC, NFD, NFKC, NFKD)&lt;/strong&gt; affect how composed characters are stored. Japanese text in particular mixes hiragana, katakana, kanji, and Latin characters, each with their own normalization quirks. Fullwidth ASCII (&lt;code&gt;Ａ&lt;/code&gt;, &lt;code&gt;Ｂ&lt;/code&gt;, &lt;code&gt;Ｃ&lt;/code&gt;) and halfwidth katakana (&lt;code&gt;ｱ&lt;/code&gt;, &lt;code&gt;ｲ&lt;/code&gt;, &lt;code&gt;ｳ&lt;/code&gt;) are only compatibility-equivalent to their standard forms: NFKC folds them together, while NFC leaves them distinct.&lt;/p&gt;
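
&lt;p&gt;To make the NFKC side concrete: the fullwidth ASCII variants sit at a fixed offset of 0xFEE0 from their ASCII counterparts, so that one slice of the mapping can be sketched by hand. This is an illustration only; &lt;code&gt;fold_fullwidth_ascii&lt;/code&gt; is a hypothetical helper that covers U+FF01 through U+FF5E and nothing else (no halfwidth katakana, no ideographic space). Real code should use a full normalization implementation.&lt;/p&gt;

```rust
/// Folds fullwidth ASCII variants (U+FF01..=U+FF5E) to plain ASCII by
/// subtracting the fixed offset 0xFEE0; every other character passes through.
fn fold_fullwidth_ascii(text: String) -> String {
    text.chars()
        .map(|c| match c {
            // The subtraction always lands in U+0021..=U+007E, so the
            // fallback to the original character is never taken here.
            '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - 0xFEE0).unwrap_or(c),
            _ => c,
        })
        .collect()
}
```

&lt;p&gt;Applying the same fold to both the indexed text and the query is what makes a fullwidth query match halfwidth content.&lt;/p&gt;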

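&lt;p&gt;&lt;strong&gt;CJK Compatibility Ideographs&lt;/strong&gt; (U+F900–U+FAFF) exist for round-tripping with legacy encodings. U+F900 is canonically equivalent to U+8C48 (豈), so every normalization form, NFC included, replaces it with the unified ideograph. The squared forms in the separate CJK Compatibility block (U+3300–U+33FF) behave differently: U+3314 (㌔) decomposes to キロ (U+30AD U+30ED) only under NFKC or NFKD. Depending on whether and how you normalize before indexing, the same string might or might not match a query.&lt;/p&gt;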

&lt;p&gt;&lt;strong&gt;Variation selectors&lt;/strong&gt; add another layer. A CJK Unified Ideograph can have several visual forms (regional glyph shapes, or older and newer forms of the same character). Unicode encodes the choice with Variation Selectors: invisible code points that follow a base character to select a specific glyph. &lt;code&gt;葛&lt;/code&gt; followed by VS17 (U+E0100) selects a specific variant used in place names. A text search that isn't VS-aware will fail to match these strings.&lt;/p&gt;
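
&lt;p&gt;One way to make a search index VS-agnostic is to drop the selectors before indexing and before querying. A rough sketch, assuming that losing the variant distinction is acceptable for your data; the helper names are made up for this example:&lt;/p&gt;

```rust
/// True for the two variation selector ranges:
/// VS1..VS16 (U+FE00..=U+FE0F) and VS17..VS256 (U+E0100..=U+E01EF).
fn is_variation_selector(c: char) -> bool {
    matches!(c, '\u{FE00}'..='\u{FE0F}' | '\u{E0100}'..='\u{E01EF}')
}

/// Hypothetical helper for an index that should ignore glyph variants:
/// keeps every base character and removes the selectors that follow them.
fn strip_variation_selectors(text: String) -> String {
    text.chars().filter(|c| !is_variation_selector(*c)).collect()
}
```

&lt;p&gt;A system that needs exact glyph matching would skip this step and compare the selector sequences as written.&lt;/p&gt;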

&lt;p&gt;For fuzzy matching over CJK content, you need to decide which of these equivalences to collapse before indexing. The right answer depends on the use case: a legal document system probably wants exact glyph matching; a general search index probably wants NFKC normalization.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Legacy Encoding Problem
&lt;/h2&gt;

&lt;p&gt;Modern CJK text is Unicode, but a significant amount of real-world content is still encoded in legacy formats:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Encoding&lt;/th&gt;
&lt;th&gt;Used for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shift-JIS&lt;/td&gt;
&lt;td&gt;Legacy Japanese text, older Windows software&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EUC-JP&lt;/td&gt;
&lt;td&gt;Unix-era Japanese&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GB2312 / GBK / GB18030&lt;/td&gt;
&lt;td&gt;Simplified Chinese&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Big5&lt;/td&gt;
&lt;td&gt;Traditional Chinese (Taiwan/Hong Kong)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EUC-KR / CP949&lt;/td&gt;
&lt;td&gt;Korean&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Converting these to Unicode isn't just a lookup table — legacy CJK encodings have overlapping code spaces, vendor extensions, and edge cases that differ between Windows, macOS, and Linux implementations.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/hsivonen/encoding_rs" rel="noopener noreferrer"&gt;encoding_rs&lt;/a&gt; crate (originally written for Firefox) is the authoritative pure Rust implementation of the WHATWG Encoding Standard and handles most of these correctly. This is one area where the Rust ecosystem is actually in good shape.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Everything Still Leans on C
&lt;/h2&gt;

&lt;p&gt;The elephant in the room: most production CJK text processing still depends on C or C++ libraries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HarfBuzz&lt;/strong&gt; — text shaping (converting Unicode to positioned glyphs) — is C++. For CJK, shaping is relatively simple compared to Arabic or Indic scripts (no complex ligatures or bidirectional reordering), but HarfBuzz is still the de facto standard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FreeType&lt;/strong&gt; — font rasterization — is C. If you're rendering CJK text to a bitmap, you're almost certainly using FreeType bindings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ICU (International Components for Unicode)&lt;/strong&gt; — normalization, collation, locale-aware string comparison — is C++. The &lt;code&gt;icu4x&lt;/code&gt; project is a ground-up Rust rewrite led by the Unicode Consortium, and it's making solid progress, but it's not yet a drop-in replacement for all ICU use cases.&lt;/p&gt;

&lt;p&gt;The consequence for Rust developers: if you need CJK support and reach for crates that wrap these C libraries, you give up WASM compatibility, you complicate cross-compilation, and you add a build-time dependency on the system libraries or vendored C sources.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Current State of Pure Rust CJK
&lt;/h2&gt;

&lt;p&gt;Here's an honest assessment of the pure Rust ecosystem for CJK work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Pure Rust option&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Legacy encoding conversion&lt;/td&gt;
&lt;td&gt;&lt;code&gt;encoding_rs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unicode normalization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;unicode-normalization&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Solid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTF font subsetting&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allsorts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Good for TrueType; OTF/CFF partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF CJK text embedding&lt;/td&gt;
&lt;td&gt;&lt;code&gt;harumi&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;New, niche&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text shaping&lt;/td&gt;
&lt;td&gt;&lt;code&gt;rustybuzz&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pure Rust port of HarfBuzz's shaper; worth testing against your own fonts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Font rasterization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fontdue&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Works but limited CJK testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Unicode collation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;icu4x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;In progress&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gaps are real. Text shaping is the layer where pure Rust has the least independent footing: the main option, &lt;code&gt;rustybuzz&lt;/code&gt;, is a port of HarfBuzz rather than a separate implementation. For simple CJK rendering you can get away without a full shaper, but for mixed CJK/Latin text with proper kerning and ligatures, you eventually need something HarfBuzz-level.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means in Practice
&lt;/h2&gt;

&lt;p&gt;If you're building something that needs to handle CJK text in Rust:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;encoding conversion&lt;/strong&gt;, use &lt;code&gt;encoding_rs&lt;/code&gt;. Don't roll your own.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;normalization&lt;/strong&gt;, use &lt;code&gt;unicode-normalization&lt;/code&gt; and decide up front which form you want. For search, NFKC is usually the right default.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;PDF with CJK&lt;/strong&gt;, the pure Rust path exists but requires understanding the subsetting pipeline. Wrapping pdfium or a C-based library is currently the easier path if WASM compatibility isn't a requirement.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;fuzzy search over CJK&lt;/strong&gt;, normalization before indexing matters more than the search algorithm itself.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;text rendering&lt;/strong&gt;, if you can accept a C dependency, HarfBuzz + FreeType is the proven path. Pure Rust rendering is possible for simple cases.&lt;/li&gt;
&lt;/ul&gt;
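
&lt;p&gt;To make the normalization point concrete, here is a minimal sketch of width folding before indexing. It uses only the standard library and covers a small, hand-picked subset of what NFKC does (fullwidth ASCII and the ideographic space). The function name &lt;code&gt;fold_width&lt;/code&gt; is an illustrative assumption, and real code should call &lt;code&gt;unicode-normalization&lt;/code&gt; instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Fold fullwidth ASCII variants and the ideographic space to their
/// plain ASCII counterparts. This is only a small subset of NFKC.
fn fold_width(input: &amp;amp;str) -&amp;gt; String {
    input
        .chars()
        .map(|c| match c {
            // U+3000 IDEOGRAPHIC SPACE becomes an ordinary space
            '\u{3000}' =&amp;gt; ' ',
            // U+FF01..=U+FF5E mirror U+0021..=U+007E at a fixed offset
            '\u{FF01}'..='\u{FF5E}' =&amp;gt; char::from_u32(c as u32 - 0xFEE0).unwrap_or(c),
            // Everything else (kanji, kana, hangul) passes through unchanged
            _ =&amp;gt; c,
        })
        .collect()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Applying the same fold to both the indexed text and the query means a term typed in fullwidth Latin letters matches its ASCII spelling, which is the effect the normalization advice above is after.&lt;/p&gt;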




&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;CJK support isn't a single feature — it's a stack of problems that compound. The good news is that the Rust ecosystem is making real progress on each layer. The bad news is that each layer requires understanding the layer below it, which is why CJK support tends to be either "works perfectly" or "completely broken" with little middle ground.&lt;/p&gt;

&lt;p&gt;If you're working on any of these problems — subsetting, normalization, collation, shaping — I'd love to compare notes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I've run into most of these problems while building &lt;a href="https://github.com/kent-tokyo/harumi" rel="noopener noreferrer"&gt;harumi&lt;/a&gt;, a pure Rust PDF library with CJK font subsetting. The gaps in the table above are the ones I've personally hit.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>opensource</category>
      <category>unicode</category>
      <category>cjk</category>
    </item>
    <item>
      <title>I Built a Next-Gen DNS Diagnostic CLI in Rust That Visualizes DNSSEC Trust Chains</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Wed, 20 May 2026 09:32:59 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/i-built-a-next-gen-dns-diagnostic-cli-in-rust-that-visualizes-dnssec-trust-chains-237h</link>
      <guid>https://forem.com/kent-tokyo/i-built-a-next-gen-dns-diagnostic-cli-in-rust-that-visualizes-dnssec-trust-chains-237h</guid>
      <description>&lt;p&gt;If you've ever tried to debug a DNSSEC misconfiguration using &lt;code&gt;dig&lt;/code&gt;, you know the pain. You're staring at a wall of raw text, manually cross-referencing DS records against DNSKEY records, tracing through TLD delegations one query at a time. It works — but it's exhausting.&lt;/p&gt;

&lt;p&gt;I wanted something better. So I built &lt;strong&gt;shohei&lt;/strong&gt; — a Rust-powered DNS diagnostic CLI that makes the invisible visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/kent-tokyo/shohei" rel="noopener noreferrer"&gt;github.com/kent-tokyo/shohei&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is shohei?
&lt;/h2&gt;

&lt;p&gt;shohei is a next-generation DNS diagnostic tool that goes well beyond what &lt;code&gt;dig&lt;/code&gt; or &lt;code&gt;drill&lt;/code&gt; offer. It renders DNS resolution as color-coded terminal trees, walks you through DNSSEC trust chains step by step, and supports modern transports like DoH and DoT — all from a single command.&lt;/p&gt;

&lt;p&gt;Think of it as &lt;code&gt;dig&lt;/code&gt; if &lt;code&gt;dig&lt;/code&gt; were designed for humans first.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Hard Problems shohei Solves
&lt;/h2&gt;

&lt;p&gt;Building a DNS diagnostic tool sounds straightforward until you try to make it actually useful. Here are the problems that motivated most of the design.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. DNSSEC Trust Chains Are Invisible in Standard Tools
&lt;/h3&gt;

&lt;p&gt;DNSSEC forms a cryptographic chain from the root zone down to individual records. The chain looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Root KSK (trust anchor, hardcoded in resolvers)
  └── Root ZSK (signs root zone records)
        └── DS record for .com  ← hash of .com's KSK
              └── .com KSK
                    └── .com ZSK (signs .com zone)
                          └── DS record for example.com
                                └── example.com KSK
                                      └── example.com ZSK
                                            └── RRSIG on A record ← what you care about
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When something breaks in this chain — a DS mismatch, an expired RRSIG, a missing DNSKEY — &lt;code&gt;dig&lt;/code&gt; gives you raw records and leaves you to piece together the failure yourself. shohei walks the whole chain and tells you exactly where validation fails and why.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;. (Root)
└── com.  [DS 30909 → DNSKEY 30909 ✓]
    └── example.com.  [DS 12345 → DNSKEY 12345 ✓]
        └── www.example.com.  A 93.184.216.34
            └── RRSIG (ZSK tag 12345) expires 2026-06-01 ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A broken chain looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;. (Root)
└── com.  [DS 30909 → DNSKEY 30909 ✓]
    └── example.com.  [DS 99999 → DNSKEY ✗ NO MATCHING KEY FOUND]
        └── ⚠ Validation failed: DS/DNSKEY mismatch at example.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
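
&lt;p&gt;The numbers in the trees above (30909, 12345) read as DNSSEC key tags, the 16-bit identifiers a validator uses to pair a DS record with a candidate DNSKEY. As a rough sketch of where such a number comes from, the function below follows the key tag calculation from RFC 4034, Appendix B, over the raw DNSKEY RDATA. The name &lt;code&gt;key_tag&lt;/code&gt; is an illustrative assumption rather than shohei's API, and the sketch ignores the special case for the obsolete algorithm 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Key tag over DNSKEY RDATA (flags, protocol, algorithm, public key),
/// following RFC 4034, Appendix B. Algorithm 1 (RSA/MD5) is not handled.
fn key_tag(rdata: &amp;amp;[u8]) -&amp;gt; u16 {
    let mut acc: u32 = 0;
    for (i, &amp;amp;byte) in rdata.iter().enumerate() {
        // Even offsets contribute the high byte of a 16-bit word, odd offsets the low byte
        acc += if i % 2 == 0 {
            u32::from(byte) &amp;lt;&amp;lt; 8
        } else {
            u32::from(byte)
        };
    }
    // Fold the carry back in once, then keep the low 16 bits
    acc += (acc &amp;gt;&amp;gt; 16) &amp;amp; 0xFFFF;
    (acc &amp;amp; 0xFFFF) as u16
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A matching tag only narrows the candidates. Tags can collide, so a validator still has to hash the DNSKEY and compare the result with the DS digest before it treats the link as valid.&lt;/p&gt;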



&lt;h3&gt;
  
  
  2. Iterative Resolution Is a Black Box
&lt;/h3&gt;

&lt;p&gt;Standard resolvers are recursive — they do the work for you and return an answer. That's convenient, but useless when you're debugging a delegation problem, a glue record mismatch, or propagation that hasn't reached a specific nameserver yet.&lt;/p&gt;

&lt;p&gt;shohei's &lt;code&gt;--trace&lt;/code&gt; mode sends queries the way a resolver actually would — manually, hop by hop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[1] Query root servers for example.com. A
    → a.root-servers.net. (198.41.0.4)  4ms
    ← REFERRAL: com. NS [a.gtld-servers.net., ...]

[2] Query TLD servers for example.com. A
    → a.gtld-servers.net. (192.5.6.30)  11ms
    ← REFERRAL: example.com. NS [ns1.example.com., ns2.example.com.]
       GLUE: ns1.example.com. A 205.251.196.1

[3] Query authoritative servers for example.com. A
    → ns1.example.com. (205.251.196.1)  8ms
    ← ANSWER: example.com. A 93.184.216.34  TTL 3600
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each hop shows which server was queried, its IP, round-trip time, and exactly what was returned. Delegation loops, missing glue records, and lame delegations all become immediately visible.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Resolver Disagreements Are Hard to Diagnose
&lt;/h3&gt;

&lt;p&gt;Split-horizon DNS, stale caches, and regional differences mean two resolvers can return completely different answers for the same query. Figuring out &lt;em&gt;why&lt;/em&gt; usually involves running &lt;code&gt;dig @8.8.8.8&lt;/code&gt; and &lt;code&gt;dig @1.1.1.1&lt;/code&gt; separately, then comparing the output by hand.&lt;/p&gt;

&lt;p&gt;shohei's &lt;code&gt;--compare&lt;/code&gt; mode queries both resolvers concurrently and shows a unified diff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Comparing 8.8.8.8 vs 1.1.1.1 for example.com A

  ANSWER
  8.8.8.8:   93.184.216.34  TTL 3521
  1.1.1.1:   93.184.216.34  TTL 120

  AUTHORITY
+ 8.8.8.8:   example.com. NS ns1.example.com.  TTL 172800
- 1.1.1.1:   (none)

  FLAGS
  8.8.8.8:   QR RD RA AD  ← AD set (DNSSEC validated)
  1.1.1.1:   QR RD RA     ← AD not set
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



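&lt;p&gt;The TTL difference here tells you the two caches are at different ages. Assuming the authoritative TTL is 3600, 8.8.8.8 cached the record about 79 seconds ago (3600 - 3521), while the entry on 1.1.1.1 is close to expiry, or that resolver caps TTLs. The missing AD flag on 1.1.1.1 is immediately visible.&lt;/p&gt;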




&lt;h2&gt;
  
  
  Feature Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DNSSEC Chain Visualization
&lt;/h3&gt;

&lt;p&gt;Full trust chain from root to record, color-coded by validation status. Green for valid, red for failures, yellow for warnings (e.g., signature expiring within 7 days).&lt;/p&gt;

&lt;h3&gt;
  
  
  Iterative Resolution Tracing (&lt;code&gt;--trace&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Manual hop-by-hop resolution with timing, referral details, and glue record display. Shows exactly what a recursive resolver would do.&lt;/p&gt;

&lt;h3&gt;
  
  
  DoH and DoT Support
&lt;/h3&gt;

&lt;p&gt;Modern DNS transports with no external dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shohei example.com &lt;span class="nt"&gt;--doh&lt;/span&gt; https://cloudflare-dns.com/dns-query
shohei example.com &lt;span class="nt"&gt;--dot&lt;/span&gt; 1.1.1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful for testing whether a domain resolves correctly over encrypted transports, or for bypassing a local resolver that might be interfering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resolver Comparison (&lt;code&gt;--compare&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Side-by-side diff of two resolvers. Flags TTL differences, missing records, DNSSEC validation status mismatches, and flag differences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shohei example.com &lt;span class="nt"&gt;--compare&lt;/span&gt; 8.8.8.8 1.1.1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Type Queries
&lt;/h3&gt;

&lt;p&gt;Query multiple record types in a single command instead of running &lt;code&gt;dig&lt;/code&gt; once per type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shohei example.com A AAAA MX TXT SOA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is grouped by type with consistent formatting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Watch Mode (&lt;code&gt;--watch&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Auto-refresh at a set interval. Useful for monitoring TTL countdown, propagation in real time, or catching flapping records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shohei example.com &lt;span class="nt"&gt;--watch&lt;/span&gt; 5  &lt;span class="c"&gt;# refresh every 5 seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The display updates in-place, so changes are immediately visible without scrolling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interactive TUI (&lt;code&gt;--tui&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Built with &lt;a href="https://github.com/ratatui-org/ratatui" rel="noopener noreferrer"&gt;ratatui&lt;/a&gt;, the TUI gives you three navigable panels in a single terminal window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Records panel&lt;/strong&gt; — all RRsets for the queried domain, sorted by type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DNSSEC panel&lt;/strong&gt; — the full trust chain with validation status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace panel&lt;/strong&gt; — iterative resolution hops with timing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Switch between panels with Tab, scroll with arrow keys, and exit with &lt;code&gt;q&lt;/code&gt;. Useful when you need to explore a domain's DNS health interactively rather than in a single pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reverse DNS (&lt;code&gt;--ptr&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;PTR lookups for IPv4 and IPv6, with automatic in-addr.arpa / ip6.arpa formatting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shohei &lt;span class="nt"&gt;--ptr&lt;/span&gt; 8.8.8.8
shohei &lt;span class="nt"&gt;--ptr&lt;/span&gt; 2001:4860:4860::8888
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
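
&lt;p&gt;For readers who have not built a reverse name by hand, the sketch below shows the formatting that a PTR lookup needs: reversed octets under &lt;code&gt;in-addr.arpa&lt;/code&gt; for IPv4, reversed nibbles under &lt;code&gt;ip6.arpa&lt;/code&gt; for IPv6. It uses only the standard library, and &lt;code&gt;ptr_name&lt;/code&gt; is a hypothetical helper, not a function taken from shohei:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::net::IpAddr;

/// Build the fully qualified reverse-lookup name for an address.
fn ptr_name(ip: IpAddr) -&amp;gt; String {
    match ip {
        IpAddr::V4(v4) =&amp;gt; {
            // IPv4: the four octets in reverse order
            let [a, b, c, d] = v4.octets();
            format!("{d}.{c}.{b}.{a}.in-addr.arpa.")
        }
        IpAddr::V6(v6) =&amp;gt; {
            // IPv6: all 32 hex nibbles in reverse order, one label each
            let mut labels = Vec::with_capacity(32);
            for byte in v6.octets().iter().rev() {
                labels.push(format!("{:x}", byte &amp;amp; 0x0f)); // low nibble first
                labels.push(format!("{:x}", byte &amp;gt;&amp;gt; 4)); // then high nibble
            }
            format!("{}.ip6.arpa.", labels.join("."))
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Parsing into &lt;code&gt;IpAddr&lt;/code&gt; first means the compressed &lt;code&gt;::&lt;/code&gt; form is expanded before the nibbles are reversed, so there is no string handling of the address itself.&lt;/p&gt;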



&lt;h3&gt;
  
  
  Batch Mode &amp;amp; Script-Friendly Output
&lt;/h3&gt;

&lt;p&gt;Pipe a list of domains for bulk querying, with JSON output for downstream processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;domains.txt | shohei &lt;span class="nt"&gt;--batch&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; | jq &lt;span class="s1"&gt;'.[] | select(.dnssec_valid == false)'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--minimal&lt;/code&gt; flag strips color and tree formatting for plain-text pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Rust?
&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance.&lt;/strong&gt; DNS diagnostics involve parallel resolution paths — querying multiple nameservers concurrently, running resolver comparisons simultaneously, tracing iterative hops while fetching DNSSEC records in parallel. Rust's async ecosystem via &lt;code&gt;tokio&lt;/code&gt; handles this efficiently without GC pauses that would skew timing measurements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Correctness.&lt;/strong&gt; DNSSEC validation is stateful and involves cryptographic operations across multiple record types. Getting it wrong silently is worse than getting it wrong loudly. Rust's type system lets you encode validation state as types — a validated chain is a different type than an unvalidated one, and the compiler enforces that distinction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;hickory-dns.&lt;/strong&gt; The &lt;a href="https://github.com/hickory-dns/hickory-dns" rel="noopener noreferrer"&gt;hickory-dns&lt;/a&gt; crate (formerly trust-dns) is a mature pure-Rust DNS implementation with first-class DNSSEC, DoH, and DoT support. It handles wire-format parsing, cryptographic validation, and transport negotiation — the parts that are easy to get subtly wrong — so I could focus on the UX layer.&lt;/p&gt;
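
&lt;p&gt;The correctness point, encoding validation state as types, can be sketched with the typestate pattern. Everything here is hypothetical (&lt;code&gt;Chain&lt;/code&gt;, &lt;code&gt;Unvalidated&lt;/code&gt;, &lt;code&gt;Validated&lt;/code&gt; are not shohei's actual types), and the check inside &lt;code&gt;validate&lt;/code&gt; is a placeholder rather than real DNSSEC validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::marker::PhantomData;

// Marker types: they carry no data and exist only at compile time.
struct Unvalidated;
struct Validated;

struct Chain&amp;lt;State&amp;gt; {
    zones: Vec&amp;lt;String&amp;gt;,
    _state: PhantomData&amp;lt;State&amp;gt;,
}

impl Chain&amp;lt;Unvalidated&amp;gt; {
    fn new(zones: Vec&amp;lt;String&amp;gt;) -&amp;gt; Self {
        Chain { zones, _state: PhantomData }
    }

    /// Placeholder check only: a real validator verifies DS digests and RRSIGs here.
    fn validate(self) -&amp;gt; Result&amp;lt;Chain&amp;lt;Validated&amp;gt;, String&amp;gt; {
        if self.zones.is_empty() {
            return Err("empty chain".to_string());
        }
        Ok(Chain { zones: self.zones, _state: PhantomData })
    }
}

impl Chain&amp;lt;Validated&amp;gt; {
    /// Only exists on validated chains, so an unchecked chain cannot be rendered.
    fn render(&amp;amp;self) -&amp;gt; String {
        self.zones.join(" → ")
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;validate&lt;/code&gt; consumes the unvalidated value and &lt;code&gt;render&lt;/code&gt; is defined only for the validated one, calling &lt;code&gt;render&lt;/code&gt; on a chain that skipped validation is a compile error instead of a silent bug.&lt;/p&gt;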




&lt;h2&gt;
  
  
  Core Dependencies
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Crate&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hickory-dns&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DNS resolution, DNSSEC validation, DoH/DoT transport&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;clap&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CLI argument parsing with derive macros&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ratatui&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interactive TUI framework&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;owo-colors&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Terminal colorization without hand-building ANSI escape strings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;comfy-table&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Table rendering with Unicode box-drawing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tokio&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Async runtime for concurrent resolver queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;serde_json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JSON output serialization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;shohei
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or build from source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kent-tokyo/shohei
&lt;span class="nb"&gt;cd &lt;/span&gt;shohei
cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
./target/release/shohei example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TUI is included by default. If you want a smaller binary without ratatui, build with &lt;code&gt;--no-default-features&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic query — records + DNSSEC chain&lt;/span&gt;
shohei example.com

&lt;span class="c"&gt;# Multiple record types in one shot&lt;/span&gt;
shohei example.com A AAAA MX TXT SOA

&lt;span class="c"&gt;# Iterative resolution trace (hop by hop)&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--trace&lt;/span&gt;

&lt;span class="c"&gt;# DNSSEC chain only&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--dnssec&lt;/span&gt;

&lt;span class="c"&gt;# Compare two resolvers&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--compare&lt;/span&gt; 8.8.8.8 1.1.1.1

&lt;span class="c"&gt;# Query over DoH&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--doh&lt;/span&gt; https://cloudflare-dns.com/dns-query

&lt;span class="c"&gt;# Query over DoT&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--dot&lt;/span&gt; 1.1.1.1

&lt;span class="c"&gt;# Watch mode (refresh every 5s)&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--watch&lt;/span&gt; 5

&lt;span class="c"&gt;# Reverse DNS&lt;/span&gt;
shohei &lt;span class="nt"&gt;--ptr&lt;/span&gt; 8.8.8.8

&lt;span class="c"&gt;# Interactive TUI&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--tui&lt;/span&gt;

&lt;span class="c"&gt;# Batch mode with JSON output&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;domains.txt | shohei &lt;span class="nt"&gt;--batch&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;

&lt;span class="c"&gt;# Minimal output for scripting&lt;/span&gt;
shohei example.com &lt;span class="nt"&gt;--minimal&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DANE/TLSA validation&lt;/strong&gt; — cross-reference TLS certificate fingerprints against DNS to verify DANE configurations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NSEC/NSEC3 zone walking&lt;/strong&gt; — for security research and CTF scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML report export&lt;/strong&gt; — a shareable snapshot of a domain's full DNS health, including DNSSEC chain and trace output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CAA record checking&lt;/strong&gt; — verify Certification Authority Authorization records alongside DNSSEC&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Feedback Welcome
&lt;/h2&gt;

&lt;p&gt;shohei is MIT licensed and open to contributions. If you work with DNS professionally — or just find yourself reaching for &lt;code&gt;dig&lt;/code&gt; more than you'd like — I'd love to hear what features would be most useful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kent-tokyo/shohei" rel="noopener noreferrer"&gt;github.com/kent-tokyo/shohei&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Rust, a healthy obsession with DNSSEC, and too many late nights staring at zone files.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>dns</category>
      <category>cli</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built "harumi" — A Pure Rust PDF Editing Library with CJK Support</title>
      <dc:creator>kent-tokyo</dc:creator>
      <pubDate>Mon, 18 May 2026 09:58:14 +0000</pubDate>
      <link>https://forem.com/kent-tokyo/i-built-harumi-a-pure-rust-pdf-editing-library-with-cjk-support-4n2n</link>
      <guid>https://forem.com/kent-tokyo/i-built-harumi-a-pure-rust-pdf-editing-library-with-cjk-support-4n2n</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;harumi&lt;/strong&gt; is a Pure Rust library that lets you dynamically add CJK text (Japanese, Chinese, Korean) to existing PDFs. Unlike bindings-based solutions, it has zero C dependencies and handles font subsetting automatically.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;crates.io&lt;/strong&gt;: &lt;a href="https://crates.io/crates/harumi" rel="noopener noreferrer"&gt;https://crates.io/crates/harumi&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/kent-tokyo/harumi" rel="noopener noreferrer"&gt;https://github.com/kent-tokyo/harumi&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Another Rust PDF Crate?
&lt;/h2&gt;

&lt;p&gt;The existing Rust PDF ecosystem leaves a gap:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Crate&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;lopdf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Low-level; no font subsetting or CMap generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;printpdf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create-only; can't edit existing PDFs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pdfium-render&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Requires linking against the C-based PDFium library&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;harumi&lt;/strong&gt; fills that gap: append-only editing of existing PDFs, Pure Rust, with automatic CJK font subsetting and ToUnicode CMap generation built in.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Hard Problems of CJK in PDF
&lt;/h2&gt;

&lt;p&gt;Getting Japanese (and CJK in general) right inside a PDF isn't just about "embedding a font." There are three distinct challenges:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Font Subsetting
&lt;/h3&gt;

&lt;p&gt;A full Japanese font file can easily exceed 10 MB. For practical file sizes you must extract only the glyphs actually used and rebuild the font binary — this is subsetting. harumi does this automatically at save time.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. ToUnicode CMap Generation
&lt;/h3&gt;

&lt;p&gt;PDFs separate &lt;em&gt;rendering&lt;/em&gt; (Glyph IDs) from &lt;em&gt;semantics&lt;/em&gt; (Unicode code points). Without a ToUnicode CMap, copy-paste and text search produce garbled output. harumi generates this mapping for every font it embeds.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Glyph Advance Width Recalculation
&lt;/h3&gt;

&lt;p&gt;After subsetting, Glyph IDs are reassigned. The advance widths stored in the PDF must be recalculated to match — otherwise text spacing breaks. harumi handles this as part of the save pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lazy Subsetting Pipeline
&lt;/h2&gt;

&lt;p&gt;harumi uses a &lt;strong&gt;lazy subsetting&lt;/strong&gt; design to handle all three problems in one pass:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;embed_font()&lt;/code&gt; — store raw font bytes; no processing yet&lt;/li&gt;
&lt;li&gt;Collect all text draw calls across all pages&lt;/li&gt;
&lt;li&gt;Walk every page at &lt;code&gt;save()&lt;/code&gt; time, gathering the complete set of used characters&lt;/li&gt;
&lt;li&gt;Subset the font to only those glyphs&lt;/li&gt;
&lt;li&gt;Reassign Glyph IDs&lt;/li&gt;
&lt;li&gt;Build the ToUnicode CMap&lt;/li&gt;
&lt;li&gt;Recalculate advance widths and write the final CIDFont object&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This single-pass approach avoids redundant font processing and keeps the implementation straightforward.&lt;/p&gt;
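
&lt;p&gt;As a rough illustration of steps 3 to 6, the sketch below collects the characters used by a set of draw calls, hands out new dense glyph IDs, and formats the matching ToUnicode &lt;code&gt;bfchar&lt;/code&gt; lines. It is not harumi's implementation: the function names are made up, no font is parsed, and IDs are assigned in code point order, whereas a real subsetter would work from the original font's glyph table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::collections::BTreeMap;

/// Steps 3 to 5 in miniature: gather every character that is drawn and
/// give each one a fresh, dense glyph ID. Glyph 0 stays reserved for .notdef.
fn assign_glyph_ids(draw_calls: &amp;amp;[&amp;amp;str]) -&amp;gt; BTreeMap&amp;lt;char, u16&amp;gt; {
    let mut ids = BTreeMap::new();
    for ch in draw_calls.iter().flat_map(|text| text.chars()) {
        ids.entry(ch).or_insert(0u16);
    }
    // Number the distinct characters 1, 2, 3, ... in code point order
    for (index, gid) in ids.values_mut().enumerate() {
        *gid = (index + 1) as u16;
    }
    ids
}

/// Step 6: one bfchar line per glyph, mapping the new glyph ID to UTF-16BE hex.
fn bfchar_lines(ids: &amp;amp;BTreeMap&amp;lt;char, u16&amp;gt;) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    ids.iter()
        .map(|(ch, gid)| {
            let mut units = [0u16; 2];
            let hex: String = ch
                .encode_utf16(&amp;amp;mut units)
                .iter()
                .map(|unit| format!("{unit:04X}"))
                .collect();
            format!("&amp;lt;{gid:04X}&amp;gt; &amp;lt;{hex}&amp;gt;")
        })
        .collect()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Deferring this work to save time is what makes the design lazy: the set of used characters is only complete once every page has been walked, so the glyph IDs and the CMap are built once from the final set.&lt;/p&gt;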




&lt;h2&gt;
  
  
  Feature Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;harumi&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"input.pdf"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Append text (including invisible text for search layers)&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.add_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello, 世界！"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;font&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;12.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Draw shapes and embed images&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.draw_rect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.embed_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Page operations&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.rotate_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.delete_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.reorder_pages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Merge and split&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"other.pdf"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.split_at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Extract text&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Metadata&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"My Document"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="nf"&gt;.save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"output.pdf"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Current Status &amp;amp; Roadmap
&lt;/h2&gt;

&lt;p&gt;harumi is published on &lt;a href="https://crates.io/crates/harumi" rel="noopener noreferrer"&gt;crates.io&lt;/a&gt; and the source is available on &lt;a href="https://github.com/kent-tokyo/harumi" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Planned improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broader CJK font format support&lt;/li&gt;
&lt;li&gt;Form field editing&lt;/li&gt;
&lt;li&gt;Performance optimizations for large documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback, issues, and contributions are very welcome!&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cjk</category>
      <category>pdf</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
