<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vladislav Kashin</title>
    <description>The latest articles on Forem by Vladislav Kashin (@vloldik).</description>
    <link>https://forem.com/vloldik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3809992%2F258359f3-6691-4153-a83f-dd03998b9772.jpeg</url>
      <title>Forem: Vladislav Kashin</title>
      <link>https://forem.com/vloldik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vloldik"/>
    <language>en</language>
    <item>
      <title>The Anatomy of a 500ns Parser: Porting libphonenumber to Rust</title>
      <dc:creator>Vladislav Kashin</dc:creator>
      <pubDate>Fri, 06 Mar 2026 19:03:39 +0000</pubDate>
      <link>https://forem.com/vloldik/the-anatomy-of-a-500ns-parser-porting-libphonenumber-to-rust-3daa</link>
      <guid>https://forem.com/vloldik/the-anatomy-of-a-500ns-parser-porting-libphonenumber-to-rust-3daa</guid>
      <description>&lt;p&gt;It started with a freelance project. I was writing a backend service in Rust and needed to validate international phone numbers. Like any Rust developer, I headed to crates.io and pulled the most popular library for the job. &lt;/p&gt;

&lt;p&gt;Then, I opened their GitHub issues. &lt;/p&gt;

&lt;p&gt;What I saw was a graveyard of long-standing, unhandled bugs. "National part of the number is truncated in some cases" (Open). "Numbers starting with the same sequence as country prefix are parsed incorrectly" (Open). "00-prefixed international numbers don't parse" (Open). &lt;/p&gt;

&lt;p&gt;Porting Google's massive C++ &lt;code&gt;libphonenumber&lt;/code&gt; library is an incredibly complex task, and I deeply respect the authors of that crate for undertaking it. But I couldn't ship my client's project with those bugs. So I decided to do something slightly crazy: I was going to write a bug-for-bug compatible port of the C++ libphonenumber, built from the ground up for maximum performance.&lt;/p&gt;

&lt;p&gt;Here is the story of how making the library progressively "dumber" about regular expressions made it blazingly fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: The Initial Prototype and the Protobuf Trap
&lt;/h3&gt;

&lt;p&gt;The first challenge was simply navigating Google's C++ codebase. It's not easy when the language isn't your daily driver. I didn't want to blindly copy 5,000-line files, so I started breaking the logic down into idiomatic Rust modules. &lt;/p&gt;

&lt;p&gt;My first strategic mistake was how I handled regular expressions. Google's library uses RE2 and relies heavily on exact bounds checking like &lt;code&gt;FullMatch&lt;/code&gt; and &lt;code&gt;FindAndConsume&lt;/code&gt;. I naïvely assumed I could just replace these with standard &lt;code&gt;regex&lt;/code&gt; crate calls, checking if &lt;code&gt;match.start == 0 &amp;amp;&amp;amp; end == strlen&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Then came the metadata. Phone number validation relies on a massive set of rules for every country on earth. I initially thought I could hand-write the Rust structs for this data. I was wrong. The C++ metadata generation script outputs a raw, binary protobuf array. To read it, I had to use &lt;code&gt;protobuf-gen&lt;/code&gt; (though I am currently planning a migration to &lt;code&gt;prost&lt;/code&gt; or something even more custom).&lt;/p&gt;

&lt;p&gt;Another early hurdle was string normalization. The library needs to convert any Unicode character in the "Decimal Number" (&lt;code&gt;Nd&lt;/code&gt;) category into its standard ASCII digit. I didn't want to drag in a massive ICU library just for this. My solution was to write a separate lightweight crate: &lt;code&gt;dec_from_char&lt;/code&gt;. In its early days, it was just a giant &lt;code&gt;match&lt;/code&gt; statement generated via a macro reading &lt;code&gt;UnicodeProps.txt&lt;/code&gt;. Since there aren't &lt;em&gt;that&lt;/em&gt; many &lt;code&gt;Nd&lt;/code&gt; characters, it kept the binary size small while maintaining acceptable performance.&lt;/p&gt;

&lt;p&gt;I had a working prototype. The basic test suite passed. So, I took a long break from the project.&lt;/p&gt;
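
&lt;p&gt;To make the digit normalization step concrete, here is a minimal sketch of that kind of hand-written &lt;code&gt;match&lt;/code&gt;. It is not the actual &lt;code&gt;dec_from_char&lt;/code&gt; code: the function names are hypothetical, and only a handful of the &lt;code&gt;Nd&lt;/code&gt; digit blocks are covered.&lt;/p&gt;

```rust
// Hypothetical helper, not the dec_from_char API: maps a decimal digit from
// a few scripts to its ASCII equivalent and leaves other characters untouched.
fn normalize_digit(c: char) -> char {
    // Each block listed here is a contiguous run of ten code points, so
    // knowing the block's zero is enough to compute the digit value.
    let zero = match c {
        '0'..='9' => return c,                 // already ASCII
        '\u{0660}'..='\u{0669}' => '\u{0660}', // Arabic-Indic
        '\u{06F0}'..='\u{06F9}' => '\u{06F0}', // Extended Arabic-Indic
        '\u{0966}'..='\u{096F}' => '\u{0966}', // Devanagari
        '\u{FF10}'..='\u{FF19}' => '\u{FF10}', // Fullwidth
        _ => return c,                         // not a digit this sketch knows
    };
    let value = c as u32 - zero as u32;
    char::from_digit(value, 10).unwrap_or(c)
}

// Normalizes every character of the input, one code point at a time.
fn normalize_digits(input: String) -> String {
    input.chars().map(normalize_digit).collect()
}

fn main() {
    let raw = String::from("+\u{0664}\u{0668} \u{FF16}\u{FF16}\u{FF16}");
    println!("{}", normalize_digits(raw));
}
```

&lt;p&gt;A generated version would simply contain one arm per &lt;code&gt;Nd&lt;/code&gt; range instead of the four written out by hand here.&lt;/p&gt;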

&lt;h3&gt;
  
  
  Phase 2: The Hiatus and the Warning Sign
&lt;/h3&gt;

&lt;p&gt;During my time away from the codebase, a helpful contributor opened a Pull Request. They had directly ported the C++ &lt;code&gt;RegexBasedMatcher&lt;/code&gt; to Rust, complete with a global &lt;code&gt;RegexCache&lt;/code&gt; to handle the exact anchored matching that Google's library used. It was perfectly accurate to the upstream logic. But when I ran the benchmarks locally, I gasped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Formatting Comparison/rlibphonenumber: format(National)
                        time:   [3.5683 ms 3.5991 ms 3.6337 ms]
                        change: [+22240% +22556% +22875%] (p = 0.00 &amp;lt; 0.05)
                        Performance has regressed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Performance had regressed by over 22,000%. Formatting a single number suddenly took milliseconds. I politely declined the PR, thinking: &lt;em&gt;"Exact C++ internal compatibility isn't worth destroying performance."&lt;/em&gt; I didn't realize at the time that this PR was a glaring warning sign of a much deeper flaw in my own code. I brushed it off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: The Return and the Differential Fuzzer
&lt;/h3&gt;

&lt;p&gt;Months later, I returned to the library. Before publishing a v1.0, I wanted absolute proof that my library behaved identically to Google's. Writing manual edge cases was impossible, so I turned to differential fuzzing. I linked the original C++ library via &lt;code&gt;cxx&lt;/code&gt; and fed both implementations millions of random strings to compare their outputs.&lt;/p&gt;

&lt;p&gt;The fuzzer crashed almost immediately. A string like &lt;code&gt;CD(+48X666666644&lt;/code&gt; was being marked as invalid by my Rust code, but valid by C++. &lt;/p&gt;

&lt;p&gt;Then it clicked. The contributor from months ago wasn't just translating C++ for the sake of it. My naïve &lt;code&gt;start == 0 &amp;amp;&amp;amp; end == strlen&lt;/code&gt; boundary checks were fundamentally broken for complex edge cases. Because of how the internal regex patterns were structured, an unanchored search could settle on a match that did not span the whole input even when a full-length match existed, so my checks rejected strings that the C++ library accepted. I needed the exactness of C++'s anchored regexes. But I absolutely refused to accept the 22,000% performance penalty of a &lt;code&gt;RegexCache&lt;/code&gt; allocating strings and compiling regexes at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4: Fixing Regex the Fast Way
&lt;/h3&gt;

&lt;p&gt;How do you anchor thousands of regexes without allocating strings at runtime or locking a global cache? You move the work to build time. I modified my Java build script (which parses the XML metadata) to pre-wrap the patterns in &lt;code&gt;^(?:...)$&lt;/code&gt;. But initializing three different variations of a regex at runtime would bloat memory. Instead, I created a structure called &lt;code&gt;RegexTriplets&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Clone)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;RegexTriplets&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;pattern_base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OnceLock&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Regex&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;regexp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;anchor_start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OnceLock&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Regex&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;regexp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;anchor_full&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OnceLock&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Regex&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;regexp&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This struct sits on the stack and weighs only 84 bytes. We keep the raw string via &lt;code&gt;pattern_base&lt;/code&gt; (since the internal logic sometimes needs to slice it) and lazily initialize the actual &lt;code&gt;Regex&lt;/code&gt; objects only if a specific exact-match variation is requested. By leveraging Rust's string slicing (&lt;code&gt;[..]&lt;/code&gt;), which is basically free, I achieved the exactness of the C++ implementation with minimal runtime overhead.&lt;/p&gt;
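
&lt;p&gt;The build-time wrapping itself is plain string concatenation, but the non-capturing group matters. As a rough illustration (written in Rust for consistency with the rest of this post, even though the real step lives in the Java build script; the function names are hypothetical), the anchored variants could be produced like this:&lt;/p&gt;

```rust
// Hypothetical helpers sketching the build-time wrapping step.

// The pattern must match at the start of the input (prefix-style matching).
fn anchor_start(pattern: String) -> String {
    format!("^(?:{})", pattern)
}

// The pattern must match the entire input (full-match semantics).
fn anchor_full(pattern: String) -> String {
    format!("^(?:{})$", pattern)
}

fn main() {
    // Without the group, "^1|23$" would parse as "^1" OR "23$", which accepts
    // inputs such as "19" that a full match must reject.
    println!("{}", anchor_start(String::from("1|23")));
    println!("{}", anchor_full(String::from("1|23")));
}
```

&lt;p&gt;Because the wrapped strings are produced once, ahead of time, the runtime only ever compiles a pattern it has been handed and never has to build a new one.&lt;/p&gt;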

&lt;h3&gt;
  
  
  Phase 5: WASM, Regex-lite, and Custom Unicode Tries
&lt;/h3&gt;

&lt;p&gt;One of my primary goals was WebAssembly support. I wanted a live web preview without paying for Rust backend hosting. However, my initial WASM builds were megabytes in size. The main offender was the &lt;code&gt;regex&lt;/code&gt; crate; its multiple matching engines and Unicode tables take up a lot of space in a WASM binary. I switched to &lt;code&gt;regex-lite&lt;/code&gt;, which dropped the binary size to a beautiful ~500kB. &lt;/p&gt;

&lt;p&gt;But there was a catch: &lt;code&gt;regex-lite&lt;/code&gt; does not support Unicode categories like &lt;code&gt;\p{L}&lt;/code&gt; (Letters) or &lt;code&gt;\p{N}&lt;/code&gt; (Numbers). Dropping full Unicode support was not an option for an international phone library. To fix this, I completely rewrote my &lt;code&gt;dec_from_char&lt;/code&gt; crate into a new code generator. Instead of a giant match statement, my &lt;code&gt;build.rs&lt;/code&gt; now pre-computes a Trie-like lookup table for the necessary Unicode properties. Using fixed chunk sizes, it generates arrays that grant O(1) lookups for any character up to &lt;code&gt;0x10FFFF&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;Category&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;#[inline(always)]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;from_char&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;char&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;option&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MAX_CODEPOINT&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;None&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;index_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SHIFT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;// SAFETY: Arrays are generated to cover up to 0x10FFFF&lt;/span&gt;
        &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;block_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CATEGORY_INDICES&lt;/span&gt;&lt;span class="nf"&gt;.get_unchecked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_idx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;MASK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;final_pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block_idx&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;SHIFT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CATEGORY_BLOCKS&lt;/span&gt;&lt;span class="nf"&gt;.get_unchecked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allowed me to rip out regular expressions entirely from critical hot-paths. Stripping unwanted trailing characters went from a slow regex match inside a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The old, slow regex way&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;trim_unwanted_end_chars&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ... loop with regex.full_match() ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To a single call to the standard library's &lt;code&gt;trim_end_matches&lt;/code&gt; string method, backed by the generated tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The new, instant way&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;crate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;trim_unwanted_end_chars&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;phone_number&lt;/span&gt;&lt;span class="nf"&gt;.trim_end_matches&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sc"&gt;'#'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nn"&gt;uniprops_without_nl&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;uniprops&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Category&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_char&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.is_some&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
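
&lt;p&gt;For readers who have not met this kind of table before, here is a deliberately tiny sketch of the two-stage idea. It is an illustration under stated assumptions, not the generated code: the names are hypothetical, the table covers only code points up to &lt;code&gt;0xFF&lt;/code&gt;, and it answers a single yes/no question. It also uses checked indexing, multiplication, and modulo, which for a power-of-two block size compute the same positions as the shift-and-mask version above.&lt;/p&gt;

```rust
// Reduced sketch of a two-stage lookup table. The real tables are generated
// at build time and cover every code point; this toy version only covers
// 0x00..=0xFF and only answers "is this an ASCII digit?".
const SHIFT: u32 = 4;
const BLOCK_SIZE: usize = 16; // 2^SHIFT entries per block

// Stage 1: one entry per 16-code-point chunk, naming a block in stage 2.
// Only chunk 3 (0x30..=0x3F) contains digits; every other chunk shares block 0.
const INDICES: [u8; 16] = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];

// Stage 2: deduplicated blocks stored back to back.
const BLOCKS: [bool; 32] = [
    // block 0: no digits at all
    false, false, false, false, false, false, false, false,
    false, false, false, false, false, false, false, false,
    // block 1: '0'..='9' followed by six punctuation characters
    true, true, true, true, true, true, true, true,
    true, true, false, false, false, false, false, false,
];

fn is_ascii_digit_via_table(c: char) -> bool {
    let cp = c as usize;
    if cp > 0xFF {
        return false; // outside the range this toy table covers
    }
    let block = INDICES[cp >> SHIFT] as usize; // which chunk, then which block
    let offset = cp % BLOCK_SIZE; // same result as masking with BLOCK_SIZE - 1
    BLOCKS[block * BLOCK_SIZE + offset]
}

fn main() {
    for c in ['7', 'a', '\u{0667}'] {
        println!("{:?} -> {}", c, is_ascii_digit_via_table(c));
    }
}
```

&lt;p&gt;The space saving comes from the sharing in stage 1: fifteen of the sixteen chunks point at the same all-false block, so only two blocks are stored. The same effect at full Unicode scale is what keeps a table covering &lt;code&gt;0x10FFFF&lt;/code&gt; small.&lt;/p&gt;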



&lt;h3&gt;
  
  
  Phase 6: Zero-Allocation Formatting
&lt;/h3&gt;

&lt;p&gt;At this point, parsing was incredibly fast, but formatting still required a heap allocation when prepending leading zeroes to national numbers. To achieve true zero-allocation formatting, I wrote a custom integer-to-string formatter called &lt;code&gt;zeroes_itoa&lt;/code&gt; by cloning and modifying the popular &lt;a href="https://github.com/dtolnay/itoa" rel="noopener noreferrer"&gt;itoa&lt;/a&gt; crate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leading_zero_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Cow&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;curr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.bytes&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;buf_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.bytes&lt;/span&gt;&lt;span class="nf"&gt;.as_mut_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;lut_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEC_DIGITS_LUT&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;final_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.bytes&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;curr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_raw_parts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf_ptr&lt;/span&gt;&lt;span class="nf"&gt;.add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;final_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;str&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_utf8_unchecked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unless the required padding exceeds the 64-byte stack buffer (which is essentially impossible for a phone number), this returns a &lt;code&gt;Cow::Borrowed(&amp;amp;str)&lt;/code&gt;, entirely avoiding the system allocator.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Final Results
&lt;/h3&gt;

&lt;p&gt;Through differential fuzzing, build-time regex anchoring, custom Unicode Tries, and stack-allocated integer formatting, &lt;code&gt;rlibphonenumber&lt;/code&gt; is now entirely stable, bug-for-bug compatible with Google's upstream, and incredibly fast.&lt;/p&gt;

&lt;p&gt;Here is the final performance comparison against Google's upstream C++ implementation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;C++ (&lt;code&gt;libphonenumber&lt;/code&gt; + RE2)&lt;/th&gt;
&lt;th&gt;Rust (&lt;code&gt;rlibphonenumber&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parsing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2279 ns&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;506 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~ 4.5x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format (E.164)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;63 ns&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;36 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~ 1.7x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format (International)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2028 ns&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;447 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~ 4.5x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format (National)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2484 ns&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;578 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~ 4.3x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We hit ~500 nanoseconds for a full parse, and just over 30 nanoseconds for E.164 formatting. &lt;/p&gt;

&lt;p&gt;If you are dealing with high-throughput backend services, you can find the repository here: &lt;a href="https://github.com/vloldik/rlibphonenumber" rel="noopener noreferrer"&gt;github.com/vloldik/rlibphonenumber&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cpp</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
