<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yu Qian Yang</title>
    <description>The latest articles on Forem by Yu Qian Yang (@yu_qianyang_db00999789f2).</description>
    <link>https://forem.com/yu_qianyang_db00999789f2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878715%2Fd15d7afc-e5d1-40ad-a32f-da57f1977247.png</url>
      <title>Forem: Yu Qian Yang</title>
      <link>https://forem.com/yu_qianyang_db00999789f2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yu_qianyang_db00999789f2"/>
    <language>en</language>
    <item>
      <title>How I Used Bit Manipulation to Speed Up Float-to-Int Conversion in a Storage Engine</title>
      <dc:creator>Yu Qian Yang</dc:creator>
      <pubDate>Wed, 15 Apr 2026 02:36:14 +0000</pubDate>
      <link>https://forem.com/yu_qianyang_db00999789f2/how-i-used-bit-manipulation-to-speed-up-float-to-int-conversion-in-a-storage-engine-4p5e</link>
      <guid>https://forem.com/yu_qianyang_db00999789f2/how-i-used-bit-manipulation-to-speed-up-float-to-int-conversion-in-a-storage-engine-4p5e</guid>
      <description>&lt;p&gt;In a columnar time-series database, one of the most effective compression tricks is&lt;br&gt;
deceptively simple: &lt;strong&gt;if a float value is actually an integer, store it as one.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Integers Compress Better Than Floats
&lt;/h2&gt;

&lt;p&gt;Integer compression algorithms like Delta-of-Delta, ZigZag, and Simple8b work by&lt;br&gt;
exploiting predictable bit patterns — small deltas between adjacent values, values&lt;br&gt;
that fit in fewer than 64 bits, and so on. They can pack multiple values into a&lt;br&gt;
single 64-bit word.&lt;/p&gt;

&lt;p&gt;Floats don't cooperate with these schemes. Even &lt;code&gt;1.0&lt;/code&gt; and &lt;code&gt;2.0&lt;/code&gt; have completely&lt;br&gt;
different IEEE 754 bit representations (&lt;code&gt;0x3FF0000000000000&lt;/code&gt; and &lt;code&gt;0x4000000000000000&lt;/code&gt;).&lt;br&gt;
Their XOR is large, their delta is meaningless as an integer, and bit-packing is useless.&lt;/p&gt;

&lt;p&gt;So when a column is declared as &lt;code&gt;FLOAT&lt;/code&gt; but actually contains values like &lt;code&gt;12.0&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;18.0&lt;/code&gt;, &lt;code&gt;25.0&lt;/code&gt; — which happens more often than you'd expect, either because the schema&lt;br&gt;
was designed generically or because the upstream system always emits &lt;code&gt;.0&lt;/code&gt; values — you're&lt;br&gt;
leaving significant compression headroom on the table.&lt;/p&gt;

&lt;p&gt;The fix: &lt;strong&gt;detect these integer-valued floats at encode time, convert them losslessly&lt;br&gt;
to integers, and route them through the integer compression path.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A temperature sensor that reports &lt;code&gt;21.0&lt;/code&gt;, &lt;code&gt;21.5&lt;/code&gt;, &lt;code&gt;22.0&lt;/code&gt; is a good example. Multiply&lt;br&gt;
by 10 and you get &lt;code&gt;210&lt;/code&gt;, &lt;code&gt;215&lt;/code&gt;, &lt;code&gt;220&lt;/code&gt; — plain integers with small, predictable deltas.&lt;br&gt;
Delta-of-Delta or Simple8b will compress these far more efficiently than any&lt;br&gt;
float-specific scheme.&lt;/p&gt;

&lt;p&gt;The challenge: before converting, you need to check whether the scaled value can be&lt;br&gt;
losslessly represented as an integer. The naive check — &lt;code&gt;std::isnan&lt;/code&gt; + range comparison —&lt;br&gt;
works but it's slower than it needs to be on the hot encoding path.&lt;/p&gt;

&lt;p&gt;Here's the faster approach I implemented, using nothing but bit manipulation.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Setup: Scaling Floats to Integers
&lt;/h2&gt;

&lt;p&gt;The encoding scheme works in two steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scale&lt;/strong&gt;: multiply the float by &lt;code&gt;10^scale&lt;/code&gt; (configurable per column)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert&lt;/strong&gt;: cast the scaled value to integer using &lt;code&gt;std::lround&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, with &lt;code&gt;scale = 2&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;1.23&lt;/code&gt; → &lt;code&gt;1.23 * 100 = 123.0&lt;/code&gt; → &lt;code&gt;123&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;45.678&lt;/code&gt; → &lt;code&gt;45.678 * 100 = 4567.8&lt;/code&gt; → overflow risk or precision loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 2 only makes sense if the scaled value actually fits in the target integer type.&lt;br&gt;
That's the overflow check.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Overflow Check
&lt;/h2&gt;

&lt;p&gt;The function takes a pointer to the raw float bytes and the target integer width in bytes.&lt;br&gt;
It returns non-zero if the value would overflow.&lt;/p&gt;

&lt;p&gt;Called before every conversion — if it fires, skip the integer path and fall back to&lt;br&gt;
float encoding.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;you can determine whether a float overflows a given integer type&lt;br&gt;
purely from the float's exponent bits, without doing any arithmetic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's why.&lt;/p&gt;


&lt;h2&gt;
  
  
  IEEE 754 in One Paragraph
&lt;/h2&gt;

&lt;p&gt;A double-precision float is stored as 64 bits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ sign: 1 bit ][ exponent: 11 bits ][ fraction: 52 bits ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value is: &lt;code&gt;1.fraction × 2^(exponent − 1023)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;1023&lt;/code&gt; is the bias — it allows the 11-bit exponent field to represent negative&lt;br&gt;
exponents. The real exponent is &lt;code&gt;stored_exponent − 1023&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For 32-bit floats: 8 exponent bits, bias 127, fraction 23 bits.&lt;/p&gt;


&lt;h2&gt;
  
  
  Extracting the Exponent
&lt;/h2&gt;

&lt;p&gt;For a &lt;code&gt;double&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                        &lt;span class="cm"&gt;/* safe type-pun, no UB */&lt;/span&gt;
&lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x07ff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1023&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step by step:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;memcpy&lt;/code&gt; into a &lt;code&gt;uint64_t&lt;/code&gt; — reinterpret the 8 bytes as a 64-bit integer (no arithmetic, just bits)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;gt;&amp;gt; 52&lt;/code&gt; — shift right past the 52 fraction bits, bringing the exponent to the low end&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;amp; 0x07ff&lt;/code&gt; — mask off the sign bit, keep only the 11 exponent bits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;- 1023&lt;/code&gt; — subtract the bias to get the real exponent&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a &lt;code&gt;float&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same logic: shift past 23 fraction bits, mask 8 exponent bits, subtract bias 127.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Overflow Condition
&lt;/h2&gt;

&lt;p&gt;Once you have the real exponent, the overflow check is one comparison:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;is_overflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;int_typewidth&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where does &lt;code&gt;- 2&lt;/code&gt; come from?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;−1 for the sign bit&lt;/strong&gt;: a signed integer of N bits can hold values up to &lt;code&gt;2^(N-1) - 1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;−1 for the implicit leading 1&lt;/strong&gt;: in IEEE 754, the fraction is &lt;code&gt;1.fraction&lt;/code&gt;, not &lt;code&gt;0.fraction&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So a float with real exponent &lt;code&gt;E&lt;/code&gt; represents a value with &lt;code&gt;E + 1&lt;/code&gt; significant bits&lt;br&gt;
(the implicit 1 plus E fraction bits). For it to fit in a signed N-bit integer, you&lt;br&gt;
need &lt;code&gt;E + 1 ≤ N - 1&lt;/code&gt;, which simplifies to &lt;code&gt;E ≤ N - 2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Full implementation in C:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cm"&gt;/* Returns 1 if the double at src overflows a signed integer of int_bytes bytes. */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;double_overflow_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x07ff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1023&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Returns 1 if the float at src overflows a signed integer of int_bytes bytes. */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;float_overflow_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total cost: one &lt;code&gt;memcpy&lt;/code&gt;, one shift, one AND, one subtract, one compare.&lt;br&gt;
No floating-point arithmetic, no branches on the value itself.&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Fits into the Encoder
&lt;/h2&gt;

&lt;p&gt;The encoder scales the value first, then calls the overflow check on the scaled result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orig&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;               &lt;span class="cm"&gt;/* scale: e.g. orig * 100.0 */&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;double_overflow_check&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ENCODE_OVERFLOW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                  &lt;span class="cm"&gt;/* fall back to float encoding */&lt;/span&gt;

&lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llround&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            &lt;span class="cm"&gt;/* safe: overflow already ruled out */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scale factor is stored in the column header so the decoder can reverse the&lt;br&gt;
operation: &lt;code&gt;decoded = (double)stored_integer / pow(10, scale)&lt;/code&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Not Just Use &lt;code&gt;std::isnan&lt;/code&gt; + Range Check?
&lt;/h2&gt;

&lt;p&gt;The conventional approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;INT64_MAX&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;INT64_MIN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This involves floating-point comparisons, which on many architectures require the&lt;br&gt;
value to be loaded into a float register before comparison. On a hot encoding path&lt;br&gt;
processing millions of values, the difference adds up.&lt;/p&gt;

&lt;p&gt;The bit manipulation approach operates entirely on integer registers. The float's&lt;br&gt;
bytes are reinterpreted as an integer — no floating-point unit involved until the&lt;br&gt;
final &lt;code&gt;std::lround&lt;/code&gt; conversion, which only happens when you've already confirmed&lt;br&gt;
no overflow.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Enables
&lt;/h2&gt;

&lt;p&gt;This check is the entry gate for the full encoding chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;float column
    ↓
check_float_overflow   ← this article
    ↓ (passes)
float → integer cast
    ↓
Delta+ZigZag encoding
    ↓
Simple8b bit-packing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without a cheap overflow gate, the chain can't run on untrusted float data. With it,&lt;br&gt;
each value costs one check before entering the integer compression path — which can&lt;br&gt;
achieve far better compression ratios than float-specific schemes on "integer-like"&lt;br&gt;
time-series data.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This article is part of a series on compression engineering in time-series databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1&lt;/strong&gt;: Runtime adaptive compression — how the system selects the best algorithm
without scanning all data &lt;em&gt;(published)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: Chained encoding — the full float-to-integer → Delta+ZigZag → Simple8b pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4&lt;/strong&gt;: An improved floating-point compression algorithm based on ELF&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I'm currently available for freelance work on backend systems, storage engineering,&lt;br&gt;
and systems integration. Feel free to reach out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>performance</category>
      <category>database</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>How I Built a Runtime Adaptive Compression System That Selects the Best Algorithm Without Scanning All Data</title>
      <dc:creator>Yu Qian Yang</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:31:07 +0000</pubDate>
      <link>https://forem.com/yu_qianyang_db00999789f2/how-i-built-a-runtime-adaptive-compression-system-that-selects-the-best-algorithm-without-scanning-j30</link>
      <guid>https://forem.com/yu_qianyang_db00999789f2/how-i-built-a-runtime-adaptive-compression-system-that-selects-the-best-algorithm-without-scanning-j30</guid>
      <description>&lt;p&gt;When I was working on a columnar time-series database built on Greenplum, we faced a&lt;br&gt;
problem that every time-series storage engineer eventually hits:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No single compression algorithm works best for all data.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A sensor reading temperature every second looks nothing like a financial tick stream.&lt;br&gt;
A column of integer IDs compresses completely differently from a column of floating-point&lt;br&gt;
measurements. And in the real world, the same column can change character over time —&lt;br&gt;
steady signal for hours, then suddenly a burst of noise.&lt;/p&gt;

&lt;p&gt;The naive answer is: let the user pick an algorithm. But that puts the burden on the user,&lt;br&gt;
and in practice they almost always pick wrong — or just leave the default.&lt;/p&gt;

&lt;p&gt;So I designed &lt;strong&gt;automode&lt;/strong&gt;: a runtime analyzer that samples incoming data, detects its&lt;br&gt;
statistical character, and selects the best encoding chain automatically.&lt;/p&gt;

&lt;p&gt;Here's how it works.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Encoding Chain
&lt;/h2&gt;

&lt;p&gt;Before talking about the analyzer, I need to briefly explain what it's selecting between.&lt;/p&gt;

&lt;p&gt;We built a set of encoding algorithms, each optimized for a specific data pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run-Length Encoding&lt;/strong&gt; — all values are identical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta-of-Delta&lt;/strong&gt; — slowly changing values (timestamps, smooth sensors)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta + ZigZag&lt;/strong&gt; — values oscillating around a baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bit Packing&lt;/strong&gt; — small integers that can be bit-packed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gorilla&lt;/strong&gt; — floating-point values with small XOR between adjacent values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The insight that made automode possible: these algorithms operate on fundamentally&lt;br&gt;
different &lt;em&gt;statistical features&lt;/em&gt; of data. If I can identify the dominant feature of a&lt;br&gt;
data stream, I can map it directly to the right algorithm.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Design Problem
&lt;/h2&gt;

&lt;p&gt;The obvious approach — scan all the data and compute statistics — doesn't work in a&lt;br&gt;
storage engine. The whole point of compression is to be fast. Scanning every value&lt;br&gt;
before encoding defeats the purpose.&lt;/p&gt;

&lt;p&gt;I needed a way to characterize a data stream with &lt;strong&gt;minimal overhead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The solution: &lt;strong&gt;sample windows&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of scanning the full column, I divide the input stream into fixed-size windows&lt;br&gt;
and analyze only a small number of items from each window. Within each window, the&lt;br&gt;
sample start position is randomized to avoid being fooled by local patterns.&lt;/p&gt;

&lt;p&gt;Why randomize? Consider this sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 3 2 3 2 3 2 3 2 3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you always sample from the start, you'd classify this as "all equal" and pick&lt;br&gt;
Run-Length Encoding. But if you look at the full stream, it's clearly an alternating&lt;br&gt;
pattern — RLE would be terrible. Random sampling within the window gives a much more&lt;br&gt;
representative picture.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Five Features
&lt;/h2&gt;

&lt;p&gt;Each sampled window is analyzed for five statistical features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All Equal&lt;/strong&gt; — all sampled values are identical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Values bitwise small&lt;/strong&gt; — the values themselves fit in few bits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1st-order delta small&lt;/strong&gt; — differences between adjacent values are small&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2nd-order delta small&lt;/strong&gt; — differences between differences are small (delta-of-delta)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Float XOR small&lt;/strong&gt; — XOR of adjacent float values is small (Gorilla pattern)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each window, the analyzer computes 0th, 1st, and 2nd order differentials, plus&lt;br&gt;
XOR pairs for float data. "Small" is defined bitwise: a value is small if its&lt;br&gt;
significant bit length is below a configurable threshold. This makes the check&lt;br&gt;
robust against occasional spike values — a single outlier doesn't ruin the&lt;br&gt;
classification.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strong vs. Weak Features
&lt;/h2&gt;

&lt;p&gt;Not all features are equal. I introduced a &lt;strong&gt;Strong/Weak classification&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strongest&lt;/strong&gt;: All Equal → Run-Length Encoding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong&lt;/strong&gt;: 2nd-order delta small → Delta-of-Delta&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong&lt;/strong&gt;: Float XOR small → Gorilla&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak&lt;/strong&gt;: Values bitwise small → Bit Packing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak&lt;/strong&gt;: 1st-order delta small → Delta + ZigZag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why does this matter? Consider a case where Delta-of-Delta and Bit Packing score&lt;br&gt;
similarly. Delta-of-Delta is fundamentally better for smoothly varying time-series&lt;br&gt;
data, even with a slightly lower score. The weight system ensures strong methods&lt;br&gt;
are preferred when they're competitive.&lt;/p&gt;




&lt;h2&gt;
  
  
  One Edge Case Worth Noting
&lt;/h2&gt;

&lt;p&gt;The alternating pattern &lt;code&gt;[A B A B A B]&lt;/code&gt; is a trap.&lt;/p&gt;

&lt;p&gt;It looks like "values bitwise small", but Bit Packing performs poorly on alternating&lt;br&gt;
sequences. A general-purpose compressor actually handles this pattern better.&lt;/p&gt;

&lt;p&gt;I added an explicit check: if odd-indexed and even-indexed items within the sample&lt;br&gt;
are each internally equal, it's an alternating pattern — skip the Bit Packing&lt;br&gt;
classification and fall back to general compression.&lt;/p&gt;

&lt;p&gt;This kind of edge case only appears when you test against real production data. In our&lt;br&gt;
case, it showed up in sensor data from a deployment at a large financial institution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feature → Algorithm Mapping
&lt;/h2&gt;

&lt;p&gt;Once features are aggregated across all sample windows, the dominant feature maps to&lt;br&gt;
an algorithm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All Equal&lt;/strong&gt; → Run-Length Encoding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2nd-order delta small&lt;/strong&gt; → Delta-of-Delta&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Values bitwise small&lt;/strong&gt; → Bit Packing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1st-order delta small&lt;/strong&gt; → Delta + ZigZag&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Float XOR small&lt;/strong&gt; → Gorilla&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No clear pattern&lt;/strong&gt; → General Compression (fallback)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For floating-point columns, the encoding chain adds a float-to-integer conversion pass&lt;br&gt;
first — converting floats that happen to be integers (like &lt;code&gt;1.0&lt;/code&gt;, &lt;code&gt;2.0&lt;/code&gt;) into actual&lt;br&gt;
integers before passing to the integer encoding path. This is worth a separate article.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Modes
&lt;/h2&gt;

&lt;p&gt;The system supports two optimization priorities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compression-rate mode&lt;/strong&gt;: maximize storage efficiency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed mode&lt;/strong&gt;: maximize decompression speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, compression-rate mode is used for cold storage and analytics workloads,&lt;br&gt;
and speed mode for hot data that's frequently queried. The same feature detection runs&lt;br&gt;
in both cases — only the final algorithm selection weights differ.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;The system is deployed in production at scale, including a large-scale installation&lt;br&gt;
at a major financial institution managing thousands of PBs of time-series data. In&lt;br&gt;
testing, automode consistently matched or outperformed manually-tuned single-algorithm&lt;br&gt;
compression across diverse real-world datasets.&lt;/p&gt;

&lt;p&gt;The key insight that made it work: &lt;strong&gt;you don't need to scan all the data to understand&lt;br&gt;
its character&lt;/strong&gt;. A few dozen samples, taken randomly from fixed-size windows, are&lt;br&gt;
enough to make a reliable classification — as long as you handle the edge cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This article is part of a series on compression engineering in time-series databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: Float-to-integer encoding — how I used IEEE 754 bit manipulation to detect
convertibility with near-zero overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: Chained encoding — a unified float compression pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4&lt;/strong&gt;: An improved floating-point compression algorithm based on ELF&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I'm currently available for freelance work on backend systems, storage engineering,&lt;br&gt;
and systems integration. Feel free to reach out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>performance</category>
      <category>cpp</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
