<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dharrsan Amarnath</title>
    <description>The latest articles on Forem by Dharrsan Amarnath (@dharrsan-hq).</description>
    <link>https://forem.com/dharrsan-hq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3887343%2F8646bea5-9a36-4ce9-8bc7-f7cd7df568e6.jpeg</url>
      <title>Forem: Dharrsan Amarnath</title>
      <link>https://forem.com/dharrsan-hq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dharrsan-hq"/>
    <language>en</language>
    <item>
      <title>Your Struct is Wasting Memory and You Don't Know It</title>
      <dc:creator>Dharrsan Amarnath</dc:creator>
      <pubDate>Sat, 25 Apr 2026 00:46:38 +0000</pubDate>
      <link>https://forem.com/dharrsan-hq/your-struct-is-wasting-memory-and-you-dont-know-it-12m</link>
      <guid>https://forem.com/dharrsan-hq/your-struct-is-wasting-memory-and-you-dont-know-it-12m</guid>
      <description>&lt;p&gt;We write structs by listing fields in whatever order feels readable. Name, then age, then score. It compiles. It runs. The compiler silently bloats it, misaligns it, or both, and you ship it without ever checking.&lt;/p&gt;

&lt;p&gt;Here are three structs holding the exact same six fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stddef.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Good&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Bad&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;packed&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;PackedBad&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Good:      %zu bytes&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Good&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Bad:       %zu bytes&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Bad&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PackedBad: %zu bytes&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;PackedBad&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Good:      32 bytes
Bad:       40 bytes
PackedBad: 27 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same fields. 27, 32, and 40 bytes. The difference is not the data. It is the order and whether you let the compiler do its job.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Happens When You Read One Byte
&lt;/h2&gt;

&lt;p&gt;Before touching any struct, you need to understand how the CPU actually talks to RAM. There are three buses connecting them.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;address bus&lt;/strong&gt; carries the memory address the CPU wants to read. Conceptually it is one wire per address bit, and physical addresses on current x86-64 systems are up to 52 bits wide. The CPU puts a number on these wires and RAM listens.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;data bus&lt;/strong&gt; carries the actual bytes back. It is 64 bits wide, so 8 bytes travel in parallel per transfer. But the CPU does not stop at 8 bytes. It keeps bursting transfers across the data bus until it has filled a full &lt;strong&gt;cache line&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A cache line is 64 bytes on x86-64 and on most ARM cores. For ordinary cacheable memory, that is the unit of transfer between RAM and your L1/L2 cache. The CPU does not fetch 1 byte. It does not fetch 8 bytes. It fetches the whole line. When you read a single &lt;code&gt;char&lt;/code&gt;, the CPU puts that char's address on the address bus, pulls the entire 64-byte block containing it across the data bus, stores it in cache, and then gives you your one byte out of it.&lt;/p&gt;

&lt;p&gt;Every cache line starts at an address that is a multiple of 64. Cache line 0 covers &lt;code&gt;0x0000&lt;/code&gt; to &lt;code&gt;0x003F&lt;/code&gt; (0 to 63). Cache line 1 covers &lt;code&gt;0x0040&lt;/code&gt; to &lt;code&gt;0x007F&lt;/code&gt; (64 to 127). Cache line 2 covers &lt;code&gt;0x0080&lt;/code&gt; to &lt;code&gt;0x00BF&lt;/code&gt; (128 to 191). The boundaries are fixed and always at multiples of 64.&lt;/p&gt;

&lt;p&gt;This is the rule everything else in this post follows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Natural Alignment and Why It Matters
&lt;/h2&gt;

&lt;p&gt;On mainstream 64-bit ABIs, every primitive scalar type has an alignment requirement equal to its own size. A &lt;code&gt;double&lt;/code&gt; (8 bytes) must start at an address divisible by 8. A &lt;code&gt;uint32_t&lt;/code&gt; (4 bytes) must start at an address divisible by 4. A &lt;code&gt;char&lt;/code&gt; (1 byte) can go anywhere.&lt;/p&gt;

&lt;p&gt;When a field sits at a naturally aligned address the CPU reads it in one bus transaction. It fits cleanly inside one cache line fetch.&lt;/p&gt;

&lt;p&gt;When a field is misaligned it can straddle a cache line boundary. Say a &lt;code&gt;double&lt;/code&gt; starts at &lt;code&gt;0x003C&lt;/code&gt; (60). It is 8 bytes, so it occupies &lt;code&gt;0x003C&lt;/code&gt; to &lt;code&gt;0x0043&lt;/code&gt; (60 to 67). Cache line 0 ends at &lt;code&gt;0x003F&lt;/code&gt; (63). Cache line 1 starts at &lt;code&gt;0x0040&lt;/code&gt; (64). Your &lt;code&gt;double&lt;/code&gt; is split across both. The CPU issues an address request for cache line 0, waits for the data bus to deliver, then issues a second address request for cache line 1, waits again, and stitches both halves together in hardware. Two cache line accesses, and on a cold cache two full round trips to memory, for one field read.&lt;/p&gt;

&lt;p&gt;Now think about why &lt;code&gt;address mod data_size == 0&lt;/code&gt; prevents this. Cache line boundaries sit at multiples of 64. A naturally aligned &lt;code&gt;double&lt;/code&gt; sits at a multiple of 8. The worst case is a &lt;code&gt;double&lt;/code&gt; at &lt;code&gt;0x0038&lt;/code&gt; (56), occupying bytes 56 to 63. It ends exactly at the cache line boundary, never crossing it. This works because 64 is itself a multiple of 8. A field aligned to its own size mathematically cannot straddle a boundary that is also a multiple of that same size. So &lt;code&gt;address mod data_size == 0&lt;/code&gt; is not a style convention. It is the condition that guarantees your field lives inside exactly one cache line, fetched in exactly one bus transaction, with no possibility of being split.&lt;/p&gt;

&lt;p&gt;The compiler inserts padding between fields to maintain this guarantee. Bad field ordering forces it to insert a lot of padding. And &lt;code&gt;packed&lt;/code&gt; removes all of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Good : 32 bytes, nothing wasted
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0x0000      0x0008      0x0010 0x0014 0x0016 0x0017  0x001B  0x001F
(0)         (8)         (16)   (20)   (22)   (23)    (27)    (31)
|-----------|-----------|------|----|--|------|--------|
balance     trans_id    acct   reg  st currency  pad
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;balance&lt;/code&gt; at &lt;code&gt;0x0000&lt;/code&gt; (0). &lt;code&gt;0 mod 8 = 0&lt;/code&gt;. Aligned.&lt;br&gt;
&lt;code&gt;transaction_id&lt;/code&gt; at &lt;code&gt;0x0008&lt;/code&gt; (8). &lt;code&gt;8 mod 8 = 0&lt;/code&gt;. Aligned.&lt;br&gt;
&lt;code&gt;account_type&lt;/code&gt; at &lt;code&gt;0x0010&lt;/code&gt; (16). &lt;code&gt;16 mod 4 = 0&lt;/code&gt;. Aligned.&lt;br&gt;
&lt;code&gt;region_code&lt;/code&gt; at &lt;code&gt;0x0014&lt;/code&gt; (20). &lt;code&gt;20 mod 2 = 0&lt;/code&gt;. Aligned.&lt;br&gt;
&lt;code&gt;status&lt;/code&gt; at &lt;code&gt;0x0016&lt;/code&gt; (22). Char, goes anywhere.&lt;br&gt;
&lt;code&gt;currency&lt;/code&gt; at &lt;code&gt;0x0017&lt;/code&gt; (23). Char array, goes anywhere.&lt;/p&gt;

&lt;p&gt;Every field starts exactly where the previous one ended. Zero internal padding.&lt;/p&gt;

&lt;p&gt;The 5 bytes at the end are tail padding. In an array the second element must start at an address divisible by 8, the largest field alignment. Without tail padding the second element begins at &lt;code&gt;0x001B&lt;/code&gt; (27) and its &lt;code&gt;balance&lt;/code&gt; field lands there too. &lt;code&gt;27 mod 8 = 3&lt;/code&gt;. Misaligned. So the compiler rounds 27 up to 32. The second element starts at &lt;code&gt;0x0020&lt;/code&gt; (32). &lt;code&gt;32 mod 8 = 0&lt;/code&gt;. Clean.&lt;/p&gt;

&lt;p&gt;Zero bytes wasted internally. The tail padding is structural and unavoidable.&lt;/p&gt;


&lt;h2&gt;
  
  
  Bad : 40 bytes, 13 bytes of dead space
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0x0000 0x0001    0x0008      0x0010 0x0012    0x0018      0x0020 0x0024 0x0028
(0)    (1)       (8)         (16)   (18)      (24)        (32)   (36)   (40)
|------|---------|-----------|------|---------|-----------|------|----|
st     7B pad    balance     reg    6B pad    trans_id    curr   acct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;code&gt;status&lt;/code&gt; at &lt;code&gt;0x0000&lt;/code&gt; (0), one byte. The next field is &lt;code&gt;balance&lt;/code&gt;, a &lt;code&gt;double&lt;/code&gt; that needs a multiple of 8. After byte 1, the nearest multiple of 8 is &lt;code&gt;0x0008&lt;/code&gt; (8). The compiler inserts 7 bytes of padding between them that store nothing and do nothing.&lt;/p&gt;

&lt;p&gt;Then &lt;code&gt;region_code&lt;/code&gt; lands at &lt;code&gt;0x0010&lt;/code&gt; (16), two bytes, ending at &lt;code&gt;0x0011&lt;/code&gt; (17). The field after it is &lt;code&gt;transaction_id&lt;/code&gt;, which needs a multiple of 8. The nearest is &lt;code&gt;0x0018&lt;/code&gt; (24). Six more bytes gone.&lt;/p&gt;

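&lt;p&gt;13 bytes wasted: 7 from putting &lt;code&gt;char status&lt;/code&gt; ahead of a &lt;code&gt;double&lt;/code&gt;, and 6 from putting &lt;code&gt;region_code&lt;/code&gt; ahead of &lt;code&gt;transaction_id&lt;/code&gt;. In an array of a million of these structs that is 13 MB of RAM holding nothing, and 8 MB more than the same array of &lt;code&gt;Good&lt;/code&gt;. The struct is 25% larger than it needs to be, meaning fewer elements fit per cache line and more trips to RAM on every access pattern.&lt;/p&gt;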


&lt;h2&gt;
  
  
  PackedBad : 27 bytes, zero padding, four misaligned fields
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0x0000 0x0001    0x0009 0x000B      0x0013  0x0017 0x001B
(0)    (1)       (9)    (11)        (19)    (23)   (27)
|------|---------|------|-----------|--------|------|
st     balance   reg    trans_id    curr    acct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;code&gt;__attribute__((packed))&lt;/code&gt; removes all padding. Fields sit back to back. 27 bytes. But look at where each field actually lands:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;balance&lt;/code&gt; at &lt;code&gt;0x0001&lt;/code&gt; (1). &lt;code&gt;1 mod 8 = 1&lt;/code&gt;. Not 0. Misaligned.&lt;br&gt;
&lt;code&gt;region_code&lt;/code&gt; at &lt;code&gt;0x0009&lt;/code&gt; (9). &lt;code&gt;9 mod 2 = 1&lt;/code&gt;. Not 0. Misaligned.&lt;br&gt;
&lt;code&gt;transaction_id&lt;/code&gt; at &lt;code&gt;0x000B&lt;/code&gt; (11). &lt;code&gt;11 mod 8 = 3&lt;/code&gt;. Not 0. Misaligned.&lt;br&gt;
&lt;code&gt;account_type&lt;/code&gt; at &lt;code&gt;0x0017&lt;/code&gt; (23). &lt;code&gt;23 mod 4 = 3&lt;/code&gt;. Not 0. Misaligned.&lt;/p&gt;

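&lt;p&gt;Four fields need alignment and, in element 0, none of them get it. In an array the offsets shift with every element, and whether a given element straddles a cache line boundary depends on its index. For an array that starts on a cache line boundary, you can check with:&lt;br&gt;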
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(index x struct_size) mod 64 + struct_size &amp;gt; 64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For element 2 of &lt;code&gt;PackedBad&lt;/code&gt;: &lt;code&gt;(2 x 27) mod 64 + 27 = 54 + 27 = 81&lt;/code&gt;. Since 81 &amp;gt; 64, element 2 straddles. Its bytes run from &lt;code&gt;0x0036&lt;/code&gt; (54) to &lt;code&gt;0x0050&lt;/code&gt; (80), crossing the cache line boundary at &lt;code&gt;0x0040&lt;/code&gt; (64). Its &lt;code&gt;region_code&lt;/code&gt; occupies bytes 63 and 64, one on each side of the boundary. To read it, the CPU issues an address request for cache line 0 (&lt;code&gt;0x0000&lt;/code&gt; to &lt;code&gt;0x003F&lt;/code&gt;), waits for the data bus, issues a request for cache line 1 (&lt;code&gt;0x0040&lt;/code&gt; to &lt;code&gt;0x007F&lt;/code&gt;), waits again, and stitches both halves. Two cache line accesses for one 2-byte field. You saved 13 bytes on paper and paid for it in extra memory traffic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tearing
&lt;/h2&gt;

&lt;p&gt;The straddle is slow. In single-threaded code it is just slower. In multithreaded code it is also wrong.&lt;/p&gt;

&lt;p&gt;The portable hardware guarantee is that a plain load or store of up to 8 bytes is atomic, meaning indivisible and instantaneous from every other thread's perspective, when:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;address mod data_size == 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That condition guarantees the field sits inside one cache line and the CPU fetches it in one bus transaction. One transaction means no window for another thread to slip in.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;balance&lt;/code&gt; sits at &lt;code&gt;0x0001&lt;/code&gt; (1) in &lt;code&gt;PackedBad&lt;/code&gt;, &lt;code&gt;1 mod 8 = 1&lt;/code&gt;. The condition fails, so the guarantee is gone. The hardware is free to fetch the first portion of &lt;code&gt;balance&lt;/code&gt; in one bus transaction and the second portion in a separate bus transaction, and it has to whenever the field straddles a cache line. There is a real time gap between them.&lt;/p&gt;

&lt;p&gt;If another thread writes to that same &lt;code&gt;balance&lt;/code&gt; field inside that gap, the reading thread gets the first half from before the write and the second half from after it. A value assembled from two different points in time. A number that was never logically written anywhere in your program.&lt;/p&gt;

&lt;p&gt;No segfault. No assertion. No log line. The field silently reads as garbage. In a monitoring system this corrupts your metrics. In a financial system this is a balance that never existed reaching your business logic.&lt;/p&gt;

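&lt;p&gt;&lt;code&gt;Good&lt;/code&gt; and &lt;code&gt;Bad&lt;/code&gt; are both padded by the compiler so every field satisfies &lt;code&gt;address mod data_size == 0&lt;/code&gt;. At the hardware level a single aligned load or store cannot tear. The C standard still treats an unsynchronized concurrent read and write as a data race, so a field shared between threads should also be &lt;code&gt;_Atomic&lt;/code&gt; or protected by a lock. &lt;code&gt;PackedBad&lt;/code&gt; has four fields that fail the condition in element 0, and because each later element shifts by 27 bytes, most elements have several misaligned fields.&lt;/p&gt;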




&lt;h2&gt;
  
  
  All Three Side by Side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Struct&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Internal padding&lt;/th&gt;
&lt;th&gt;Misaligned fields&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;32 bytes&lt;/td&gt;
&lt;td&gt;0 bytes&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bad&lt;/td&gt;
&lt;td&gt;40 bytes&lt;/td&gt;
&lt;td&gt;13 bytes&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PackedBad&lt;/td&gt;
&lt;td&gt;27 bytes&lt;/td&gt;
&lt;td&gt;0 bytes&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;Bad&lt;/code&gt; pays in memory. &lt;code&gt;PackedBad&lt;/code&gt; pays in correctness. &lt;code&gt;Good&lt;/code&gt; pays nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix Is Just Field Order
&lt;/h2&gt;

&lt;p&gt;Order fields from largest alignment requirement to smallest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Good&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// 8 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 8 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// 4 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// 2 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;        &lt;span class="c1"&gt;// 1 byte alignment&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compiler has nothing to pad because each field naturally follows the previous one without any gap. No attributes, no pragmas. Just ordering.&lt;/p&gt;

&lt;p&gt;Verify with &lt;code&gt;sizeof&lt;/code&gt;. Inspect individual field positions with the standard &lt;code&gt;offsetof(struct Foo, field)&lt;/code&gt; macro from &lt;code&gt;&amp;lt;stddef.h&amp;gt;&lt;/code&gt; when something looks off.&lt;/p&gt;




&lt;h2&gt;
  
  
  When packed Is Actually Correct
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;__attribute__((packed))&lt;/code&gt; has one valid use: serializing data onto a network socket or disk, where you control both ends and the CPU never does arithmetic directly on the packed bytes.&lt;/p&gt;

&lt;p&gt;You pack the struct, write the raw bytes to the wire, and on the receiving end you copy into a properly aligned struct before reading any field. The packed struct is a transport container, not a data structure your code operates on. The moment you operate on fields inside a packed struct in a running program you risk the straddle penalty on every access and you are one concurrent write away from tearing.&lt;/p&gt;




&lt;h2&gt;
  
  
  False Sharing
&lt;/h2&gt;

&lt;p&gt;You fix your field order. You remove &lt;code&gt;packed&lt;/code&gt;. Everything is aligned. You go multithreaded and all cores pin at 100% while throughput collapses.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;struct Good&lt;/code&gt; is 32 bytes. Two of them fit inside one 64-byte cache line. Say your array starts at &lt;code&gt;0x1000&lt;/code&gt; (4096). &lt;code&gt;arr[0]&lt;/code&gt; lives at &lt;code&gt;0x1000&lt;/code&gt; to &lt;code&gt;0x101F&lt;/code&gt; (4096 to 4127). &lt;code&gt;arr[1]&lt;/code&gt; lives at &lt;code&gt;0x1020&lt;/code&gt; to &lt;code&gt;0x103F&lt;/code&gt; (4128 to 4159). Both sit inside the single cache line spanning &lt;code&gt;0x1000&lt;/code&gt; to &lt;code&gt;0x103F&lt;/code&gt; (4096 to 4159).&lt;/p&gt;

&lt;p&gt;Thread 1 writes to &lt;code&gt;arr[0]&lt;/code&gt;. Thread 2 writes to &lt;code&gt;arr[1]&lt;/code&gt;. Different structs. No shared fields. No mutex involved. But both live in the same 64-byte cache line.&lt;/p&gt;

&lt;p&gt;Every time Thread 1 writes to &lt;code&gt;arr[0]&lt;/code&gt;, the CPU's cache coherency protocol (MESI or one of its variants) sends an invalidation across the on-chip interconnect, the ring bus on many Intel parts, to every other core holding that line: the cache line at &lt;code&gt;0x1000&lt;/code&gt; was modified, your copies are stale, drop them. Thread 2 has its L1 cache entry for &lt;code&gt;arr[1]&lt;/code&gt; ripped away even though nobody touched &lt;code&gt;arr[1]&lt;/code&gt;. It takes an L1 miss, goes out to L3 or to the other core's cache, fetches the 64-byte line again, modifies &lt;code&gt;arr[1]&lt;/code&gt;, and now Thread 1 gets invalidated. Back and forth. The cores spend the vast majority of their time passing one cache line across the interconnect and almost no time doing actual work.&lt;/p&gt;

&lt;p&gt;The fix is to give each struct its own cache line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;aligned&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="n"&gt;NodeMetrics&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;transaction_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;region_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;arr[0]&lt;/code&gt; owns &lt;code&gt;0x1000&lt;/code&gt; to &lt;code&gt;0x103F&lt;/code&gt; (4096 to 4159) entirely. &lt;code&gt;arr[1]&lt;/code&gt; owns &lt;code&gt;0x1040&lt;/code&gt; to &lt;code&gt;0x107F&lt;/code&gt; (4160 to 4223) entirely. Thread 1 and Thread 2 never touch the same cache line and the coherency protocol never fires between them. Each struct grows from 32 to 64 bytes. In exchange, this source of contention disappears and the threads no longer slow each other down.&lt;/p&gt;




&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Order fields from largest alignment to smallest. Verify with &lt;code&gt;sizeof&lt;/code&gt;. Check offsets with &lt;code&gt;offsetof&lt;/code&gt; when something feels off. Use &lt;code&gt;packed&lt;/code&gt; only for wire or disk formats where you control both ends. Pad to 64 bytes with &lt;code&gt;aligned(64)&lt;/code&gt; only when multiple threads write to adjacent elements of an array.&lt;/p&gt;

</description>
      <category>assembly</category>
      <category>c</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why Blockchains Exclude Floating Point at the Architecture Level</title>
      <dc:creator>Dharrsan Amarnath</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:51:11 +0000</pubDate>
      <link>https://forem.com/dharrsan-hq/why-blockchains-exclude-floating-point-at-the-architecture-level-8a4</link>
      <guid>https://forem.com/dharrsan-hq/why-blockchains-exclude-floating-point-at-the-architecture-level-8a4</guid>
      <description>&lt;blockquote&gt;
&lt;h2&gt;
  
  
  I ran the same C program on three machines. Same code. Same inputs. Three different answers. Here's exactly why  
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Experiment
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1L&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2L&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%.20Lf&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%02x "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three machines. All built from the same source:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;OS&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;AMD x86_64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;Raspberry Pi ARMv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;macOS&lt;/td&gt;
&lt;td&gt;Apple Silicon M4 (ARM64)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Machine A: AMD x86_64 Linux (GCC)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.30000000000000001665
9f 93 54 5d e9 52 49 81 ff 3f 00 00 00 00 00 00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;sizeof(long double)&lt;/code&gt; = &lt;strong&gt;16 bytes&lt;/strong&gt; on this machine. But only the first 10 bytes hold actual data: the remaining 6 are padding added for alignment. The meaningful precision lives in an 80-bit format called &lt;strong&gt;x87 extended precision&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Machine B: Raspberry Pi ARM Linux (GCC)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.30000000000000004441
34 33 33 33 33 33 33 33 33 33 33 33 33 33 fd 3f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;sizeof(long double)&lt;/code&gt; = &lt;strong&gt;16 bytes&lt;/strong&gt; here too, but the byte layout is completely different. On 64-bit ARM Linux (AArch64), GCC implements &lt;code&gt;long double&lt;/code&gt; as &lt;strong&gt;software-emulated 128-bit quad precision&lt;/strong&gt; (IEEE-754 binary128). The bytes are not compatible with Machine A's output, even though both are nominally "16 bytes."&lt;/p&gt;

&lt;h3&gt;
  
  
  Machine C: Apple M4 (ARM64, Clang)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.30000000000000004
9a 99 99 99 99 99 d3 3f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;sizeof(long double)&lt;/code&gt; = &lt;strong&gt;8 bytes&lt;/strong&gt;. On Apple Silicon, Clang maps &lt;code&gt;long double&lt;/code&gt; to the same 64-bit &lt;code&gt;double&lt;/code&gt; type. There is no extended precision. What you write is exactly what you compute.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why They Disagree: The IEEE-754 Representation Problem
&lt;/h2&gt;

&lt;p&gt;This is not a hardware quality issue. It is a &lt;strong&gt;representation&lt;/strong&gt; issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  The core problem: not all decimals fit in binary
&lt;/h3&gt;

&lt;p&gt;The decimal number &lt;code&gt;0.1&lt;/code&gt; in binary is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.0001100110011001100110011001100110011001100110011001100110...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It repeats infinitely. A computer must cut it off at a finite number of bits and round. In IEEE-754 double (64-bit), that cutoff is at &lt;strong&gt;52 stored bits of mantissa&lt;/strong&gt; (53 significant bits once the implicit leading 1 is counted).&lt;/p&gt;
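&lt;p&gt;A minimal sketch of that expansion, assuming Python and exact rational arithmetic from &lt;code&gt;fractions.Fraction&lt;/code&gt;. The helper name &lt;code&gt;binary_digits&lt;/code&gt; is made up for illustration. It doubles the fraction repeatedly and peels off one bit per step:&lt;/p&gt;

```python
from fractions import Fraction


def binary_digits(value, count):
    """Return the first `count` binary fraction digits of a value in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2            # shift the next binary digit left of the point
        bit = int(value)      # that digit is the integer part: 0 or 1
        digits.append(str(bit))
        value -= bit          # keep only the remaining fraction
    return "0." + "".join(digits)


print(binary_digits(Fraction(1, 10), 24))  # 0.000110011001100110011001
```

&lt;p&gt;Because the remainder cycles back to a value it has already seen, the &lt;code&gt;1001&lt;/code&gt; group never stops repeating, however many digits you ask for.&lt;/p&gt;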

&lt;p&gt;The layout of a 64-bit IEEE-754 double is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────┬───────────────────┬──────────────────────────────────────────────────────┐
│  Sign   │     Exponent      │                    Mantissa                          │
│  1 bit  │     11 bits       │                    52 bits                           │
└─────────┴───────────────────┴──────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
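&lt;p&gt;To see those three fields on a real value, here is a small sketch that assumes Python, whose &lt;code&gt;float&lt;/code&gt; is an IEEE-754 double on mainstream platforms. The function name &lt;code&gt;decompose&lt;/code&gt; is hypothetical:&lt;/p&gt;

```python
import struct


def decompose(x):
    """Split a Python float (an IEEE-754 double) into its three bit fields."""
    bits = int.from_bytes(struct.pack("!d", x), "big")  # 64 raw bits, big-endian
    sign = bits // 2**63                # top bit
    exponent = bits // 2**52 % 2**11    # next 11 bits, biased by 1023
    mantissa = bits % 2**52             # low 52 bits, implicit leading 1 not stored
    return sign, exponent, mantissa


sign, exponent, mantissa = decompose(0.1)
print(sign, exponent - 1023, hex(mantissa))  # 0 -4 0x999999999999a
```

&lt;p&gt;The trailing &lt;code&gt;a&lt;/code&gt; in the mantissa is the rounding: the repeating &lt;code&gt;1001&lt;/code&gt; pattern was rounded up at the last stored bit.&lt;/p&gt;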



&lt;p&gt;So before addition even happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.1  ≈  0.1000000000000000055511151231257827021181583404541015625
0.2  ≈  0.2000000000000000111022302462515654042363166809082031250
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are not 0.1 and 0.2. They are the &lt;strong&gt;closest representable binary fractions&lt;/strong&gt;. The rounding error is baked in before a single arithmetic operation runs.&lt;/p&gt;
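&lt;p&gt;You can check these stored values yourself. A short sketch, assuming Python: &lt;code&gt;Decimal(float)&lt;/code&gt; converts the binary value exactly, so nothing is hidden by print rounding. The wrapper name &lt;code&gt;stored_value&lt;/code&gt; is only for illustration:&lt;/p&gt;

```python
from decimal import Decimal


def stored_value(x):
    """Return the exact decimal expansion of the double that `x` is stored as."""
    return Decimal(x)  # float-to-Decimal conversion is exact, with no rounding


for literal in (0.1, 0.2):
    print(literal, "is stored as", stored_value(literal))

# The stored double is close to, but not equal to, the decimal you typed
print(stored_value(0.1) == Decimal("0.1"))  # False
```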

&lt;h3&gt;
  
  
  Why addition makes it worse across machines
&lt;/h3&gt;

&lt;p&gt;When you add the two rounded approximations, the machine has to round again, and &lt;em&gt;where&lt;/em&gt; that second rounding happens depends on how wide the intermediate format is.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Machine&lt;/th&gt;
&lt;th&gt;Intermediate register width&lt;/th&gt;
&lt;th&gt;What this means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;x86 Linux (A)&lt;/td&gt;
&lt;td&gt;x87 80-bit extended&lt;/td&gt;
&lt;td&gt;&lt;code&gt;long double&lt;/code&gt; arithmetic runs on the x87 unit with 64 bits of mantissa; the 80-bit result is stored as-is and padded to 16 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARM Linux (B)&lt;/td&gt;
&lt;td&gt;Software 128-bit&lt;/td&gt;
&lt;td&gt;Arithmetic follows a software IEEE-754 binary128 implementation with 112 stored mantissa bits, so the rounding lands at a different bit position&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apple M4 (C)&lt;/td&gt;
&lt;td&gt;64-bit strict&lt;/td&gt;
&lt;td&gt;No intermediate widening at all; the mantissa is 52 bits throughout, start to finish&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The rounding path is different. So the final bit pattern is different.&lt;/p&gt;
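&lt;p&gt;Here is a rough model of that effect, assuming Python and exact rationals. It is a sketch, not how any FPU is implemented: the hypothetical helper &lt;code&gt;round_to_bits&lt;/code&gt; takes the exact sum of the two stored doubles and rounds it once at 53 significant bits (double) and once at 64 (x87 extended):&lt;/p&gt;

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def round_to_bits(value, bits):
    """Round a positive Fraction to `bits` significant binary digits, ties to even."""
    # Estimate the binary exponent from the sizes of numerator and denominator
    shift = bits - (value.numerator.bit_length() - value.denominator.bit_length())
    # The estimate can be one too large; correct it so the scaled value has `bits` integer bits
    shift -= int(value * Fraction(2) ** shift).bit_length() - bits
    scaled = value * Fraction(2) ** shift
    return round(scaled) / Fraction(2) ** shift  # round() on a Fraction rounds half to even


def show(value, places=20):
    """Format an exact Fraction as a decimal string with `places` digits."""
    with localcontext() as ctx:
        ctx.prec = 80
        return format(Decimal(value.numerator) / Decimal(value.denominator), f".{places}f")


exact_sum = Fraction(0.1) + Fraction(0.2)   # the two stored doubles, added without rounding
print(show(round_to_bits(exact_sum, 53)))   # 0.30000000000000004441
print(show(round_to_bits(exact_sum, 64)))   # 0.30000000000000001665
```

&lt;p&gt;At 53 bits the sum has to be rounded. At 64 bits it fits exactly, and the result agrees with the decimal value Machine A printed.&lt;/p&gt;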

&lt;h3&gt;
  
  
  What the hex reveals
&lt;/h3&gt;

&lt;p&gt;Machine A's 16-byte hex: &lt;code&gt;9f 93 54 5d e9 52 49 81 ff 3f 00 00 00 00 00 00&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bytes 0–9: the 80-bit extended value&lt;/li&gt;
&lt;li&gt;Bytes 10–15: compiler-inserted padding (&lt;code&gt;00 00 ...&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Machine B's 16-byte hex: &lt;code&gt;34 33 33 33 33 33 33 33 33 33 33 33 33 33 fd 3f&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All 16 bytes carry data; this is a real 128-bit float&lt;/li&gt;
&lt;li&gt;The repeating &lt;code&gt;33&lt;/code&gt; bytes are the mantissa's recurring bit pattern &lt;code&gt;0011 0011...&lt;/code&gt;, the binary fraction of 1.2 (0.3 = 1.2 × 2⁻²), rounded at binary128's 112 stored mantissa bits&lt;/li&gt;
&lt;/ul&gt;
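&lt;p&gt;Machine B's dump can be decoded by hand. The sketch below assumes Python and the IEEE-754 binary128 layout (1 sign bit, 15 exponent bits biased by 16383, 112 stored mantissa bits). The variable names are illustrative only:&lt;/p&gt;

```python
from fractions import Fraction

raw = bytes.fromhex("34 33 33 33 33 33 33 33 33 33 33 33 33 33 fd 3f")
bits = int.from_bytes(raw, "little")       # the dump is little-endian

sign = bits // 2**127                      # 1 bit
exponent = bits // 2**112 % 2**15          # 15 bits, biased by 16383
fraction = bits % 2**112                   # 112 stored bits, implicit leading 1

# Rebuild the exact value: (1 + fraction / 2^112) * 2^(exponent - bias)
value = (1 + Fraction(fraction, 2**112)) * Fraction(2) ** (exponent - 16383)
print(sign, exponent - 16383)              # 0 -2
print(float(value))                        # 0.3
```

&lt;p&gt;The exponent of -2 and a mantissa of roughly 1.2 multiply out to 0.3, carried to far more digits than a double can hold.&lt;/p&gt;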

&lt;p&gt;Machine C's 8-byte hex: &lt;code&gt;9a 99 99 99 99 99 d3 3f&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A standard IEEE-754 double, little-endian&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;3f d3 99 99 99 99 99 9a&lt;/code&gt; in big-endian: sign=0, exponent=01111111101 (= -2), mantissa = &lt;code&gt;0011001100110011...&lt;/code&gt;, the binary of 0.3 rounded to 52 bits&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Is Catastrophic for Distributed Systems
&lt;/h2&gt;

&lt;p&gt;Consider a simple balance operation repeated across nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.000000001&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 10 million such operations, nodes that round differently can end up reporting different balances, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node A (x86): &lt;code&gt;$1,000.00000823...&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Node B (ARM): &lt;code&gt;$1,000.00000847...&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Node C (M4):  &lt;code&gt;$1,000.00000819...&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The states have diverged. Each node believes a different truth. There is no consensus.&lt;/p&gt;

&lt;p&gt;In a traditional distributed database, this is serious but recoverable: a primary node's value wins and the replicas sync. But in a blockchain, &lt;strong&gt;there is no primary node&lt;/strong&gt;. Every node is equal. Every node must independently arrive at the exact same bit-for-bit result. If they don't, the network fractures.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Blockchain Solution: Integer Arithmetic Only
&lt;/h2&gt;

&lt;p&gt;Blockchains don't try to fix floating point. They remove it.&lt;/p&gt;

&lt;h3&gt;
  
  
  How integers solve the problem
&lt;/h3&gt;

&lt;p&gt;Integer arithmetic has no mantissa, no exponent, no rounding mode. &lt;code&gt;100 + 200 = 300&lt;/code&gt; on x86, ARMv8, RISC-V, MIPS, and every other architecture, identically, always. There is nothing to round. There are no intermediate registers with different widths.&lt;/p&gt;

&lt;p&gt;Integers are &lt;strong&gt;bit-for-bit deterministic across all architectures&lt;/strong&gt;.&lt;/p&gt;
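&lt;p&gt;As a sketch of what that looks like in practice, here is the balance update from earlier redone with scaled integers. It assumes Python, and the constants &lt;code&gt;RATE_SCALE&lt;/code&gt; and &lt;code&gt;RATE&lt;/code&gt; and the 18-decimal unit are assumptions made for the example, not any chain's actual parameters:&lt;/p&gt;

```python
RATE_SCALE = 10**9                 # hypothetical: rates carry nine decimal places
RATE = 1_000_000_001               # 1.000000001 expressed as a scaled integer


def apply_rate(balance_units, rate=RATE):
    """Multiply a balance (in smallest units) by a scaled rate, rounding down."""
    return balance_units * rate // RATE_SCALE


balance = 1_000 * 10**18           # 1,000 tokens at 18 decimals, an assumed unit size
for _ in range(1_000):
    balance = apply_rate(balance)
print(balance)                     # the same integer on every machine
```

&lt;p&gt;The only rounding is the explicit floor division, and it is part of the program rather than the hardware. Python integers are unbounded; a fixed-width on-chain type such as &lt;code&gt;uint256&lt;/code&gt; also needs an overflow check.&lt;/p&gt;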

&lt;h3&gt;
  
  
  How major chains implement this
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Ethereum&lt;/strong&gt; represents all value in &lt;strong&gt;wei&lt;/strong&gt;, stored as &lt;code&gt;uint256&lt;/code&gt;. 1 ETH = 10¹⁸ wei. The Ethereum Virtual Machine (EVM) has explicit opcodes for integer arithmetic and deliberately &lt;strong&gt;has no floating-point opcode&lt;/strong&gt;. Smart contract developers who want decimal semantics must implement fixed-point arithmetic manually using integer scaling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solana&lt;/strong&gt; represents all value in &lt;strong&gt;lamports&lt;/strong&gt;, stored as &lt;code&gt;uint64&lt;/code&gt;. 1 SOL = 10⁹ lamports. Programs running in the Sealevel runtime must use integer arithmetic for any computation that enters the ledger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polkadot&lt;/strong&gt; represents all value in &lt;strong&gt;planck&lt;/strong&gt;, stored as &lt;code&gt;u128&lt;/code&gt;. 1 DOT = 10¹⁰ planck. Logic runs inside WebAssembly-based runtimes where all balance and governance arithmetic is handled exclusively through Rust's primitive integer types such as &lt;code&gt;u128&lt;/code&gt; and &lt;code&gt;u64&lt;/code&gt;, never floats.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Chain       | Unit      | Type    | Scale
------------|-----------|---------|------------------------
Ethereum    | wei       | uint256 | 10^18 per ETH
Solana      | lamport   | uint64  | 10^9 per SOL
Polkadot    | planck    | u128    | 10^10 per DOT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What about real-world prices? (The oracle problem)
&lt;/h3&gt;

&lt;p&gt;Real-world prices (ETH/USD, BTC/EUR) are inherently decimal data. How do oracle networks like Chainlink handle this without introducing floats?&lt;/p&gt;

&lt;p&gt;Floating point stays off-chain; only integers cross the boundary.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Price data is collected off-chain from exchanges as human-readable decimals&lt;/li&gt;
&lt;li&gt;The decimals are converted to scaled integers, for example with a helper such as &lt;code&gt;parseUnits()&lt;/code&gt; from ethers.js or viem, passing the value as a &lt;strong&gt;string&lt;/strong&gt;, not a float, to avoid precision loss at the conversion step itself&lt;/li&gt;
&lt;li&gt;The resulting integer is submitted on-chain&lt;/li&gt;
&lt;li&gt;Smart contracts only ever see and operate on the scaled integer
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Assumes ethers v6; viem exports a parseUnits with the same call shape&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;parseUnits&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ethers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;// WRONG: multiplying a float can lose precision before it even hits the chain&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fromFloat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;e18&lt;/span&gt;  &lt;span class="c1"&gt;// a float; exactness is not guaranteed&lt;/span&gt;

&lt;span class="c1"&gt;// CORRECT: string-based conversion, no precision loss&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fromString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseUnits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0.1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 100000000000000000n (exact)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reverse works the same way: &lt;code&gt;formatUnits()&lt;/code&gt; converts the on-chain integer back to a human-readable string for display, without ever passing through a float.&lt;/p&gt;
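&lt;p&gt;To make the string-only round trip concrete, here is a simplified sketch in Python. The functions &lt;code&gt;parse_units&lt;/code&gt; and &lt;code&gt;format_units&lt;/code&gt; are hypothetical stand-ins written for this example, handle only non-negative values, and do not reproduce the exact behavior of any library:&lt;/p&gt;

```python
def parse_units(text, decimals):
    """Turn a non-negative decimal string into a scaled integer, never via float."""
    whole, _, frac = text.partition(".")
    if len(frac) not in range(decimals + 1):
        raise ValueError("more fractional digits than the scale allows")
    # Pad the fraction to exactly `decimals` digits, then read the digits as one integer
    return int((whole or "0") + frac.ljust(decimals, "0"))


def format_units(value, decimals):
    """Turn a scaled integer back into a decimal string (decimals must be at least 1)."""
    digits = str(value).rjust(decimals + 1, "0")
    whole, frac = digits[:-decimals], digits[-decimals:].rstrip("0")
    return f"{whole}.{frac}" if frac else whole


print(parse_units("0.1", 18))        # 100000000000000000
print(format_units(10**17, 18))      # 0.1
```

&lt;p&gt;Every step is string slicing or integer parsing, so the result is the same on any machine.&lt;/p&gt;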




&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Blockchains reject floating point not because it is inaccurate, but because &lt;strong&gt;it is not reproducible across machines at the bit level&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>web3</category>
      <category>blockchain</category>
      <category>programming</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
