<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Aman Prasad</title>
    <description>The latest articles on Forem by Aman Prasad (@amanprasad).</description>
    <link>https://forem.com/amanprasad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3670273%2Fd36dd1b5-380a-4509-8105-e4d21070e012.png</url>
      <title>Forem: Aman Prasad</title>
      <link>https://forem.com/amanprasad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/amanprasad"/>
    <language>en</language>
    <item>
      <title>The Inline Myth: Why the inline Keyword is Just a Suggestion</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Mon, 23 Feb 2026 14:01:49 +0000</pubDate>
      <link>https://forem.com/amanprasad/the-inline-myth-why-the-inline-keyword-is-just-a-suggestion-4gfn</link>
      <guid>https://forem.com/amanprasad/the-inline-myth-why-the-inline-keyword-is-just-a-suggestion-4gfn</guid>
      <description>&lt;p&gt;Inline functions are functions that the compiler &lt;em&gt;may&lt;/em&gt; expand directly at the place where they are called, instead of performing a normal function call.&lt;/p&gt;

&lt;p&gt;Inline functions are often misunderstood, especially by beginners who assume that writing the &lt;code&gt;inline&lt;/code&gt; keyword forces the compiler to inline a function.&lt;/p&gt;

&lt;p&gt;In reality, &lt;strong&gt;inline is only a suggestion&lt;/strong&gt;, and modern compilers are far smarter than we often give them credit for. For optimization purposes it is merely a hint, although in C99 it also carries defined linkage semantics.&lt;/p&gt;

&lt;p&gt;This post explains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What &lt;code&gt;inline&lt;/code&gt; actually means&lt;/li&gt;
&lt;li&gt;Why it is only a hint&lt;/li&gt;
&lt;li&gt;Inline vs macros&lt;/li&gt;
&lt;li&gt;How modern compilers decide to inline&lt;/li&gt;
&lt;li&gt;Performance and binary size trade-offs&lt;/li&gt;
&lt;li&gt;Real compiler behavior using assembly output&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Inline Is a Suggestion, Not a Command
&lt;/h2&gt;

&lt;p&gt;When you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;you are &lt;strong&gt;not instructing&lt;/strong&gt; the compiler to inline this function.&lt;/p&gt;

&lt;p&gt;You are merely &lt;strong&gt;suggesting&lt;/strong&gt; that inlining &lt;em&gt;may&lt;/em&gt; be beneficial.&lt;/p&gt;

&lt;p&gt;The compiler is completely free to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inline the function&lt;/li&gt;
&lt;li&gt;Ignore the suggestion&lt;/li&gt;
&lt;li&gt;Inline it in some call sites but not others&lt;/li&gt;
&lt;/ul&gt;
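&lt;p&gt;If you genuinely need a stronger hint, compilers offer non-standard extensions. A minimal sketch, assuming GCC or Clang (which provide the &lt;code&gt;always_inline&lt;/code&gt; attribute; MSVC has &lt;code&gt;__forceinline&lt;/code&gt;); even these can be refused in some situations, such as recursion:&lt;/p&gt;

```c
/* GCC/Clang extension: a much stronger request than plain `inline`.
   Even this can be refused in some cases (e.g. recursive calls). */
static inline __attribute__((always_inline)) int add(int a, int b) {
    return a + b;
}
```

&lt;p&gt;Whether or not inlining happens, the observable behavior of the program is identical; only the generated machine code differs.&lt;/p&gt;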

&lt;h3&gt;
  
  
  Why?
&lt;/h3&gt;

&lt;p&gt;Because &lt;strong&gt;the C/C++ standards do not require compilers to perform any optimization at all&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Inlining is an optimization, and therefore it &lt;strong&gt;cannot be mandatory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;inline&lt;/code&gt; were mandatory, optimization itself would no longer be optional, which would contradict the language standard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modern Compilers Inline Even Without &lt;code&gt;inline&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;A very common misconception is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If I don’t write &lt;code&gt;inline&lt;/code&gt;, the function won’t be inlined.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is &lt;strong&gt;false&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Modern compilers (GCC, Clang, MSVC):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perform &lt;strong&gt;automatic inlining&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Analyze function size, call frequency, and context&lt;/li&gt;
&lt;li&gt;Inline functions even if the &lt;code&gt;inline&lt;/code&gt; keyword is not used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With optimizations enabled (&lt;code&gt;-O2&lt;/code&gt;, &lt;code&gt;-O3&lt;/code&gt;), the compiler will &lt;strong&gt;very likely&lt;/strong&gt; inline such a small function.&lt;/p&gt;

&lt;p&gt;Today, &lt;code&gt;inline&lt;/code&gt; is more of a semantic hint than a performance switch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Inline Exists at All
&lt;/h2&gt;

&lt;p&gt;Inlining was originally introduced to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce &lt;strong&gt;function call overhead&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Improve performance in tight loops&lt;/li&gt;
&lt;li&gt;Replace unsafe macros&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A traditional function call involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pushing arguments onto the stack&lt;/li&gt;
&lt;li&gt;Saving registers&lt;/li&gt;
&lt;li&gt;Jumping to another memory location&lt;/li&gt;
&lt;li&gt;Returning back&lt;/li&gt;
&lt;/ul&gt;
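&lt;p&gt;The transformation can be pictured directly in source form. A conceptual sketch (the optimizer rewrites the call site internally; the "inlined" function below is written by hand purely to show the idea):&lt;/p&gt;

```c
int add(int a, int b) {
    return a + b;
}

/* Before inlining: the caller pays for the full call sequence. */
int call_version(void) { return add(2, 3); }

/* After inlining: the function body replaces the call...       */
int inlined_version(void) { return 2 + 3; }

/* ...and constant folding then reduces it to: return 5;        */
```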

&lt;p&gt;Inlining eliminates this overhead by &lt;strong&gt;expanding the function body at the call site&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  But Call Overhead Isn’t That Expensive Anymore
&lt;/h2&gt;

&lt;p&gt;Modern CPUs are highly optimized for function calls through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Branch prediction&lt;/li&gt;
&lt;li&gt;Instruction pipelining&lt;/li&gt;
&lt;li&gt;Speculative execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, the overhead of a &lt;strong&gt;well-predicted function call is often very small&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In many cases, aggressive inlining does not yield significant performance gains and can even &lt;strong&gt;hurt performance&lt;/strong&gt; due to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased code size&lt;/li&gt;
&lt;li&gt;Instruction cache pressure&lt;/li&gt;
&lt;li&gt;Register pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, the primary benefit of inlining is &lt;strong&gt;not eliminating the call itself&lt;/strong&gt;, but &lt;strong&gt;enabling further compiler optimizations&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Compilers Decide to Inline
&lt;/h2&gt;

&lt;p&gt;Compilers use heuristics. They compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cost of the function call&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Size of the function body&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the &lt;strong&gt;cost of the call&lt;/strong&gt; &amp;gt; &lt;strong&gt;cost of the expanded code&lt;/strong&gt;, the compiler may inline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Likely to Be Inlined
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Very small functions&lt;/li&gt;
&lt;li&gt;Simple calculations&lt;/li&gt;
&lt;li&gt;Getters/setters&lt;/li&gt;
&lt;li&gt;Functions called inside loops&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Unlikely to Be Inlined
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Large functions&lt;/li&gt;
&lt;li&gt;Functions with loops&lt;/li&gt;
&lt;li&gt;Functions with static variables&lt;/li&gt;
&lt;li&gt;Functions called via function pointers&lt;/li&gt;
&lt;li&gt;Recursive functions&lt;/li&gt;
&lt;/ul&gt;
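&lt;p&gt;The function-pointer case is easy to see. A minimal illustration (the names here are hypothetical): the compiler can trivially inline the direct call, but a call through a pointer generally survives unless the optimizer can prove what the pointer targets.&lt;/p&gt;

```c
static int add(int a, int b) { return a + b; }

typedef int (*binop)(int, int);

/* Direct call: the target is known, so it is trivially inlinable. */
int call_direct(void) { return add(2, 3); }

/* Indirect call: the target is unknown at this point,
   so the call instruction usually remains in the output. */
int call_indirect(binop fp) { return fp(2, 3); }
```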

&lt;h2&gt;
  
  
  Recursive Functions Cannot Be Fully Inlined
&lt;/h2&gt;

&lt;p&gt;Inlining requires the compiler to expand the function body.&lt;/p&gt;

&lt;p&gt;For recursion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inlining would require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infinite expansion&lt;/li&gt;
&lt;li&gt;Unlimited code generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compilers cannot expand recursive calls infinitely, &lt;br&gt;
though they may still inline calls to a limited depth or optimize tail recursion.&lt;/p&gt;
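&lt;p&gt;Tail recursion is the friendlier case. A sketch of the accumulator rewrite (the rewrite itself is illustrative, not from the factorial above): because the recursive call is the very last operation, an optimizing compiler may turn it into a loop, which can then be inlined like any other code.&lt;/p&gt;

```c
/* Plain recursion: the multiply happens after the call returns,
   so each level needs its own stack frame. */
int fact(int n) {
    return n == 0 ? 1 : n * fact(n - 1);
}

/* Tail-recursive form: nothing is left to do after the call,
   so it can be compiled as a simple loop. */
int fact_tail(int n, int acc) {
    return n == 0 ? acc : fact_tail(n - 1, n * acc);
}
```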
&lt;h2&gt;
  
  
  Inline vs Macros
&lt;/h2&gt;

&lt;p&gt;Macros were the original “inline mechanism,” but they come with serious problems.&lt;/p&gt;
&lt;h3&gt;
  
  
  Macro Example
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define ADD(a, b) a + b
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;ADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expansion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;//  Wrong result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
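&lt;p&gt;The usual defensive fix is to parenthesize the macro, though that only cures precedence bugs; arguments with side effects are still evaluated twice. A minimal comparison:&lt;/p&gt;

```c
#define ADD_BAD(a, b)  a + b        /* expands unprotected               */
#define ADD_GOOD(a, b) ((a) + (b))  /* parentheses preserve the grouping */

/* 4 * ADD_BAD(2, 2)  expands to 4 * 2 + 2       ->  10 (wrong)   */
/* 4 * ADD_GOOD(2, 2) expands to 4 * ((2) + (2)) ->  16 (correct) */
```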



&lt;h3&gt;
  
  
  Inline Function Equivalent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//  Correct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Too Much Inline Increases Binary Size
&lt;/h2&gt;

&lt;p&gt;Inlining duplicates code at every call site.&lt;/p&gt;

&lt;p&gt;If a function is used in many places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binary size increases&lt;/li&gt;
&lt;li&gt;Instruction cache pressure increases&lt;/li&gt;
&lt;li&gt;Performance may actually degrade&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This phenomenon is known as &lt;strong&gt;code bloat&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Inlining trades space for speed.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Experiment: Verifying Inlining Across Optimization Levels
&lt;/h2&gt;

&lt;p&gt;We test a simple program with and &lt;strong&gt;without the &lt;code&gt;inline&lt;/code&gt; keyword&lt;/strong&gt; to observe how the compiler behaves at different optimization levels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test Code&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Case 1: Compilation with &lt;code&gt;-O0&lt;/code&gt; (No Optimization)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;gcc&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;  &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;O0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfs71j9rdwt4qtsf2o13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfs71j9rdwt4qtsf2o13.png" alt="assembly generated for the inline function with O0" width="539" height="787"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Assembly Observation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="nf"&gt;call&lt;/span&gt;    &lt;span class="nv"&gt;_add&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;main()&lt;/code&gt; explicitly &lt;strong&gt;calls &lt;code&gt;_add&lt;/code&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Stack frame setup is visible&lt;/li&gt;
&lt;li&gt;No inlining occurs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Symbol Table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nm a.exe | &lt;span class="nb"&gt;grep &lt;/span&gt;add
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="mi"&gt;00401&lt;/span&gt;&lt;span class="n"&gt;b40&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;____w64_mingwthr_add_key_dtor&lt;/span&gt;
&lt;span class="mi"&gt;00403880&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;___mingw_readdir&lt;/span&gt;
&lt;span class="mi"&gt;00401460&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;_add&lt;/span&gt;   &lt;span class="c1"&gt;// we can see the add symbol with -O0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion:
&lt;/h3&gt;

&lt;p&gt;At &lt;code&gt;-O0&lt;/code&gt;, GCC prioritizes debuggability: almost no inlining happens, even when &lt;code&gt;inline&lt;/code&gt; is written.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚠️ Important Warning About &lt;code&gt;inline&lt;/code&gt; at &lt;code&gt;-O0&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In &lt;strong&gt;C&lt;/strong&gt;, a function defined with plain &lt;code&gt;inline&lt;/code&gt; does &lt;strong&gt;not&lt;/strong&gt;, by itself, provide an externally visible definition that the linker can resolve.&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;-O0&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The compiler does not inline&lt;/li&gt;
&lt;li&gt;A function call may still be generated&lt;/li&gt;
&lt;li&gt;No external definition is emitted for the &lt;code&gt;inline&lt;/code&gt; function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leads to a linker error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Call exists, but function does not.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To avoid this issue in C, always use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or provide a separate external definition.&lt;/p&gt;
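&lt;p&gt;In practice, the header-friendly form looks like this (the header name is illustrative):&lt;/p&gt;

```c
/* math_utils.h (illustrative file name)
   `static inline` gives every translation unit its own internal copy,
   so no external symbol is required and no linker error can occur. */
static inline int add(int a, int b) {
    return a + b;
}
```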

&lt;h3&gt;
  
  
  ⚠️ Important Clarification About C99 &lt;code&gt;inline&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In C99, &lt;code&gt;inline&lt;/code&gt; is not only about optimization — it also affects linkage and symbol emission.&lt;/p&gt;

&lt;p&gt;There are three forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;inline&lt;/code&gt; → provides an inline definition but does not emit an external definition.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;extern inline&lt;/code&gt; → forces emission of the external definition in one translation unit.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;static inline&lt;/code&gt; → gives internal linkage (each translation unit gets its own copy).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To avoid linker issues at low optimization levels, you can either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;static inline&lt;/code&gt; in headers (common and simple), or&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;inline&lt;/code&gt; in a header and &lt;code&gt;extern inline&lt;/code&gt; in exactly one .c file (the strict C99 model).&lt;/li&gt;
&lt;/ul&gt;
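&lt;p&gt;The strict C99 model, sketched in a single file for brevity (in a real project the &lt;code&gt;inline&lt;/code&gt; definition lives in a header and the &lt;code&gt;extern inline&lt;/code&gt; declaration in exactly one .c file):&lt;/p&gt;

```c
/* Header part: an inline definition; by itself it emits no external symbol. */
inline int add(int a, int b) {
    return a + b;
}

/* In exactly one .c file: this declaration forces the compiler to emit
   the single external definition of add() for the rest of the program. */
extern inline int add(int a, int b);
```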
&lt;h2&gt;
  
  
  Case 2: Compilation with &lt;code&gt;-O2&lt;/code&gt; (Optimized Build)
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcc &lt;span class="nt"&gt;-S&lt;/span&gt; test.c &lt;span class="nt"&gt;-O2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Assembly Observation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyczr9pcmwysgyidekefw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyczr9pcmwysgyidekefw.png" alt="assembly generated for the inline function with O0" width="539" height="674"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No &lt;code&gt;call _add&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Function is fully inlined&lt;/li&gt;
&lt;li&gt;Constant folding reduces &lt;code&gt;add(2,3)&lt;/code&gt; to &lt;code&gt;5&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Symbol Table
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nm a.exe | &lt;span class="nb"&gt;grep &lt;/span&gt;add
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;00401b10 T ____w64_mingwthr_add_key_dtor
00403850 T ___mingw_readdir
00401460 T _add 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;⚠️ &lt;strong&gt;Important Observation&lt;/strong&gt;&lt;br&gt;
 &lt;code&gt;_add&lt;/code&gt; still exists, but it is &lt;strong&gt;never called&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why &lt;code&gt;_add&lt;/code&gt; Exists but Is Never Called
&lt;/h3&gt;

&lt;p&gt;This is the &lt;strong&gt;core question&lt;/strong&gt;, and the answer is subtle but fundamental.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reason 1: External Linkage&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Functions have external linkage by default, meaning another translation unit might call add().&lt;br&gt;
The compiler must therefore keep the symbol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reason 2: No Whole-Program Visibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without Link Time Optimization (LTO), the compiler cannot prove the function is unused globally.&lt;/p&gt;
&lt;h3&gt;
  
  
  Forcing Removal of &lt;code&gt;_add&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Make the Function &lt;code&gt;static&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Internal linkage allows the compiler to remove the symbol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Enable Link Time Optimization (LTO)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcc &lt;span class="nt"&gt;-O2&lt;/span&gt; &lt;span class="nt"&gt;-flto&lt;/span&gt; test.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ⚠️ Important Note
&lt;/h3&gt;

&lt;p&gt;Although the example uses the &lt;code&gt;inline&lt;/code&gt; keyword for explanation, I also tested the &lt;strong&gt;same code without &lt;code&gt;inline&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When compiled with &lt;code&gt;-O2&lt;/code&gt;, the compiler &lt;strong&gt;still inlined the function automatically&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This confirms that &lt;strong&gt;inlining at higher optimization levels is driven by the compiler’s heuristics&lt;/strong&gt;, not by the presence of the &lt;code&gt;inline&lt;/code&gt; keyword.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;inline&lt;/code&gt; is a &lt;strong&gt;hint&lt;/strong&gt;, not a guarantee&lt;/li&gt;
&lt;li&gt;The compiler may inline even without &lt;code&gt;inline&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Inlining primarily enables optimization, but in C99 &lt;code&gt;inline&lt;/code&gt; also affects linkage&lt;/li&gt;
&lt;li&gt;Macros are unsafe; inline functions are type-safe&lt;/li&gt;
&lt;li&gt;Recursive functions cannot be fully inlined&lt;/li&gt;
&lt;li&gt;Excessive inlining increases binary size&lt;/li&gt;
&lt;li&gt;Modern CPUs reduce the benefit of aggressive inlining&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>discuss</category>
      <category>learning</category>
      <category>c</category>
      <category>beginners</category>
    </item>
    <item>
      <title>char str1[] = "hello world"; vs char *str2 = "hello world"; – The Memory Story Every C Programmer Must Know</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Thu, 19 Feb 2026 14:38:25 +0000</pubDate>
      <link>https://forem.com/amanprasad/char-str1-hello-world-vs-char-str2-hello-world-the-memory-story-every-c-programmer-28hd</link>
      <guid>https://forem.com/amanprasad/char-str1-hello-world-vs-char-str2-hello-world-the-memory-story-every-c-programmer-28hd</guid>
      <description>&lt;p&gt;A &lt;strong&gt;string literal&lt;/strong&gt; is a sequence of characters stored in &lt;strong&gt;read-only memory&lt;/strong&gt; and automatically terminated by a null character &lt;code&gt;'\0'&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;//  str1 is a character array stored on stack&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// str2 is a pointer stored on the stack; it points to a string literal in read-only memory&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although they look similar, these two lines behave &lt;strong&gt;very differently in memory&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  char str1[] = "hello world";
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The compiler allocates an array of size &lt;strong&gt;12 bytes&lt;/strong&gt; (&lt;code&gt;11 characters + '\0'&lt;/code&gt;) &lt;strong&gt;on the stack&lt;/strong&gt; (for local variables)&lt;strong&gt;.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;At runtime it &lt;strong&gt;copies&lt;/strong&gt; the 12 bytes from the string literal (typically stored in &lt;code&gt;.rodata&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you end up with &lt;strong&gt;two copies&lt;/strong&gt; of the string:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One immutable in &lt;code&gt;.rodata&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;One mutable on the stack&lt;/li&gt;
&lt;li&gt;&lt;em&gt;(If &lt;code&gt;str1&lt;/code&gt; were global or static, it would be stored in &lt;code&gt;.data&lt;/code&gt; instead of the stack.)&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if we try to change it, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// works fine because it is mutable and located on stack&lt;/span&gt;
&lt;span class="n"&gt;str1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"something"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// compilation error array name is not assignable&lt;/span&gt;
&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// not allowed: array names are non-modifiable lvalues&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the snippet above, &lt;code&gt;str1 = "something"&lt;/code&gt; is not allowed because &lt;code&gt;str1&lt;/code&gt; is an array, and array names cannot be reassigned.&lt;br&gt;
If we want to write new data into the existing array str1, we must copy the contents using &lt;code&gt;strcpy&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;strcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"some"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// '\0' is copied automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ &lt;code&gt;strcpy&lt;/code&gt; assumes the destination array is large enough; otherwise it causes buffer overflow.&lt;/p&gt;

&lt;p&gt;Alternatively, we can copy from another character array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;new_str&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"new value"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;strcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_str&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// (Here str1 is the destination array and new_str is the source string.)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we check the size of the variable str1, it gives 12 bytes&lt;br&gt;
(11 characters + one null character &lt;code&gt;'\0'&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 12 bytes (11 chars + 1 null char '\0')&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;str1&lt;/code&gt; is a &lt;strong&gt;real array&lt;/strong&gt;, and &lt;code&gt;sizeof&lt;/code&gt; returns the &lt;strong&gt;total allocated size of the array&lt;/strong&gt;, not the length of the string stored in it.&lt;/p&gt;
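&lt;p&gt;The array-size vs string-length distinction is easy to check directly (a minimal sketch, assuming &lt;code&gt;str1&lt;/code&gt; is declared as above):&lt;/p&gt;

```c
#include <string.h>

char str1[] = "hello world";

/* sizeof is resolved at compile time: the whole array object, '\0' included.
   strlen walks the bytes at runtime and stops before the '\0'.              */
/* sizeof(str1) -> 12, strlen(str1) -> 11 */
```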

&lt;h2&gt;
  
  
  Why can’t I do &lt;code&gt;str1 = str2&lt;/code&gt;, but I can do &lt;code&gt;strcpy(str1, str2)&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;At first glance, both statements look like they should “copy” a string. However, they do two very different things in C.&lt;/p&gt;

&lt;p&gt;Internally, &lt;code&gt;strcpy&lt;/code&gt; does something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str2&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It &lt;strong&gt;never changes the address of &lt;code&gt;str1&lt;/code&gt;&lt;/strong&gt;; it &lt;strong&gt;writes into the memory owned by &lt;code&gt;str1&lt;/code&gt;&lt;/strong&gt;. That is why it is allowed.&lt;/p&gt;

&lt;h3&gt;
  
  
  char *str2 = "hello world";
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The compiler allocates &lt;strong&gt;only the pointer&lt;/strong&gt; (typically 8 bytes on 64-bit systems, 4 bytes on 32-bit) on the stack.&lt;/li&gt;
&lt;li&gt;The pointer’s value is the address of the string literal (typically stored in &lt;code&gt;.rodata&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;We cannot modify the &lt;strong&gt;data that &lt;code&gt;str2&lt;/code&gt; points to&lt;/strong&gt;, because the string literal is typically stored in the read-only &lt;code&gt;.rodata&lt;/code&gt; segment.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;// undefined behavior (often results in segmentation fault)&lt;/span&gt;
&lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                &lt;span class="c1"&gt;// allowed: moves the pointer, not the string data&lt;/span&gt;
&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// ello world&lt;/span&gt;
&lt;span class="n"&gt;str2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"bye world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// allowed — repoints to another literal&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;string literal&lt;/strong&gt; &lt;code&gt;"hello world"&lt;/code&gt; is placed in &lt;strong&gt;read-only memory&lt;/strong&gt; (&lt;code&gt;.rodata&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;str2&lt;/code&gt; (a pointer) stores the &lt;strong&gt;address&lt;/strong&gt; of that literal&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;const&lt;/code&gt; Habit We Should Adopt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// this is what we should write&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the compiler will give you a compile-time error if you try &lt;code&gt;str2[0] = 'x';&lt;/code&gt;, instead of undefined behavior (often a crash) at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens If Two Variables Use the Same String Literal?
&lt;/h2&gt;

&lt;p&gt;Consider the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%p %p&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On many systems, this program prints &lt;strong&gt;the same address&lt;/strong&gt; for both &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0x0040507D  0x0040507D
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most modern compilers perform an optimization called &lt;strong&gt;string literal pooling&lt;/strong&gt; (or &lt;strong&gt;string interning&lt;/strong&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identical string literals are &lt;strong&gt;stored only once&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Multiple pointers reference the &lt;strong&gt;same memory location&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;This saves memory and improves cache usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; point to the &lt;strong&gt;same string literal&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Modifying either (which is illegal anyway) would affect both&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Important Standard Note
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The C standard does NOT guarantee this behavior.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compilers are &lt;strong&gt;allowed&lt;/strong&gt; to merge identical literals&lt;/li&gt;
&lt;li&gt;Compilers are also &lt;strong&gt;allowed&lt;/strong&gt; to keep them separate&lt;/li&gt;
&lt;li&gt;You must &lt;strong&gt;never rely on their addresses being equal&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;So this is valid C:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;   &lt;span class="c1"&gt;// may be true or false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both outcomes are legal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters in Practice
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Never compare string literals using pointer equality&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   &lt;span class="c1"&gt;// ❌ wrong&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Always use &lt;code&gt;strcmp&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// ✅ correct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Never attempt to modify string literals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Treat all string literals as &lt;strong&gt;read-only shared objects&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="cm"&gt;/* =========================================================
                PART 1: char str1[] = "hello world";
    ========================================================= */&lt;/span&gt;

    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* str1 is a CHARACTER ARRAY.
    Memory for the array is allocated on the stack.
    The string literal "hello world" is COPIED into this array.
    Size allocated = 11 characters + 1 null terminator = 12 bytes. */&lt;/span&gt;

    &lt;span class="c1"&gt;// Since str1 owns writable memory, modifying characters is VALID.&lt;/span&gt;
    &lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sc"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// changes 'h' to 'a'&lt;/span&gt;

    &lt;span class="c1"&gt;// Prints the modified string stored in stack memory&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"str1: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// Output -&amp;gt; str1: aello world&lt;/span&gt;

    &lt;span class="n"&gt;strcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"aman"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"str1: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// Output-&amp;gt; str1: aman&lt;/span&gt;

    &lt;span class="c1"&gt;// Array names are NOT pointers and are NOT modifiable lvalues.&lt;/span&gt;
    &lt;span class="c1"&gt;// str1++;              //  INVALID: cannot change base address of an array&lt;/span&gt;
    &lt;span class="c1"&gt;// str1 = "bye world";  //  INVALID: array cannot be reassigned&lt;/span&gt;

    &lt;span class="c1"&gt;// sizeof(str1) gives the TOTAL SIZE of the array in bytes&lt;/span&gt;
    &lt;span class="c1"&gt;// because str1 is a real array.&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"size of str1: %zu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;   &lt;span class="c1"&gt;// 12 bytes&lt;/span&gt;

    &lt;span class="cm"&gt;/* =========================================================
       PART 2: char *str2 = "hello world";
       ========================================================= */&lt;/span&gt;

       &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;str2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/* str2 is a POINTER to char.
    The string literal "hello world" is stored in READ-ONLY memory (.rodata).
    str2 only stores the ADDRESS of the first character of the literal. */&lt;/span&gt;

    &lt;span class="cm"&gt;/* Attempting to modify a string literal is UNDEFINED BEHAVIOR.
    On most systems this causes a segmentation fault or crash.
    str2[0] = 'a';    //  SEGMENTATION FAULT

    CORRECT and SAFE declaration for string literals:
    const char *str2 = "hello world";

    Pointer arithmetic is allowed because str2 itself is modifiable.
    This makes str2 point to the second character of the string. */&lt;/span&gt;
    &lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Prints the string starting from the new pointer location&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"str2: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// Output: ello world&lt;/span&gt;

    &lt;span class="c1"&gt;// Reassigning the pointer is allowed.&lt;/span&gt;
    &lt;span class="c1"&gt;// Now str2 points to a DIFFERENT string literal.&lt;/span&gt;
    &lt;span class="n"&gt;str2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"bye world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Prints the new string literal&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"str2: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// Output: bye world&lt;/span&gt;

    &lt;span class="c1"&gt;// sizeof(str2) gives the size of the POINTER itself,&lt;/span&gt;
    &lt;span class="c1"&gt;// NOT the size of the string it points to.&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"size of str2: %zu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str2&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;   &lt;span class="c1"&gt;// 8 bytes on 64-bit systems&lt;/span&gt;

    &lt;span class="c1"&gt;// All pointer types have the same size on a given architecture&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"size of int pointer: %zu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// 8 bytes (64-bit)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>discuss</category>
      <category>c</category>
      <category>learning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>MQTT Explained in Simple Terms: The Lightweight Protocol That Powers the Entire IoT World</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Tue, 17 Feb 2026 06:38:47 +0000</pubDate>
      <link>https://forem.com/amanprasad/mqtt-explained-in-simple-terms-the-lightweight-protocol-that-powers-the-entire-iot-world-142p</link>
      <guid>https://forem.com/amanprasad/mqtt-explained-in-simple-terms-the-lightweight-protocol-that-powers-the-entire-iot-world-142p</guid>
      <description>&lt;p&gt;&lt;strong&gt;MQTT (Message Queuing Telemetry Transport)&lt;/strong&gt; is a lightweight, standards-based messaging protocol designed for &lt;strong&gt;machine-to-machine (M2M)&lt;/strong&gt; and &lt;strong&gt;Internet of Things (IoT)&lt;/strong&gt; communication.&lt;/p&gt;

&lt;p&gt;It is optimized for low-bandwidth, high-latency networks and resource-constrained devices such as microcontrollers. Unlike HTTP, MQTT does &lt;strong&gt;not&lt;/strong&gt; use request–response. Instead, it uses a &lt;strong&gt;publish/subscribe&lt;/strong&gt; communication model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycf5212og889e25x662q.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycf5212og889e25x662q.jpeg" alt="MQTT" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why is the MQTT protocol important?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The MQTT protocol has become a standard for IoT data transmission because it delivers the following benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight and efficient:&lt;/strong&gt; MQTT implementation on the IoT device requires minimal resources, so it can even be used on small microcontrollers. For example, a minimal MQTT control message can be as little as two data bytes. MQTT message headers are also small so that you can optimize network bandwidth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable:&lt;/strong&gt; The protocol has built-in features to support communication with a large number of IoT devices. Hence, you can implement the MQTT protocol to connect with millions of these devices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliable:&lt;/strong&gt; Many IoT devices connect over unreliable cellular networks with low bandwidth and high latency. MQTT has built-in features that reduce the time the IoT device takes to reconnect with the cloud. It also defines three different quality-of-service levels to ensure reliability for IoT.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure:&lt;/strong&gt; MQTT makes it easy for developers to encrypt messages and authenticate devices and users using modern security mechanisms, such as OAuth, TLS 1.3, customer-managed certificates, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Principle behind MQTT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;MQTT is built around the &lt;strong&gt;publish-subscribe communication model&lt;/strong&gt;, which is fundamentally different from the traditional client–server approach.&lt;/p&gt;

&lt;p&gt;In a typical client–server system, a client directly requests data from a server, and the server responds with the requested information. This creates a &lt;strong&gt;tight coupling&lt;/strong&gt; between both sides. The client must know where the server is and both must be available at the same time.&lt;/p&gt;

&lt;p&gt;But MQTT removes this direct dependency by introducing an &lt;strong&gt;intermediary called a broker&lt;/strong&gt;. Instead of sending messages directly to each other, devices communicate through the broker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A device that sends data is called a &lt;strong&gt;publisher&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A device that receives data is called a &lt;strong&gt;subscriber&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;broker&lt;/strong&gt; receives all published messages and delivers them to the appropriate subscribers based on topics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this design, publishers and subscribers remain completely independent of each other.&lt;/p&gt;

&lt;p&gt;This creates three beautiful kinds of freedom (called decoupling):&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Space Decoupling: “I don’t need to know where you live”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Publishers and subscribers do not need to know anything about each other’s network details. They don’t exchange IP addresses, port numbers, or device identities.&lt;/p&gt;

&lt;p&gt;Each device only knows the broker address and the topic it publishes to or subscribes to. This makes it easy to add, remove or replace devices without changing the rest of the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Time Decoupling: “I’ll leave a message for you”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In MQTT, publishers and subscribers do not need to be connected at the same time.&lt;/p&gt;

&lt;p&gt;A publisher can send data even when subscribers are offline, and subscribers can receive data later when they reconnect (depending on QoS and session settings). This is especially useful for IoT devices that frequently go into sleep mode or experience unstable connectivity.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Synchronization Decoupling: “No waiting in line”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Publishers and subscribers operate independently and do not block each other.&lt;/p&gt;

&lt;p&gt;A publisher can send messages without waiting for subscribers to process them and subscribers can receive messages whenever they are ready. This asynchronous behavior makes MQTT highly efficient and suitable for real-time systems with limited processing power.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;MQTT components&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;MQTT follows the publish/subscribe model by defining a small set of core components. The most important ones are &lt;strong&gt;clients&lt;/strong&gt;, the &lt;strong&gt;broker&lt;/strong&gt; and the &lt;strong&gt;connection&lt;/strong&gt; that links them.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;MQTT client&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An MQTT client is &lt;strong&gt;any device or application&lt;/strong&gt; that communicates using the MQTT protocol. This can range from a cloud server or mobile app to a small microcontroller running an MQTT library.&lt;/p&gt;

&lt;p&gt;A client can play different roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When it sends data, it acts as a &lt;strong&gt;publisher&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;When it receives data, it acts as a &lt;strong&gt;subscriber&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A single client can do both at the same time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In simple terms, if a device connects to a broker and exchanges messages using MQTT, it is considered an MQTT client.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;MQTT broker&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The MQTT broker is the &lt;strong&gt;central communication hub&lt;/strong&gt; of the system. All MQTT clients connect to the broker and clients never communicate directly with each other.&lt;/p&gt;

&lt;p&gt;The broker’s main responsibilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receiving messages from publishers&lt;/li&gt;
&lt;li&gt;Filtering messages based on topics&lt;/li&gt;
&lt;li&gt;Delivering messages to all subscribed clients&lt;/li&gt;
&lt;li&gt;Managing client connections and sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the broker often handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client authentication and authorization&lt;/li&gt;
&lt;li&gt;Storing and delivering messages for disconnected clients&lt;/li&gt;
&lt;li&gt;Forwarding data to databases, analytics engines, or cloud services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this, the broker plays a critical role in ensuring reliability, scalability, and security.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;MQTT connection&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Communication in MQTT starts when a client establishes a connection with the broker.&lt;/p&gt;

&lt;p&gt;The process works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The client sends a &lt;code&gt;CONNECT&lt;/code&gt; message to the broker&lt;/li&gt;
&lt;li&gt;The broker responds with a &lt;code&gt;CONNACK&lt;/code&gt; message to confirm the connection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This communication happens over a &lt;strong&gt;persistent TCP/IP connection&lt;/strong&gt;, which remains open while data is exchanged. All MQTT communication flows through this connection.&lt;/p&gt;

&lt;p&gt;An important rule in MQTT is that &lt;strong&gt;clients only connect to the broker&lt;/strong&gt;, never directly to other clients. This design keeps the system loosely coupled and easy to scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;MQTT Topics&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;topic&lt;/strong&gt; is a structured string that the MQTT broker uses to &lt;strong&gt;route messages&lt;/strong&gt; between clients. Instead of sending messages directly to a specific device, MQTT clients publish messages to a topic and the broker decides which subscribers should receive them.&lt;/p&gt;

&lt;p&gt;Topics are arranged in a &lt;strong&gt;hierarchical format&lt;/strong&gt;, similar to folders in a file system, with each level separated by a forward slash (&lt;code&gt;/&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example topic structure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ourhome/groundfloor/livingroom/light
ourhome/firstfloor/kitchen/temperature
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each part of the topic adds context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ourhome&lt;/code&gt; → identifies the system&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;groundfloor&lt;/code&gt; / &lt;code&gt;firstfloor&lt;/code&gt; → identifies the location&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;livingroom&lt;/code&gt; / &lt;code&gt;kitchen&lt;/code&gt; → identifies the room&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;light&lt;/code&gt; / &lt;code&gt;temperature&lt;/code&gt; → identifies the device or data type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hierarchy makes it easy to organize data logically and scale the system as more devices are added.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;MQTT Publish&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Publishing&lt;/strong&gt; is the process of sending data to the broker.&lt;/p&gt;

&lt;p&gt;When an MQTT client publishes a message, it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;topic&lt;/strong&gt; (where the message belongs)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;payload&lt;/strong&gt; (the actual data)&lt;/li&gt;
&lt;li&gt;Optional settings like QoS and retain flag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The payload is sent as raw bytes, which means the client is free to choose any data format, such as plain text, JSON, or binary data (for example, raw sensor readings).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A smart lamp in a home automation system may publish:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic: ourhome/groundfloor/livingroom/light
Payload: ON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once published, the message is delivered to &lt;strong&gt;all clients subscribed to that topic&lt;/strong&gt;, based on broker rules.&lt;/p&gt;
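&lt;p&gt;As a concrete sketch (assuming the Mosquitto command-line tools and a broker running on &lt;code&gt;localhost&lt;/code&gt;), watching and publishing this topic looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Terminal 1: watch every light in the house
# (-v prints the topic with the payload; + in -t matches any single topic level)
mosquitto_sub -h localhost -t "ourhome/+/+/light" -v

# Terminal 2: the smart lamp publishes its state
mosquitto_pub -h localhost -t "ourhome/groundfloor/livingroom/light" -m "ON"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;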

&lt;p&gt;Point to remember: a publisher does not know who receives the message; it only sends data to a topic.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;MQTT Subscribe&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Subscribing&lt;/strong&gt; is how an MQTT client expresses interest in receiving certain messages.&lt;/p&gt;

&lt;p&gt;To subscribe, a client sends a &lt;code&gt;SUBSCRIBE&lt;/code&gt; request to the broker that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One or more topic filters&lt;/li&gt;
&lt;li&gt;The desired QoS level for each topic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After subscribing, the broker automatically forwards any matching messages to the client whenever data is published on those topics.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Example&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A mobile app that monitors home lighting may subscribe to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ourhome/+/+/light
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here &lt;code&gt;+&lt;/code&gt; is a single-level wildcard, so this filter matches the &lt;code&gt;light&lt;/code&gt; topic on any floor and in any room. Every time a light publishes its state (&lt;code&gt;ON&lt;/code&gt; or &lt;code&gt;OFF&lt;/code&gt;), the app receives the update and can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Display the current status&lt;/li&gt;
&lt;li&gt;Update a counter of active lights&lt;/li&gt;
&lt;li&gt;Trigger notifications or automation rules&lt;/li&gt;
&lt;/ul&gt;
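&lt;p&gt;The broker matches topic filters like this against concrete topics level by level. As an illustrative sketch (not real broker code, and simplified: it does not enforce that &lt;code&gt;#&lt;/code&gt; only appears as the last level), wildcard matching can be written like this, where &lt;code&gt;+&lt;/code&gt; matches exactly one level and &lt;code&gt;#&lt;/code&gt; matches all remaining levels:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdbool.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

// Returns true if `topic` matches the subscription `filter`.
// '+' matches exactly one topic level, '#' matches everything that remains.
bool topic_matches(const char *filter, const char *topic) {
    while (*filter) {
        if (*filter == '#')
            return true;              // '#' swallows the rest of the topic
        if (*filter == '+') {
            while (*topic &amp;amp;&amp;amp; *topic != '/')
                topic++;              // skip one whole level
            filter++;
        } else {
            if (*filter != *topic)
                return false;
            filter++;
            topic++;
        }
    }
    return *topic == '\0';            // both must end at the same time
}

int main(void) {
    printf("%d\n", topic_matches("ourhome/+/+/light",
                                 "ourhome/groundfloor/livingroom/light"));   // 1
    printf("%d\n", topic_matches("ourhome/#",
                                 "ourhome/firstfloor/kitchen/temperature")); // 1
    printf("%d\n", topic_matches("ourhome/+/light",
                                 "ourhome/groundfloor/livingroom/light"));   // 0
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;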

&lt;h2&gt;
  
  
  &lt;strong&gt;Quality of Service (QoS)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In MQTT, &lt;strong&gt;Quality of Service (QoS)&lt;/strong&gt; defines &lt;strong&gt;how reliably a message is delivered&lt;/strong&gt; from a publisher to a subscriber.&lt;/p&gt;

&lt;p&gt;Because IoT networks can be slow, unstable, or intermittent, MQTT lets developers choose the &lt;strong&gt;right balance between reliability, speed, and bandwidth usage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;MQTT supports &lt;strong&gt;three QoS levels&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;QoS 0 – &lt;em&gt;At most once&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;QoS 1 – &lt;em&gt;At least once&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;QoS 2 – &lt;em&gt;Exactly once&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each level provides a different delivery guarantee.&lt;/p&gt;

&lt;h3&gt;
  
  
  QoS 0
&lt;/h3&gt;

&lt;p&gt;QoS 0 delivers a message &lt;strong&gt;at most once&lt;/strong&gt; (fire and forget). The message is sent &lt;strong&gt;without any acknowledgment&lt;/strong&gt; from the receiver.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Publisher sends the message&lt;/li&gt;
&lt;li&gt;No confirmation is expected&lt;/li&gt;
&lt;li&gt;Message may be lost if the connection fails&lt;/li&gt;
&lt;li&gt;It offers the fastest delivery and the lowest bandwidth usage, with no retry mechanism.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu4xhboppesy647ud7lv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu4xhboppesy647ud7lv.png" alt="QoS 0" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Temperature and humidity readings, or live sensor streams where occasional loss is acceptable.&lt;/p&gt;

&lt;h3&gt;
  
  
  QoS 1
&lt;/h3&gt;

&lt;p&gt;QoS 1 guarantees that a message is delivered &lt;strong&gt;at least once&lt;/strong&gt;. However, &lt;strong&gt;duplicate messages are possible&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Publisher sends message&lt;/li&gt;
&lt;li&gt;Subscriber (via broker) sends an acknowledgment (&lt;code&gt;PUBACK&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;If no acknowledgment is received, the publisher retransmits&lt;/li&gt;
&lt;li&gt;This guarantees delivery but may produce duplicates, at a moderate bandwidth cost.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Friz9qn50xb2f2u4ltsmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Friz9qn50xb2f2u4ltsmw.png" alt="QoS 1" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Device control commands, status updates, alerts, and notifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  QoS 2
&lt;/h3&gt;

&lt;p&gt;QoS 2 ensures that a message is delivered &lt;strong&gt;exactly once&lt;/strong&gt;, with &lt;strong&gt;no loss and no duplication&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It uses a &lt;strong&gt;four-step handshake&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;PUBLISH&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PUBREC&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PUBREL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PUBCOMP&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensures both sender and receiver agree that the message was delivered once and only once.&lt;/p&gt;

&lt;p&gt;It provides the highest reliability with the cost of increased overhead, higher latency, and greater memory usage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqvt2dw46dak75vxhl0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqvt2dw46dak75vxhl0h.png" alt="QoS 2" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Financial transactions, billing data, and critical industrial control messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdmdezokvl8cxprtlx8f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdmdezokvl8cxprtlx8f.png" alt="QoS" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Last Will and Testament (LWT)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In MQTT, &lt;strong&gt;Last Will and Testament (LWT)&lt;/strong&gt; is a mechanism that helps detect &lt;strong&gt;unexpected client failures&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It allows an MQTT client to tell the broker in advance:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“If I disconnect suddenly or crash, publish this message on my behalf.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This feature is extremely useful in IoT systems where devices may lose power, crash, or disconnect due to unstable networks.&lt;/p&gt;

&lt;p&gt;Without LWT, other systems would have &lt;strong&gt;no way of knowing&lt;/strong&gt; whether a device went offline intentionally or failed unexpectedly.&lt;/p&gt;

&lt;p&gt;LWT solves this problem by automatically informing subscribers about the device’s failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common MQTT question
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What Does a Will Message Contain?&lt;/strong&gt;&lt;br&gt;
A will message is just like a normal MQTT message and includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Topic&lt;/strong&gt; - where the message will be published&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payload&lt;/strong&gt; - the message content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QoS level&lt;/strong&gt; - reliability of delivery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retain flag&lt;/strong&gt; (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Port does MQTT Normally Use?&lt;/strong&gt;&lt;br&gt;
The standard port is 1883; MQTT over TLS typically uses port 8883.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you use MQTT without a broker?&lt;/strong&gt;&lt;br&gt;
No. Standard MQTT is broker-based: every message passes through a broker that routes it between publishers and subscribers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Protocol does MQTT use?&lt;/strong&gt;&lt;br&gt;
The standard version uses TCP/IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can multiple clients publish to the same topic?&lt;/strong&gt;&lt;br&gt;
Yes, multiple clients can publish messages to the same topic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it possible to know the identity of the client that published a message?&lt;/strong&gt;&lt;br&gt;
No, not unless the client includes that information in the topic or payload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens to messages that get published to topics that no one subscribes to?&lt;/strong&gt;&lt;br&gt;
They are discarded by the broker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can I find out what topics have been published?&lt;/strong&gt;&lt;br&gt;
You can’t do this easily: the broker doesn’t keep a list of published topics, because topics aren’t permanent objects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I subscribe to a topic that no one is publishing to?&lt;/strong&gt;&lt;br&gt;
Yes, subscribing to a topic does not require an active publisher.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are messages stored on the broker?&lt;/strong&gt;&lt;br&gt;
Yes, but only temporarily. Once messages are delivered to all subscribers, they are discarded.&lt;br&gt;
(See retained messages below.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are retained messages?&lt;/strong&gt;&lt;br&gt;
When you publish a message with the retain flag set, the broker stores only the &lt;strong&gt;last published message&lt;/strong&gt; for that topic.&lt;/p&gt;

&lt;p&gt;This retained message is immediately sent to new subscribers when they subscribe to the topic. MQTT retains only one message per topic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image sources
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://youtu.be/LTAm1R_4YYE?si=LPI4z6zkOzG-Ynw-" rel="noopener noreferrer"&gt;All the images are from this youtube video&lt;/a&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>beginners</category>
      <category>learning</category>
      <category>iot</category>
    </item>
    <item>
      <title>Understanding Endianness: Little-Endian vs Big-Endian</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Thu, 12 Feb 2026 13:55:12 +0000</pubDate>
      <link>https://forem.com/amanprasad/understanding-endianness-little-endian-vs-big-endian-31e</link>
      <guid>https://forem.com/amanprasad/understanding-endianness-little-endian-vs-big-endian-31e</guid>
      <description>&lt;p&gt;&lt;strong&gt;Endianness&lt;/strong&gt; refers to the order in which bytes are arranged and stored in computer memory.&lt;/p&gt;

&lt;p&gt;In simple terms, endianness decides &lt;strong&gt;which byte is stored at the lowest memory address&lt;/strong&gt;: the most significant byte (MSB) or the least significant byte (LSB).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A simple analogy:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of endianness like reading direction.&lt;/p&gt;

&lt;p&gt;Some languages read &lt;strong&gt;left to right&lt;/strong&gt;, while others read &lt;strong&gt;right to left&lt;/strong&gt;. Both convey the same information but only if you know the rule beforehand.&lt;/p&gt;

&lt;p&gt;Similarly, computers need a defined rule to interpret multi-byte values correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Main Types of Endianness
&lt;/h2&gt;

&lt;p&gt;Most modern systems use one of two byte-ordering schemes.&lt;/p&gt;

&lt;p&gt;To illustrate, consider storing the 4-byte hexadecimal value: &lt;code&gt;0x12345678&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38xshgc9p3b5eivdwmjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38xshgc9p3b5eivdwmjh.png" alt="little endian and big endian with memory address diagram" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Big-Endian
&lt;/h3&gt;

&lt;p&gt;In Big-Endian systems, the &lt;strong&gt;most significant byte&lt;/strong&gt; (MSB) is stored at the lowest memory address. &lt;br&gt;
This format is often considered more &lt;em&gt;human-readable&lt;/em&gt; because it matches how we write numbers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Address&lt;/th&gt;
&lt;th&gt;Stored Byte&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Address +0&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0x12&lt;/code&gt; (MSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address +1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x34&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address +2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x56&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address +3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0x78&lt;/code&gt; (LSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It is used in networking protocols, older mainframes, and legacy systems.&lt;/p&gt;
&lt;h3&gt;
  
  
  Little-Endian (LE)
&lt;/h3&gt;

&lt;p&gt;In Little-Endian systems, the &lt;strong&gt;least significant byte&lt;/strong&gt; (LSB) is stored at the lowest memory address. This layout aligns well with how CPUs perform arithmetic internally.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Address&lt;/th&gt;
&lt;th&gt;Stored Byte&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Address +0&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0x78&lt;/code&gt; (LSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address +1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x56&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address +2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x34&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address +3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0x12&lt;/code&gt; (MSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It is used by Intel (x86) and AMD processors, and by most modern desktops, laptops, and embedded MCUs.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Bi-Endianness&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Many modern processors, like &lt;strong&gt;ARM&lt;/strong&gt;, are actually &lt;strong&gt;Bi-endian&lt;/strong&gt;. This means they can be configured to operate in either Big-Endian or Little-Endian mode depending on the operating system's requirements. &lt;br&gt;
In practice, most modern ARM systems run in &lt;strong&gt;little-endian mode&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Why Does It Matter?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In high-level languages like Python or Java, endianness is usually hidden.&lt;/p&gt;

&lt;p&gt;However, it becomes critical in the following cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Networking:&lt;/strong&gt; The internet uses big-endian. Without proper byte conversion, data sent from a little-endian system will be misinterpreted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Binary File Sharing:&lt;/strong&gt; Opening a binary file created on a big-endian system on a little-endian machine can corrupt values unless handled correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-Level Programming:&lt;/strong&gt; In C, assembly, or embedded systems, incorrect assumptions about byte order lead to subtle and dangerous bugs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Detecting Endianness Using C&lt;/strong&gt;
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Little Endian"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Big Endian"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step by step explanation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Variable &lt;code&gt;uint16_t x = 1&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;uint16_t&lt;/code&gt; is a &lt;strong&gt;16-bit (2-byte)&lt;/strong&gt; integer&lt;/li&gt;
&lt;li&gt;The numeric value is &lt;code&gt;1&lt;/code&gt;, but memory must store it using &lt;strong&gt;two bytes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Possible memory layouts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endianness&lt;/th&gt;
&lt;th&gt;Memory Bytes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Big-Endian&lt;/td&gt;
&lt;td&gt;&lt;code&gt;00 01&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Little-Endian&lt;/td&gt;
&lt;td&gt;&lt;code&gt;01 00&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The pointer cast&lt;/strong&gt;  &lt;code&gt;(uint8_t*)&amp;amp;x&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This line does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;&amp;amp;x&lt;/code&gt; → gets the memory address of &lt;code&gt;x&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(uint8_t*)&lt;/code&gt; → treats that address as a pointer to a &lt;strong&gt;single byte&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Dereferencing reads the &lt;strong&gt;first byte in memory&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The check&lt;/strong&gt; &lt;code&gt;*(uint8_t*)&amp;amp;x == 1&lt;/code&gt; reads that first byte and compares it to &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;If the first byte is &lt;code&gt;1&lt;/code&gt;&lt;/strong&gt;: the least significant byte is stored first. The system is &lt;strong&gt;Little-Endian&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;If the first byte is &lt;code&gt;0&lt;/code&gt;&lt;/strong&gt;: the most significant byte is stored first. The system is &lt;strong&gt;Big-Endian&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  A Subtle Advantage of Little-Endian
&lt;/h2&gt;

&lt;p&gt;One often-mentioned but rarely explained advantage of little-endian systems is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The same value can be read from memory at different widths using the same base address.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This works because, in little-endian memory, the &lt;strong&gt;least significant byte is stored at the lowest address&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mh"&gt;0x12345678&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Little-endian memory layout:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Address&lt;/th&gt;
&lt;th&gt;Stored Byte&lt;/th&gt;
&lt;th&gt;Significance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;amp;x + 0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x78&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Least Significant Byte (LSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;amp;x + 1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x56&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;amp;x + 2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x34&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;amp;x + 3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x12&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Most Significant Byte (MSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;&amp;amp;x&lt;/code&gt; always points to the &lt;strong&gt;lowest memory address&lt;/strong&gt; (&lt;code&gt;&amp;amp;x + 0&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Now, reading from the same address:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Read size&lt;/th&gt;
&lt;th&gt;Expression&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;*(uint8_t*)&amp;amp;x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x78&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16-bit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;*(uint16_t*)&amp;amp;x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x5678&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;*(uint32_t*)&amp;amp;x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0x12345678&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The starting address never changes; only the read size does. Increasing the width naturally reveals more significant bytes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Do We Care About the &lt;em&gt;Lower&lt;/em&gt; Bytes?
&lt;/h2&gt;

&lt;p&gt;This raises an important question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If big-endian systems expose the upper part of a number first, why do CPUs and programmers care so much about the lower bytes?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The answer lies in &lt;strong&gt;how arithmetic works&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In any positional number system, the &lt;strong&gt;least significant bits form the foundation of the value&lt;/strong&gt;, while higher bits only add scale.&lt;/p&gt;

&lt;p&gt;For example: &lt;code&gt;0x12345678&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The lower byte (&lt;code&gt;0x78&lt;/code&gt;) controls changes of ±1&lt;/li&gt;
&lt;li&gt;The upper byte (&lt;code&gt;0x12&lt;/code&gt;) only affects large magnitude&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All arithmetic operations (addition, subtraction, multiplication) &lt;strong&gt;start from the least significant byte&lt;/strong&gt; and propagate upward using carry.&lt;/p&gt;
&lt;h2&gt;
  
  
  Addition and Subtraction
&lt;/h2&gt;

&lt;p&gt;When adding multi-byte numbers, the CPU must process the &lt;strong&gt;least significant byte first&lt;/strong&gt; to determine whether a carry occurs.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Carry:           1 1
  Value A:   0x0 0 F F
+ Value B:   0x0 0 0 1
  --------------------
  Result:    0x0 1 0 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;LSB addition: &lt;code&gt;0xFF + 0x01 = 0x00&lt;/code&gt; (carry = 1)&lt;/li&gt;
&lt;li&gt;Next byte uses the carry to produce &lt;code&gt;0x01&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In little-endian systems, the LSB is fetched first, allowing computation to begin immediately while higher bytes are fetched in parallel. This was especially important on early 8-bit and 16-bit processors and strongly influenced CPU and compiler design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Big-Endian Still Exists
&lt;/h2&gt;

&lt;p&gt;If little-endian fits computation so well, why does big-endian persist?&lt;/p&gt;

&lt;p&gt;The reason is &lt;strong&gt;legacy and standardization&lt;/strong&gt;, not performance.&lt;/p&gt;

&lt;p&gt;Big-endian is used in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Networking (TCP/IP)&lt;/li&gt;
&lt;li&gt;File formats like &lt;strong&gt;JPEG&lt;/strong&gt; and &lt;strong&gt;PNG&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Older architectures and mainframes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once a format or protocol is defined, changing its byte order would break compatibility with existing data and software.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modern Reality
&lt;/h2&gt;

&lt;p&gt;On modern systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU arithmetic happens in registers (no endianness)&lt;/li&gt;
&lt;li&gt;Caches and pipelines hide memory order&lt;/li&gt;
&lt;li&gt;High-level languages abstract it away&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today, &lt;strong&gt;endianness is mostly a data-format concern&lt;/strong&gt; rather than a CPU performance concern, except in networking, embedded systems, and low-level code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reference
&lt;/h3&gt;

&lt;p&gt;This discussion is inspired by community explanations on Stack Overflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stackoverflow.com/questions/13926760/the-reason-behind-endianness" rel="noopener noreferrer"&gt;https://stackoverflow.com/questions/13926760/the-reason-behind-endianness&lt;/a&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>beginners</category>
      <category>learning</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Flash Memory Explained: NAND vs NOR, Architecture, and Memory Organization</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Sat, 07 Feb 2026 13:42:21 +0000</pubDate>
      <link>https://forem.com/amanprasad/flash-memory-explained-nand-vs-nor-architecture-and-memory-organization-3abf</link>
      <guid>https://forem.com/amanprasad/flash-memory-explained-nand-vs-nor-architecture-and-memory-organization-3abf</guid>
      <description>&lt;p&gt;Flash memory is a type of non-volatile semiconductor memory that can be electrically erased and reprogrammed. It is based on floating-gate MOSFETs (Metal-Oxide-Semiconductor Field-Effect Transistors) where data is stored by trapping electrons in a floating gate, altering the threshold voltage of the transistor to represent binary states. Unlike volatile memory like DRAM, flash retains data even when power is removed, making it ideal for applications requiring persistent storage such as SSDs, USB drives, memory cards and embedded systems. Flash memory evolved from EEPROM, but instead of erasing individual bytes, it erases data in larger blocks, which significantly improves speed, density, and cost efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
Types of Flash Memory: NAND and NOR

&lt;ul&gt;
&lt;li&gt;NOR Flash&lt;/li&gt;
&lt;li&gt;NAND Flash&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Comparison of NAND and NOR&lt;/li&gt;

&lt;li&gt;Memory Organization: Sector, Block, and Page&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Types of Flash Memory: NAND and NOR
&lt;/h2&gt;

&lt;p&gt;The two primary types of flash memory are NAND and NOR, named after the way their memory cells are connected internally, which resembles NAND and NOR logic gates.&lt;/p&gt;

&lt;p&gt;Both use the same basic &lt;strong&gt;floating-gate cell&lt;/strong&gt; design, but they differ in &lt;strong&gt;architecture, access methods, performance, cost, and applications&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0moxeizcpbv32fevgdq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0moxeizcpbv32fevgdq8.png" alt="NAND and NOR flash memory diagram with truth table" width="800" height="560"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Image source: &lt;a href="https://nexusindustrialmemory.com/guides/what-is-nand-memory/" rel="noopener noreferrer"&gt;nexusindustrialmemory&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  NOR Flash
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0u7lpemf86z376f7as9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0u7lpemf86z376f7as9.png" alt="NOR Flash memory" width="655" height="1304"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Image source: &lt;a href="https://www.embedded.com/flash-101-nand-flash-vs-nor-flash/" rel="noopener noreferrer"&gt;embedded.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;NOR flash&lt;/strong&gt;, memory cells are connected in &lt;strong&gt;parallel&lt;/strong&gt;, with the &lt;strong&gt;drain of each cell connected to a bit line&lt;/strong&gt; and the &lt;strong&gt;source connected to a common source line&lt;/strong&gt; (typically ground). This parallel connection resembles the structure of a &lt;strong&gt;NOR logic gate&lt;/strong&gt;, which is the origin of the name &lt;em&gt;NOR flash&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This architecture enables &lt;strong&gt;true random access at the byte level&lt;/strong&gt;, allowing the processor to directly read instructions from flash memory. As a result, code can be executed directly from NOR flash using &lt;strong&gt;Execute-In-Place (XIP)&lt;/strong&gt;, without first copying the code into RAM.&lt;/p&gt;

&lt;p&gt;NOR flash offers &lt;strong&gt;fast read access&lt;/strong&gt;, making it ideal for code storage. However, &lt;strong&gt;write and erase operations are slower&lt;/strong&gt; because erase operations occur at the &lt;strong&gt;sector level&lt;/strong&gt;, and the cell structure requires higher voltages and larger physical area. This leads to &lt;strong&gt;lower memory density&lt;/strong&gt; and a &lt;strong&gt;higher cost per bit&lt;/strong&gt; compared to NAND flash.&lt;/p&gt;

&lt;p&gt;Typical applications of NOR flash include &lt;strong&gt;firmware storage&lt;/strong&gt; in embedded systems such as &lt;strong&gt;bootloaders, BIOS, and microcontroller internal flash&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Cell&lt;/strong&gt;: The basic storage element implemented as a &lt;strong&gt;floating-gate MOSFET&lt;/strong&gt;. Data is stored by trapping or removing charge from the floating gate, representing logic &lt;code&gt;0&lt;/code&gt; or &lt;code&gt;1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Word Line&lt;/strong&gt;: Horizontal lines (in black color) that connect to the control gates of the memory cells. These are used to select specific rows of cells for operations like read, program (write), or erase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bit Line&lt;/strong&gt;: Vertical orange line at the top which is connected to the drain of the cells. This carries data in and out during read and write operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source Line&lt;/strong&gt;: Vertical blue line at the top right. This is typically connected to ground or a reference voltage and is shared among cells.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Structural Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct electrical path per cell:&lt;/strong&gt; Each memory cell is connected directly between the &lt;strong&gt;bit line (drain)&lt;/strong&gt; and the &lt;strong&gt;source line (ground)&lt;/strong&gt;. This one-to-one connection allows the state of a single cell to be sensed without interference from neighboring cells.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent cell access:&lt;/strong&gt; Because cells are not connected in series, selecting a specific &lt;strong&gt;word line&lt;/strong&gt; activates only the targeted cell(s). This independence enables &lt;strong&gt;true random access&lt;/strong&gt; to individual bytes or words.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Larger physical cell size:&lt;/strong&gt; Each cell requires its own drain contact, source connection, and routing lines. This increases the silicon area per bit, resulting in &lt;strong&gt;lower storage density&lt;/strong&gt; compared to NAND flash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High reliability for code storage:&lt;/strong&gt; The simple read path and minimal need for complex error correction make NOR flash highly reliable for instruction fetch and execution, which is critical for firmware and boot code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Operational Implications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read Operation&lt;/strong&gt;: Read operations are &lt;strong&gt;fast and byte-addressable&lt;/strong&gt;. The processor can directly fetch instructions from NOR flash using &lt;strong&gt;Execute-In-Place (XIP)&lt;/strong&gt;, eliminating the need to copy code into RAM before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write Operation&lt;/strong&gt;: Programming is &lt;strong&gt;slower&lt;/strong&gt; because it involves injecting charge into the floating gate using precise voltage pulses. Writes typically occur at the &lt;strong&gt;page level&lt;/strong&gt;, even if only a small amount of data is modified.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Erase Operation:&lt;/strong&gt; Erase operations are performed at the &lt;strong&gt;sector level&lt;/strong&gt;, where a group of memory cells is cleared simultaneously by removing charge from their floating gates. This operation is relatively slow and requires higher voltages.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  NAND Flash
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjn1r8eywkqf00pq48qy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjn1r8eywkqf00pq48qy.png" alt="NAND flash memory" width="796" height="1302"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Image source: &lt;a href="https://www.embedded.com/flash-101-nand-flash-vs-nor-flash/" rel="noopener noreferrer"&gt;embedded.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;NAND flash&lt;/strong&gt;, memory cells are connected &lt;strong&gt;in series&lt;/strong&gt;, forming a &lt;strong&gt;cell string&lt;/strong&gt; typically consisting of &lt;strong&gt;32 to 128 cells&lt;/strong&gt;. The &lt;strong&gt;drain of one cell is connected to the source of the next&lt;/strong&gt;, and the entire string is connected between a &lt;strong&gt;bit line&lt;/strong&gt; at the top and a &lt;strong&gt;common source line&lt;/strong&gt; at the bottom. This serial connection resembles the structure of a &lt;strong&gt;NAND logic gate&lt;/strong&gt;, which is the origin of the name NAND flash.&lt;/p&gt;

&lt;p&gt;This architecture significantly reduces the number of required contacts and routing lines per cell, enabling &lt;strong&gt;much higher storage density&lt;/strong&gt;, &lt;strong&gt;lower cost per bit&lt;/strong&gt; and &lt;strong&gt;larger memory capacities&lt;/strong&gt; than NOR flash. However, because cells are accessed through a series path, NAND flash does &lt;strong&gt;not support true random access&lt;/strong&gt;. Instead, data is accessed in &lt;strong&gt;pages&lt;/strong&gt;, and erase operations are performed in &lt;strong&gt;blocks&lt;/strong&gt;, making random reads slower but bulk data operations highly efficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Cell&lt;/strong&gt;: The basic storage element implemented as a &lt;strong&gt;floating-gate MOSFET&lt;/strong&gt;. Data is stored by trapping or removing charge from the floating gate, representing logic 0 or 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Word Line&lt;/strong&gt;: Horizontal lines (shown in black) that connect to the control gates of the memory cells. These are used to select specific rows of cells for operations like read, program (write), or erase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bit Line&lt;/strong&gt;: Vertical orange line at the top, which is connected to the drain of the top select transistor. This carries data in and out during read and write operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source Line&lt;/strong&gt;: Horizontal blue line at the bottom. This is typically connected to ground or a reference voltage and is shared among strings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ground Line Select Transistor (SL Select):&lt;/strong&gt; A switch transistor at the bottom of the string that connects or isolates the string from the source line.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Structural Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Series-connected cell strings:&lt;/strong&gt; Memory cells are connected in a chain, requiring current to pass through multiple cells to access a target cell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-density layout:&lt;/strong&gt; Fewer contacts and shared routing allow more cells to fit in the same silicon area, resulting in &lt;strong&gt;significantly higher density&lt;/strong&gt; than NOR flash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared access path:&lt;/strong&gt; Cells do not have independent read paths. All unselected cells in a string must be biased ON to access the selected cell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex peripheral circuitry:&lt;/strong&gt; NAND flash requires page buffers, sense amplifiers, and &lt;strong&gt;error correction codes (ECC)&lt;/strong&gt; to ensure data integrity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Operational Implications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read Operation:&lt;/strong&gt; Reads are performed at the &lt;strong&gt;page level&lt;/strong&gt;. An entire page is transferred into an internal buffer and the requested data is then output. This makes random reads slower compared to NOR flash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write Operation:&lt;/strong&gt; Programming is &lt;strong&gt;fast and efficient&lt;/strong&gt;, occurring at the page level. NAND flash is well suited for frequent data writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Erase Operation:&lt;/strong&gt; Erase operations are performed at the &lt;strong&gt;block level&lt;/strong&gt;, where a block consists of many pages. Block erase in NAND flash is faster and more energy-efficient &lt;strong&gt;per bit erased&lt;/strong&gt; compared to NOR flash.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  NAND Flash Cell Types
&lt;/h3&gt;

&lt;p&gt;NAND flash is further classified based on the number of bits stored per cell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SLC (Single-Level Cell):&lt;/strong&gt; Stores 1 bit per cell. Offers the highest speed, endurance (up to ~100,000 cycles), and reliability, but at higher cost and lower density.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MLC (Multi-Level Cell):&lt;/strong&gt; Stores 2 bits per cell. Balances density and endurance (3,000–10,000 cycles).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TLC (Triple-Level Cell):&lt;/strong&gt; Stores 3 bits per cell. Higher density with reduced endurance (1,000–5,000 cycles); common in consumer SSDs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QLC (Quad-Level Cell):&lt;/strong&gt; Stores 4 bits per cell. Very high density with lower endurance (100–1,000 cycles) and slower performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PLC (Penta-Level Cell):&lt;/strong&gt; Stores 5 bits per cell. Emerging technology focused on ultra-high density with increased reliability challenges.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison of NAND and NOR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;NOR Flash&lt;/th&gt;
&lt;th&gt;NAND Flash&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cell Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Parallel connection (like NOR gate)&lt;/td&gt;
&lt;td&gt;Series connection (like NAND gate)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Access Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Random byte-level access, supports XIP&lt;/td&gt;
&lt;td&gt;Sequential page/block access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Faster (e.g., 100-200 ns per byte)&lt;/td&gt;
&lt;td&gt;Slower for random reads, faster for sequential&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write/Erase Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slower (sector erase in tens to hundreds of milliseconds, writes in milliseconds)&lt;/td&gt;
&lt;td&gt;Faster (erase in ms, write in µs per page)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Density/Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lower density, higher cost per bit&lt;/td&gt;
&lt;td&gt;Higher density, lower cost per bit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Endurance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher than NAND (typically 10⁴–10⁵ erase cycles)&lt;/td&gt;
&lt;td&gt;Varies by type (SLC high, QLC low)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typical Capacities&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to a few GB&lt;/td&gt;
&lt;td&gt;Up to TB-scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power Consumption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher for writes/erases&lt;/td&gt;
&lt;td&gt;Lower overall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code storage, firmware, embedded systems&lt;/td&gt;
&lt;td&gt;Data storage, SSDs, USB drives, memory cards&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Memory Organization: Sector, Block, and Page
&lt;/h2&gt;

&lt;p&gt;Flash memory is &lt;strong&gt;not organized like RAM&lt;/strong&gt;. Instead of allowing free read, write, and erase operations on individual bytes, flash memory follows a &lt;strong&gt;strict hierarchical structure&lt;/strong&gt;. This structure exists because of the &lt;strong&gt;physical nature of flash memory cells&lt;/strong&gt; and how they are erased.&lt;/p&gt;

&lt;p&gt;Flash memory is organized into &lt;strong&gt;pages&lt;/strong&gt; (read/write units) and &lt;strong&gt;blocks or sectors&lt;/strong&gt; (erase units). This design directly impacts how data is stored, modified, and managed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Flash Memory Uses Pages and Blocks
&lt;/h2&gt;

&lt;p&gt;Flash memory cells store data using &lt;strong&gt;charge trapped in floating gates&lt;/strong&gt;. To erase data, a &lt;strong&gt;high voltage&lt;/strong&gt; must be applied to remove this charge.&lt;/p&gt;

&lt;p&gt;Because applying such high voltage to individual cells is impractical and unsafe, flash memory erases &lt;strong&gt;groups of cells together&lt;/strong&gt;. This leads to the following rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read operations&lt;/strong&gt; occur at the page level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write operations&lt;/strong&gt; occur at the page level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Erase operations&lt;/strong&gt; occur at the block (or sector) level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This asymmetry is fundamental to all flash memory technologies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hierarchical Structure of Flash Memory
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Page
&lt;/h3&gt;

&lt;p&gt;A page is the smallest unit used for reading or writing data in flash memory. It is a row of memory cells that share a common word line, which is a control signal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In NAND flash: Pages are usually 2KB to 16KB in size, with 4KB being common in modern SSDs. They include a spare area of 64-512 bytes for error correction codes (ECC), metadata, and bad block markers.&lt;/li&gt;
&lt;li&gt;In NOR flash: Pages are smaller, often 256-512 bytes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Writing to a page involves charging or discharging the floating gates in the cells, which takes microseconds. Pages cannot be overwritten directly; the block that contains the page must be erased first. In NAND, pages have a main data area and a spare area for extra information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Block
&lt;/h3&gt;

&lt;p&gt;A block is a group of pages and the smallest unit that can be erased at once. It is a grid of strings (columns of connected cells) and pages (rows). Erasing uses high voltage to set all bits to 1 via Fowler-Nordheim tunneling.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In NAND flash: 64-512 pages per block, totaling 128KB to 8MB (e.g., 4MB common).&lt;/li&gt;
&lt;li&gt;In NOR flash: Often called sectors, with erase units of 4KB-256KB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blocks wear out over erase cycles. In NAND-based storage devices, this is managed by a Flash Translation Layer (FTL), which performs wear leveling and garbage collection by relocating valid pages before erasing blocks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sector
&lt;/h3&gt;

&lt;p&gt;The term &lt;strong&gt;sector&lt;/strong&gt; is used differently depending on the type of flash memory and the context. In &lt;strong&gt;NOR flash&lt;/strong&gt;, a sector refers to the &lt;strong&gt;smallest erasable unit&lt;/strong&gt; of memory and is functionally equivalent to a block in NAND flash. NOR flash sectors typically range from &lt;strong&gt;4 KB to 256 KB&lt;/strong&gt; and contain multiple pages.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;NAND flash&lt;/strong&gt;, the term sector is not a formal physical unit. It is often used informally to describe a &lt;strong&gt;512-byte or 4 KB logical chunk of data&lt;/strong&gt;, a convention inherited from hard disk drives. These logical sectors map to portions of a page but do not represent erase units. In NAND flash, &lt;strong&gt;pages are the smallest read/write units&lt;/strong&gt;, and &lt;strong&gt;blocks are the smallest erase units&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>learning</category>
      <category>beginners</category>
      <category>science</category>
    </item>
    <item>
      <title>Discovering Hall Sensors: The Hidden Tech in Laptops and TWS Earbuds</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Fri, 30 Jan 2026 08:35:17 +0000</pubDate>
      <link>https://forem.com/amanprasad/discovering-hall-sensors-the-hidden-tech-in-laptops-and-tws-earbuds-np7</link>
      <guid>https://forem.com/amanprasad/discovering-hall-sensors-the-hidden-tech-in-laptops-and-tws-earbuds-np7</guid>
      <description>&lt;p&gt;Have you ever wondered why your laptop screen turns off when you close the lid? Or how your True Wireless Stereo (TWS) earbuds, know when the charging case is open or closed? It all comes down to a clever little component called the Hall sensor. In this short post, I'll share a fun experiment I did that uncovers this tech in everyday devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Laptop Trick
&lt;/h2&gt;

&lt;p&gt;It started with a simple curiosity. I placed a magnet near the edges of my laptop base and the display turned off! Why? Laptops use Hall sensors (named after physicist Edwin Hall) to detect magnetic fields. These sensors are typically embedded near the hinge or edges. When you close the lid, a small magnet in the display aligns with the sensor in the base, signaling the system to sleep or turn off the screen. By mimicking that with an external magnet, you can "trick" the laptop into thinking the lid is closed.&lt;/p&gt;

&lt;p&gt;I even made a quick video demonstrating this.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33ecl6rif44wd1ca5anq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33ecl6rif44wd1ca5anq.gif" alt="gif demonstrate how the laptop screen is off when it comes with the magnet" width="240" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://drive.google.com/file/d/1FF5mvIboPbtt53mJtUKB2PfJWi54nwyD/view?usp=sharing" rel="noopener noreferrer"&gt;Watch the same demo in higher quality&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending to TWS Earbuds
&lt;/h2&gt;

&lt;p&gt;Inspired, I dug into my TWS charging case. These cases also detect lid status to pause charging, play audio, or enter sleep mode. Sure enough, after some disassembly, I spotted a Hall sensor inside! It's positioned to react to a magnet in the lid, just like in laptops.&lt;br&gt;
Here's a photo I took of the Hall sensor in the TWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwcfydrr19x28ijize2m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwcfydrr19x28ijize2m.jpg" alt="TWS circuit" width="800" height="1062"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fol0giplbu9uv0fb2s3vt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fol0giplbu9uv0fb2s3vt.jpg" alt="TWS circuit" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwuwmd7mt11a3jh6hvmq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwuwmd7mt11a3jh6hvmq.gif" alt="hall sensor in TWS" width="426" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://drive.google.com/file/d/1omv9YTGlJNzZh3EbewhAbpz1DXRb0cnx/view?usp=sharing" rel="noopener noreferrer"&gt;Watch the same demo in higher quality&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The sensor is tiny but powerful, using the Hall effect to measure magnetic field changes and convert them into electrical signals that the device interprets.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Hall Sensors Work
&lt;/h2&gt;

&lt;p&gt;A Hall sensor is a semiconductor that generates a voltage difference when exposed to a magnetic field perpendicular to the current flow.&lt;br&gt;
In devices like laptops and TWS, this voltage triggers actions like screen off/on or power management.&lt;br&gt;
Pro tip: If you're into hardware hacking, tools like a multimeter or Arduino can help you experiment with these sensors safely.&lt;/p&gt;

&lt;p&gt;This discovery shows how universal tech like Hall sensors powers seamless user experiences across gadgets. Next time you close your laptop or pop open your earbuds case, give a nod to Edwin Hall!&lt;/p&gt;

&lt;p&gt;If you've tried similar experiments or have tips on Hall sensor projects, drop a comment below. Thanks for reading! 🚀&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>tutorial</category>
      <category>science</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Memory Layout in Embedded Systems: How C Code Really Ends Up in FLASH and RAM</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Thu, 29 Jan 2026 05:34:20 +0000</pubDate>
      <link>https://forem.com/amanprasad/memory-layout-in-embedded-systems-how-c-code-really-ends-up-in-flash-and-ram-34c0</link>
      <guid>https://forem.com/amanprasad/memory-layout-in-embedded-systems-how-c-code-really-ends-up-in-flash-and-ram-34c0</guid>
      <description>&lt;p&gt;The CPU does not understand variables, types, or sections. It only executes raw commands to "read address X" or "write address Y." It only understands &lt;strong&gt;memory addresses.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you declare a variable, you are effectively requesting storage. The &lt;strong&gt;Compiler&lt;/strong&gt; assigns it to a logical section (like &lt;code&gt;.data&lt;/code&gt; or &lt;code&gt;.bss&lt;/code&gt;), and the &lt;strong&gt;Linker&lt;/strong&gt; calculates its final physical address based on the rules defined in your &lt;strong&gt;Linker Script&lt;/strong&gt;.&lt;br&gt;
If you don't understand this mapping, you are blind to the root causes of memory corruption and performance bottlenecks. In embedded systems, correct logic placed in the wrong memory is still a broken system.&lt;/p&gt;
&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;From C Code to Binary: Who Decides Memory Placement&lt;/li&gt;
&lt;li&gt;FLASH Memory Layout (Non-Volatile Sections)&lt;/li&gt;
&lt;li&gt;RAM Memory Layout (Volatile Sections)&lt;/li&gt;
&lt;li&gt;Startup Code: The Invisible Hand Before main()&lt;/li&gt;
&lt;li&gt;The Truth Table: Where Does It Go?&lt;/li&gt;
&lt;li&gt;Verifying Memory Placement&lt;/li&gt;
&lt;li&gt;Final Rules to Remember&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  From C Code to Binary: Who Decides Memory Placement
&lt;/h2&gt;

&lt;p&gt;Your C code doesn't just become a binary. It passes through a four-stage transformation. This transformation happens entirely at build time, long before the binary is flashed or executed on the CPU. Understanding this pipeline reveals that C syntax defines &lt;strong&gt;logic&lt;/strong&gt;, while the &lt;strong&gt;Linker Script&lt;/strong&gt; defines &lt;strong&gt;location&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Preprocessor:&lt;/strong&gt; The often-forgotten first step. It handles &lt;code&gt;#include&lt;/code&gt; files and expands &lt;code&gt;#define&lt;/code&gt; macros. It doesn't care about memory or logic; it simply performs text manipulation to prepare a pure C file for the compiler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Compiler:&lt;/strong&gt; The Compiler translates C logic into &lt;strong&gt;Assembly instructions&lt;/strong&gt;. At this stage, the tool works with placeholders (logical categories like &lt;code&gt;.data&lt;/code&gt; or &lt;code&gt;.bss&lt;/code&gt;). It does not decide physical memory locations; it knows &lt;em&gt;what&lt;/em&gt; each symbol is, but not &lt;em&gt;where&lt;/em&gt; it will live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The Assembler:&lt;/strong&gt; The Assembler converts those assembly instructions into &lt;strong&gt;Machine Code&lt;/strong&gt;. It produces &lt;strong&gt;relocatable object files&lt;/strong&gt;. These files contain the binary logic, but the addresses are still  &lt;strong&gt;relocatable&lt;/strong&gt;. They are not yet tied to a physical spot in your RAM or FLASH.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The Linker:&lt;/strong&gt; The Linker is the architect. It takes all the relocatable object files and uses the &lt;strong&gt;Linker Script (.ld)&lt;/strong&gt; to assign every symbol a fixed, physical address in &lt;strong&gt;FLASH&lt;/strong&gt; or &lt;strong&gt;RAM&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Bottom Line:&lt;br&gt;
You write &lt;code&gt;int x = 10;&lt;/code&gt;, but the linker decides whether that &lt;code&gt;10&lt;/code&gt; lives at address &lt;code&gt;0x20000004&lt;/code&gt; (RAM) or causes a collision. Memory placement is entirely controlled by the linker script.&lt;/p&gt;
&lt;/blockquote&gt;
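&lt;p&gt;Those rules live in the linker script. A minimal, illustrative fragment is shown below; region names, origins, and lengths vary by vendor and part:&lt;/p&gt;

```ld
/* Illustrative only: names and addresses depend on your MCU and toolchain. */
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 256K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 64K
}

SECTIONS
{
  .isr_vector : { KEEP(*(.isr_vector)) } > FLASH
  .text       : { *(.text*) }   > FLASH
  .rodata     : { *(.rodata*) } > FLASH
  .data       : { *(.data*) }   > RAM AT > FLASH  /* VMA in RAM, LMA in FLASH */
  .bss        : { *(.bss*) }    > RAM
}
```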
&lt;h2&gt;
  
  
  FLASH Memory Layout (Non-Volatile Sections)
&lt;/h2&gt;

&lt;p&gt;Flash is the permanent home for everything your program knows but does not need to change. Its contents survive resets and power loss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fky8tcewpluhjnksu8s93.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fky8tcewpluhjnksu8s93.png" alt="FLASH Memory Layout" width="800" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.isr_vector&lt;/code&gt;  (The Map)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Located at the very start of FLASH (typically &lt;code&gt;0x00000000&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;It contains the initial stack pointer and the addresses of the Reset Handler and all Interrupt Service Routines. On reset, the CPU fetches this table first to know how to start execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.text&lt;/code&gt;  (The Instructions)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contains the compiled machine instructions for the application, libraries, and ISRs. The CPU executes this code directly from FLASH using Execute-In-Place (XIP).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.rodata&lt;/code&gt;  (The Constants)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stores read-only data such as &lt;code&gt;const&lt;/code&gt; global variables, lookup tables, and string literals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;const&lt;/code&gt; Saves RAM:&lt;/strong&gt;&lt;br&gt;
If you write &lt;code&gt;const int table[] = {1, 2, 3};&lt;/code&gt;, the array lives &lt;strong&gt;only&lt;/strong&gt; in Flash. If you forget &lt;code&gt;const&lt;/code&gt;, the array is placed in &lt;code&gt;.data&lt;/code&gt; and copied into RAM at startup (so you can edit it), wasting precious SRAM on data that never changes. &lt;strong&gt;Always use &lt;code&gt;const&lt;/code&gt; for lookup tables.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The String Literal Trap&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;const char *ptr = "Hello";&lt;/code&gt; → The string "Hello" is stored in FLASH (&lt;code&gt;.rodata&lt;/code&gt;) but the pointer &lt;code&gt;ptr&lt;/code&gt; lives in RAM. Safe and RAM-efficient&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;char arr[] = "Hello";&lt;/code&gt; → The string "Hello" is stored in Flash &lt;em&gt;and&lt;/em&gt; copied to &lt;strong&gt;RAM&lt;/strong&gt; at startup. (Costs extra RAM,  allows modification if modification is necessary).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; If you remove &lt;code&gt;const&lt;/code&gt; from the pointer (&lt;code&gt;char *ptr = "Hello";&lt;/code&gt;), the string &lt;em&gt;still&lt;/em&gt; lives in Flash (&lt;code&gt;.rodata&lt;/code&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With &lt;code&gt;const&lt;/code&gt;:&lt;/strong&gt; The compiler gives you an error if you try to write to it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without &lt;code&gt;const&lt;/code&gt;:&lt;/strong&gt; The compiler allows the write because the type system no longer enforces read-only access. The underlying memory is still read-only, so when the CPU tries to write to that Flash address, the system triggers a &lt;strong&gt;HARD FAULT&lt;/strong&gt; and crashes.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Rule: Removing const does not move the string to RAM. It only removes protection and makes undefined behavior possible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  The Hidden Data Sections in FLASH: &lt;code&gt;.data&lt;/code&gt; and &lt;code&gt;.bss&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Although &lt;code&gt;.data&lt;/code&gt; and &lt;code&gt;.bss&lt;/code&gt; are runtime RAM sections, FLASH plays a critical role in their initialization. They represent the bridge between storage (Flash) and execution (RAM).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.data&lt;/code&gt;  Initialized Global variables (LMA vs VMA)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider a global initialized variable such as &lt;code&gt;int score = 100;&lt;/code&gt;. This variable &lt;em&gt;must&lt;/em&gt; live in RAM so you can change it. But RAM is wiped at power loss. So where does the &lt;code&gt;100&lt;/code&gt; come from?&lt;/p&gt;

&lt;p&gt;This section lives a double life.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In Flash (LMA - Load Memory Address):&lt;/strong&gt; The initial value (&lt;code&gt;100&lt;/code&gt;) is stored here to survive power loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In RAM (VMA - Virtual Memory Address):&lt;/strong&gt; The startup code reserves space for the variable here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Mechanism:&lt;/strong&gt; Before &lt;code&gt;main()&lt;/code&gt; runs, the startup code copies the values from Flash (LMA) to RAM (VMA).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.bss&lt;/code&gt; — Zero-Initialized Globals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.bss&lt;/code&gt; section contains global and static variables that are uninitialized or explicitly set to zero &lt;br&gt;
(e.g., &lt;code&gt;int counter;&lt;/code&gt;, &lt;code&gt;static int flag;&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;No space is reserved for these variables in FLASH; only RAM is allocated.&lt;br&gt;
At startup, the runtime clears the entire &lt;code&gt;.bss&lt;/code&gt; region to zero before &lt;code&gt;main()&lt;/code&gt; executes.&lt;br&gt;
This avoids wasting FLASH space storing zeros, so &lt;code&gt;.bss&lt;/code&gt; consumes &lt;strong&gt;RAM only&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  RAM Memory Layout (Volatile Sections)
&lt;/h2&gt;

&lt;p&gt;RAM is the system’s &lt;strong&gt;working memory&lt;/strong&gt;. It holds all writable runtime state and is rebuilt on every reset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6ubl9acng5qbn9pib6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb6ubl9acng5qbn9pib6t.png" alt="RAM Memory Layout" width="800" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.data&lt;/code&gt;  (Active Variables)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Contains initialized global and static variables copied from FLASH during startup. These variables are freely read and modified during execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.bss&lt;/code&gt;  (Zeroed Variables)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Holds global and static variables without explicit initial values. This entire region is cleared to zero at startup for predictable behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heap  (Dynamic Memory)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starts after &lt;code&gt;.bss&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Grows upward&lt;/li&gt;
&lt;li&gt;Used by &lt;code&gt;malloc()&lt;/code&gt; / &lt;code&gt;free()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fragmentation-prone&lt;/li&gt;
&lt;li&gt;No bounds checking&lt;/li&gt;
&lt;li&gt;In embedded systems, uncontrolled heap usage leads to &lt;strong&gt;Fragmentation&lt;/strong&gt;. Many safety-critical systems restrict or forbid heap usage entirely to prevent instability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stack  (Execution Context)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starts at the top of RAM&lt;/li&gt;
&lt;li&gt;Grows &lt;strong&gt;downward&lt;/strong&gt; from the end of RAM.&lt;/li&gt;
&lt;li&gt;Stores function call frames, local variables, return addresses, and interrupt context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since the Stack grows down and the Heap grows up, they are on a collision course. If the Stack grows too deep (for example, through deep recursion), it will silently overwrite the Heap or &lt;code&gt;.bss&lt;/code&gt; variables. This is the #1 cause of "ghost bugs."&lt;/p&gt;
&lt;h2&gt;
  
  
  Startup Code: The Invisible Hand Before &lt;code&gt;main()&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;In a standard C course, you are taught that "execution begins at &lt;code&gt;main()&lt;/code&gt;." &lt;strong&gt;On a microcontroller, this is a lie.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Execution actually begins at the &lt;strong&gt;Reset Handler&lt;/strong&gt;, whose address the CPU fetches from the vector table.&lt;/p&gt;

&lt;p&gt;Before user code runs, startup code prepares the execution environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stack Pointer Init:&lt;/strong&gt; Loads the Main Stack Pointer (MSP) from the vector table. Without this, functions cannot be called.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.data&lt;/code&gt; Copy:&lt;/strong&gt; Copies initial values from Flash to RAM. If this fails, variables start with garbage values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.bss&lt;/code&gt; Zeroing:&lt;/strong&gt; The entire &lt;code&gt;.bss&lt;/code&gt; region is cleared to zero in RAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Initialization:&lt;/strong&gt; Clock and low-level hardware configuration is performed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jump to &lt;code&gt;main()&lt;/code&gt;:&lt;/strong&gt; Only after memory is prepared does execution enter the application.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If any of these steps fails, variables contain garbage, the stack corrupts memory, and failures appear unrelated to their real cause.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Truth Table: Where Does It Go?
&lt;/h2&gt;

&lt;p&gt;Here is a quick reference guide to predict where your variables will land.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Variable Declaration&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Segment&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;int x;&lt;/code&gt; (Global)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.bss&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No initial value. Zeroed by startup code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;int x = 10;&lt;/code&gt; (Global)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Needs a non-zero initial value. Copied from Flash.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;const int x = 10;&lt;/code&gt; (Global)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.rodata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read-only. Stays in Flash.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;static int x = 5;&lt;/code&gt; (Local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;static&lt;/code&gt; means "persist forever." Cannot live on Stack.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;int x = 5;&lt;/code&gt; (Local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Stack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Temporary. Exists only while function runs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;char *s = "Text";&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;.rodata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;String is in Flash; Pointer is in RAM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;char s[] = "Text";&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Stack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Array is on Stack; String is copied into it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;malloc(10)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Heap&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requested manually by programmer.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Verifying Memory Placement
&lt;/h2&gt;

&lt;p&gt;Understanding memory layout is meaningless unless it can be &lt;strong&gt;verified&lt;/strong&gt;. Embedded systems do not tolerate assumptions. Use these tools to turn theory into engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Use this minimal snippet to force variables into every section of memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;var_bss&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                    &lt;span class="c1"&gt;// Uninitialized -&amp;gt; .bss&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;var_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// Initialized   -&amp;gt; .data&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;var_rodata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// Read-only     -&amp;gt; .rodata (Flash)&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;memory_map_test&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;var_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// Local         -&amp;gt; Stack&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;var_static&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Static Local  -&amp;gt; .data&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;var_heap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;malloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Dynamic       -&amp;gt; Heap&lt;/span&gt;

    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Code (.text):   %p&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_map_test&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var_heap&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;memory_map_test&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; High-Level Footprint (&lt;code&gt;size&lt;/code&gt;)&lt;br&gt;
Run &lt;code&gt;size &amp;lt;filename.exe&amp;gt;&lt;/code&gt; to see the total consumption per section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exe&lt;/span&gt;
&lt;span class="n"&gt;text&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt;     &lt;span class="n"&gt;bss&lt;/span&gt;     &lt;span class="n"&gt;dec&lt;/span&gt;     &lt;span class="n"&gt;hex&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;
&lt;span class="mi"&gt;14696&lt;/span&gt;    &lt;span class="mi"&gt;1560&lt;/span&gt;     &lt;span class="mi"&gt;116&lt;/span&gt;   &lt;span class="mi"&gt;16372&lt;/span&gt;    &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;ff4&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exe&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Forensic Inspection (&lt;code&gt;nm&lt;/code&gt;)&lt;br&gt;
Use &lt;code&gt;nm&lt;/code&gt; to prove exactly which section each variable occupies by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nm test.exe | &lt;span class="nb"&gt;grep &lt;/span&gt;var_
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;nm&lt;/span&gt;  &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exe&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;grep&lt;/span&gt; &lt;span class="n"&gt;var_&lt;/span&gt;
&lt;span class="mo"&gt;00407070&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt; &lt;span class="n"&gt;_var_bss&lt;/span&gt;
&lt;span class="mo"&gt;00404004&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt; &lt;span class="n"&gt;_var_data&lt;/span&gt;
&lt;span class="mo"&gt;00405064&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="n"&gt;_var_rodata&lt;/span&gt;
&lt;span class="mo"&gt;0040400&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="n"&gt;_var_static&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2277&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;T&lt;/strong&gt; = Text (Flash)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;R&lt;/strong&gt; = Read-only (Flash)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;D&lt;/strong&gt; = Data (RAM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B&lt;/strong&gt; = BSS (RAM)&lt;/li&gt;
&lt;li&gt;Lowercase letters (such as the &lt;code&gt;d&lt;/code&gt; beside &lt;code&gt;_var_static&lt;/code&gt;) mark &lt;em&gt;local&lt;/em&gt; (static) symbols in the same section.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; The Ground Truth (Map File)&lt;/p&gt;

&lt;p&gt;Enable the linker map file (&lt;code&gt;-Wl,-Map=output.map&lt;/code&gt;) in your IDE. This is the final document showing every symbol and its physical address. Use it to verify that your symbols are not colliding and are placed within the correct memory boundaries defined in your &lt;code&gt;.ld&lt;/code&gt; script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Rules to Remember
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The CPU only understands addresses&lt;/li&gt;
&lt;li&gt;The linker decides memory placement&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.data&lt;/code&gt; costs FLASH + RAM&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.bss&lt;/code&gt; costs RAM only&lt;/li&gt;
&lt;li&gt;Stack overflows are silent&lt;/li&gt;
&lt;li&gt;Always verify memory with tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You manage the memory, or the memory manages you. By understanding the pipeline from the Compiler to the Linker, and verifying your layout with tools, you transform from a C programmer into an Embedded Engineer.&lt;/p&gt;

</description>
      <category>c</category>
      <category>memory</category>
      <category>learning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Understanding the ABI by Observation</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Fri, 23 Jan 2026 04:29:31 +0000</pubDate>
      <link>https://forem.com/amanprasad/understanding-the-abi-by-observation-5865</link>
      <guid>https://forem.com/amanprasad/understanding-the-abi-by-observation-5865</guid>
      <description>&lt;h3&gt;
  
  
  📌Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;What Exactly Is an ABI?&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The ABI Contract: What It Defines&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A. Calling Convention&lt;/li&gt;
&lt;li&gt;B. Data Layout &amp;amp; Alignment&lt;/li&gt;
&lt;li&gt;C. Stack Frame&lt;/li&gt;
&lt;li&gt;D. Name Mangling (C++)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Target Context: ARM Cortex-M (AAPCS)&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Practical Exploration: ABI in Action with C Functions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function with 2 Arguments (&lt;code&gt;add2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Function with 4 Arguments (&lt;code&gt;add4&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Function with 5 Arguments (&lt;code&gt;add5&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Verification: Which Way Does the Stack Grow?&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Data Layout &amp;amp; Alignment: The Offset Proof&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;ELF Symbols and Function Size (&lt;code&gt;nm -S&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Name Mangling: C vs C++&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Key Takeaways&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Exactly Is an ABI?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;Application Binary Interface (ABI)&lt;/strong&gt; is a low-level contract that defines how &lt;em&gt;compiled binaries&lt;/em&gt; interact.&lt;/p&gt;

&lt;p&gt;It ensures that when &lt;strong&gt;Function A&lt;/strong&gt; calls &lt;strong&gt;Function B&lt;/strong&gt;, both sides agree on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where arguments are located&lt;/li&gt;
&lt;li&gt;where return values appear&lt;/li&gt;
&lt;li&gt;how control returns to the caller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This remains true even if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the code was compiled with different compilers&lt;/li&gt;
&lt;li&gt;parts are written in different languages (C, C++, Assembly)&lt;/li&gt;
&lt;li&gt;libraries are precompiled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without an ABI, &lt;code&gt;Module_A.o&lt;/code&gt; might pass an argument in a register while &lt;code&gt;Module_B.o&lt;/code&gt; expects it on the stack. The result is not a compiler error; it is a silent runtime failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ABI Contract: What It Defines
&lt;/h2&gt;

&lt;p&gt;An ABI specifies rules that compiled code must obey so that independently compiled binaries can interoperate correctly at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  A. Calling Convention
&lt;/h3&gt;

&lt;p&gt;The calling convention defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Argument Passing&lt;/strong&gt;
Which arguments go in registers, which go on the stack, and in what order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return Values&lt;/strong&gt;
Which register holds the result (e.g., &lt;code&gt;r0&lt;/code&gt; on ARM).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Register Preservation&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Caller-saved (volatile)&lt;/strong&gt;: the caller must save them if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Callee-saved (non-volatile)&lt;/strong&gt;: the callee must preserve and restore them.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Under AAPCS, registers &lt;code&gt;r0–r3&lt;/code&gt; and &lt;code&gt;r12&lt;/code&gt; are caller-saved, while &lt;code&gt;r4–r11&lt;/code&gt; are callee-saved.&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  B. Data Layout and Alignment
&lt;/h3&gt;

&lt;p&gt;The ABI defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type Size&lt;/strong&gt;
For example, &lt;code&gt;int&lt;/code&gt; is 32-bit on ARM EABI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alignment Rules&lt;/strong&gt;
32-bit data must be aligned to 4-byte boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure Padding&lt;/strong&gt;
Compilers insert padding bytes to preserve alignment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why &lt;code&gt;sizeof(struct)&lt;/code&gt; is often larger than the sum of its members.&lt;/p&gt;

&lt;h3&gt;
  
  
  C. Stack Frame
&lt;/h3&gt;

&lt;p&gt;The ABI governs stack behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stack growth direction&lt;/li&gt;
&lt;li&gt;Where the return address lives&lt;/li&gt;
&lt;li&gt;How local variables are addressed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ABI specifies stack alignment and what must be preserved at function boundaries, while the compiler decides how to implement the prologue and epilogue.&lt;/p&gt;

&lt;h3&gt;
  
  
  D. Name Mangling (C++)
&lt;/h3&gt;

&lt;p&gt;C++ supports function overloading, so function names must encode type information.&lt;/p&gt;

&lt;p&gt;The ABI standardizes this encoding so binaries can link correctly across compilers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Target Context: ARM Cortex-M (AAPCS)
&lt;/h2&gt;

&lt;p&gt;On STM32 and other Cortex-M systems, the ABI is &lt;strong&gt;AAPCS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Key rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First 4 integer arguments → &lt;code&gt;r0–r3&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;5th and subsequent arguments → stack&lt;/li&gt;
&lt;li&gt;Return value → &lt;code&gt;r0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Return address → &lt;code&gt;lr&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything below is verified against this ABI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Exploration: ABI in Action with C Functions
&lt;/h2&gt;

&lt;p&gt;We examine unoptimized (&lt;code&gt;-O0&lt;/code&gt;) and optimized (&lt;code&gt;-O2&lt;/code&gt;) output to separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ABI rules&lt;/strong&gt; from &lt;strong&gt;compiler implementation details&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Function with 2 Arguments (&lt;code&gt;add2&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;At optimization &lt;code&gt;-O0&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fle96kpecvrwwt7npgi7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fle96kpecvrwwt7npgi7y.png" alt="add 2 with optimization disabled" width="788" height="625"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;-O0&lt;/code&gt;, the assembly contains stack setup, spills, and reloads.&lt;/p&gt;

&lt;p&gt;This noise exists for debugging — not because the ABI requires it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At optimization &lt;code&gt;-O2&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farhexh0kgfkmrs7c54uf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farhexh0kgfkmrs7c54uf.png" alt="add 2 with optimization -O2" width="644" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;-O2&lt;/code&gt;, only the ABI-mandated behavior remains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arguments arrive in &lt;code&gt;r0&lt;/code&gt;, &lt;code&gt;r1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Result placed in &lt;code&gt;r0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Return via &lt;code&gt;bx lr&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything except argument location, return value, and return mechanism is compiler detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function with 4 Arguments (&lt;code&gt;add4&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;with optimization &lt;code&gt;-O0&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59uv23i2kc8l78y1gh38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59uv23i2kc8l78y1gh38.png" alt="add 4 with optimization disabled" width="800" height="820"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;with optimization &lt;code&gt;-O2&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feshqe1knjlega02q7i4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feshqe1knjlega02q7i4d.png" alt="add 4 with optimization -O2" width="729" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ABI Guarantees on Entry
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;r0&lt;/code&gt; → &lt;code&gt;a&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;r1&lt;/code&gt; → &lt;code&gt;b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;r2&lt;/code&gt; → &lt;code&gt;c&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;r3&lt;/code&gt; → &lt;code&gt;d&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This function uses &lt;strong&gt;all available argument registers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No stack access is required to receive arguments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For up to four arguments, the ARM ABI passes all parameters in registers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Function with 5 Arguments (&lt;code&gt;add5&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function crosses an &lt;strong&gt;ABI boundary&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The ABI Rule
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;r0–r3&lt;/code&gt; → first four arguments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5th argument → stack&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the first time the stack becomes mandatory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;with optimization &lt;code&gt;-O0&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvhxsodw41o90nazw04w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvhxsodw41o90nazw04w.png" alt="add 5 with optimization disabled" width="800" height="843"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;with optimization &lt;code&gt;-O2&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzwz3sa91j28pyq26v35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzwz3sa91j28pyq26v35.png" alt="add 5 with optimization -O2" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;-O2&lt;/code&gt;, the instruction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="nf"&gt;ldr&lt;/span&gt; &lt;span class="nv"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;proves that the fifth argument must be fetched from memory.&lt;/p&gt;

&lt;p&gt;This behavior is &lt;strong&gt;ABI law&lt;/strong&gt;, not an optimization artifact or a compiler choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification: Which Way Does the Stack Grow?
&lt;/h2&gt;

&lt;p&gt;Rather than assuming, we verify it directly from assembly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;&lt;span class="nf"&gt;str&lt;/span&gt; &lt;span class="nv"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;span class="nf"&gt;sub&lt;/span&gt; &lt;span class="nb"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both instructions &lt;strong&gt;subtract from &lt;code&gt;sp&lt;/code&gt;&lt;/strong&gt; to allocate space.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;If allocating space requires decrementing the stack pointer, the stack grows &lt;strong&gt;toward lower memory addresses&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Verified: The stack grows downward.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Data Layout &amp;amp; Alignment: The Offset Proof
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claim
&lt;/h3&gt;

&lt;p&gt;A 32-bit &lt;code&gt;int&lt;/code&gt; must be 4-byte aligned.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;char&lt;/code&gt; followed by an &lt;code&gt;int&lt;/code&gt; requires padding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MyPackedStruct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt;  &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;get_i&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MyPackedStruct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Optimized Assembly (&lt;code&gt;O2&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focaay5jom43obx3uixl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focaay5jom43obx3uixl5.png" alt="type size optimized -O2" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimization disabled (&lt;code&gt;O0&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukzmkq9224p8lip210nm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukzmkq9224p8lip210nm.png" alt="type size optimization disabled" width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation
&lt;/h3&gt;

&lt;p&gt;The field &lt;code&gt;i&lt;/code&gt; is accessed at offset &lt;code&gt;#4&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If there were no padding, the offset would be &lt;code&gt;#1&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;int&lt;/code&gt; is 4 bytes wide&lt;/li&gt;
&lt;li&gt;The compiler inserted 3 bytes of padding&lt;/li&gt;
&lt;li&gt;This layout is &lt;strong&gt;ABI-mandated&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Optimization does not change it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ELF Symbols and Function Size (&lt;code&gt;nm -S&lt;/code&gt;)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;aman&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;intget_i&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MyPackedStruct&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Command
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;arm-none-eabi-nm &lt;span class="nt"&gt;-S&lt;/span&gt; main.o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0000000000000004 B aman
0000000000000028 T get_i

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Interpretation
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;aman&lt;/code&gt; is 4 bytes → confirms &lt;code&gt;int&lt;/code&gt; size.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;B&lt;/code&gt; (BSS)&lt;/strong&gt;: uninitialized global data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;get_i&lt;/code&gt; occupies 0x28 bytes (40 bytes).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;T&lt;/code&gt; (Text)&lt;/strong&gt;: executable code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At &lt;code&gt;-O0&lt;/code&gt;, this corresponds to 10 ARM instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x28&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At &lt;code&gt;-O2&lt;/code&gt;, the function shrinks to 2 instructions (8 bytes).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Function size is a compiler artifact; ABI rules are not.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Name Mangling: C vs C++
&lt;/h2&gt;

&lt;h3&gt;
  
  
  C
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// C file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this command shows the unmangled symbol &lt;code&gt;add&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
arm-none-eabi-nm.exe add.o
00000000 T add

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  C++
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// C++ file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the same command on the C++ object file shows the mangled symbol &lt;code&gt;_Z3addii&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arm-none-eabi-nm.exe add.o
00000000 T _Z3addii
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mangled name encodes the function name and its parameter types: &lt;code&gt;_Z3addii&lt;/code&gt; decodes to &lt;code&gt;add&lt;/code&gt; taking two &lt;code&gt;int&lt;/code&gt; arguments.&lt;/p&gt;

&lt;p&gt;Using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;extern&lt;/span&gt;&lt;span class="s"&gt;"C"&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;disables mangling and restores the C symbol name.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The ABI governs how functions are called, how data is laid out, how the stack behaves, and how symbols are named — all of which can be verified directly from generated binaries.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you learn to &lt;strong&gt;observe&lt;/strong&gt; the ABI instead of memorizing it, low-level code stops being mysterious and starts being predictable.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>assembly</category>
      <category>learning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Function Prologue and Epilogue in ARM: What Really Happens When a Function Enters and Exits</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Fri, 16 Jan 2026 07:47:57 +0000</pubDate>
      <link>https://forem.com/amanprasad/function-prologue-and-epilogue-in-arm-what-really-happens-when-a-function-enters-and-exits-34p4</link>
      <guid>https://forem.com/amanprasad/function-prologue-and-epilogue-in-arm-what-really-happens-when-a-function-enters-and-exits-34p4</guid>
      <description>&lt;p&gt;Function prologue and epilogue are the instructions executed at the beginning and end of a function to preserve required CPU state and manage the stack. Although they are not visible in C code, the compiler automatically inserts these sequences to ensure correct function execution. In this article, we examine how ARM compilers use prologue and epilogue to safely handle function calls at the assembly level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Why Function Prologue and Epilogue Exist&lt;/li&gt;
&lt;li&gt;The Rulebook: AAPCS&lt;/li&gt;
&lt;li&gt;What Happens at Function Entry: The Prologue&lt;/li&gt;
&lt;li&gt;What Happens at Function Exit: The Epilogue&lt;/li&gt;
&lt;li&gt;
From C Code to Assembly: A Practical Example

&lt;ul&gt;
&lt;li&gt;Understanding the Assembly Output&lt;/li&gt;
&lt;li&gt;Prologue — Setting Up the Stack Frame&lt;/li&gt;
&lt;li&gt;Function Body — Execution of C Logic&lt;/li&gt;
&lt;li&gt;Epilogue — Cleaning Up and Returning&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Leaf vs Non-Leaf Functions&lt;/li&gt;

&lt;li&gt;Prologue and Epilogue in Interrupts and Context Switching&lt;/li&gt;

&lt;li&gt;Naked Functions: Skipping Prologue and Epilogue (When and Why)&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Function Prologue and Epilogue Exist
&lt;/h2&gt;

&lt;p&gt;On ARM, function calls reuse the same CPU registers and stack memory. Without a defined calling convention and a mechanism to save and restore this state, operations performed inside a function would corrupt the caller’s execution context. To prevent this, the compiler automatically inserts a function prologue and epilogue that preserve required registers and restore the stack state, ensuring correct program execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rulebook: AAPCS
&lt;/h2&gt;

&lt;p&gt;Before we look at the assembly, we need to understand why the code is generated this way.&lt;/p&gt;

&lt;p&gt;In the ARM ecosystem, all toolchains follow a strict set of rules called the &lt;strong&gt;AAPCS&lt;/strong&gt; (Procedure Call Standard for the ARM Architecture). This standard defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which registers a function can overwrite freely (Caller-Saved: &lt;code&gt;R0-R3&lt;/code&gt;, &lt;code&gt;R12&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Which registers a function must preserve and restore (Callee-Saved: &lt;code&gt;R4-R11&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;How the stack is managed (Full Descending Stack, 8-byte alignment).&lt;/li&gt;
&lt;li&gt;The AAPCS also defines how function arguments are passed and how return values are delivered.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Prologue and Epilogue are simply the compiler's way of enforcing these rules consistently across all functions.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Happens at Function Entry: The Prologue
&lt;/h2&gt;

&lt;p&gt;When a function is called on ARM Cortex-M, a short sequence of compiler-generated instructions runs at function entry, known as the prologue. These instructions execute before any user-defined C code in the function and prepare the stack and registers according to the AAPCS. A typical Cortex-M prologue is examined in detail in the example below.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Happens at Function Exit: The Epilogue
&lt;/h2&gt;

&lt;p&gt;At function return, the compiler inserts a short sequence of instructions known as the epilogue. Its role is to undo the changes made by the prologue and restore the CPU state so execution can safely resume in the caller.&lt;/p&gt;

&lt;p&gt;The exact instructions used depend on the function, but the epilogue typically releases the stack frame, restores saved registers, and returns control to the caller. These steps are shown in the assembly example below.&lt;/p&gt;




&lt;h2&gt;
  
  
  From C Code to Assembly: A Practical Example
&lt;/h2&gt;

&lt;p&gt;To make this concrete, the following example was compiled for an STM32F407 (ARM Cortex-M4) with optimizations disabled (&lt;code&gt;-O0&lt;/code&gt;). The generated assembly uses the &lt;code&gt;Thumb-2&lt;/code&gt; instruction set, as is standard on Cortex-M cores. We focus on the assembly generated for &lt;code&gt;compute_sum()&lt;/code&gt;, a non-leaf function that calls another function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;compute_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;temp1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;temp2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compute_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assembly generated for &lt;code&gt;compute_sum&lt;/code&gt; function&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;080002f8 &amp;lt;compute_sum&amp;gt;:
 80002f8:   b580        push    {r7, lr}
 80002fa:   b086        sub sp, #24
 80002fc:   af00        add r7, sp, #0
 80002fe:   6078        str r0, [r7, #4]
 8000300:   6039        str r1, [r7, #0]
 8000302:   687b        ldr r3, [r7, #4]
 8000304:   005b        lsls    r3, r3, #1
 8000306:   617b        str r3, [r7, #20]
 8000308:   683a        ldr r2, [r7, #0]
 800030a:   4613        mov r3, r2
 800030c:   005b        lsls    r3, r3, #1
 800030e:   4413        add r3, r2
 8000310:   613b        str r3, [r7, #16]
 8000312:   6939        ldr r1, [r7, #16]
 8000314:   6978        ldr r0, [r7, #20]
 8000316:   f7ff ffe1   bl  80002dc &amp;lt;add&amp;gt;
 800031a:   60f8        str r0, [r7, #12]
 800031c:   68fb        ldr r3, [r7, #12]
 800031e:   4618        mov r0, r3
 8000320:   3718        adds    r7, #24
 8000322:   46bd        mov sp, r7
 8000324:   bd80        pop {r7, pc}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function allocates local variables and calls another function, which makes it a non-leaf function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6klt34nkyt9jg4yvywnf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6klt34nkyt9jg4yvywnf.png" alt="assembly code for the compute_sum function" width="800" height="1125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Assembly Output
&lt;/h2&gt;

&lt;p&gt;The image above shows the disassembly of the &lt;code&gt;compute_sum()&lt;/code&gt; function. The instructions are visually divided into three regions: Prologue, Function Body, and Epilogue. Each region serves a distinct purpose in the execution of the function.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prologue — setting up the stack frame
&lt;/h3&gt;

&lt;p&gt;The prologue appears at the top of the function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;push {r7, lr}
sub  sp, #24
add  r7, sp, #0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sequence is the function prologue and it is inserted automatically by the compiler.&lt;br&gt;
At function entry, the compiler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saves &lt;code&gt;r7&lt;/code&gt; and &lt;code&gt;lr&lt;/code&gt; so the caller’s frame pointer and return address are not lost.&lt;/li&gt;
&lt;li&gt;Reserves 24 bytes on the stack for local variables and compiler-generated temporaries&lt;/li&gt;
&lt;li&gt;Even though the function defines only three &lt;code&gt;int&lt;/code&gt; variables (12 bytes), extra space is allocated to maintain alignment and to give the compiler room for temporary values, which is common when optimizations are disabled (&lt;code&gt;-O0&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Sets up &lt;code&gt;r7&lt;/code&gt; as a frame pointer, allowing all local variables to be accessed using fixed offsets regardless of changes to &lt;code&gt;sp&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these steps create a private stack frame for the function, ensuring it can execute and return without disturbing the caller’s state.&lt;/p&gt;
&lt;h3&gt;
  
  
  Function Body — execution of C logic
&lt;/h3&gt;

&lt;p&gt;The middle section of the image corresponds to the actual work performed by &lt;code&gt;compute_sum()&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The input parameters (&lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;) are first stored on the stack so they can be reused&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;temp1&lt;/code&gt; is calculated as &lt;code&gt;x * 2&lt;/code&gt; using a left-shift operation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;temp2&lt;/code&gt; is calculated as &lt;code&gt;y * 3&lt;/code&gt; using a shift followed by an add&lt;/li&gt;
&lt;li&gt;The computed values are loaded into registers and passed to &lt;code&gt;add()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The instruction &lt;code&gt;bl &amp;lt;add&amp;gt;&lt;/code&gt; performs a function call and overwrites the Link Register (&lt;code&gt;lr&lt;/code&gt;). Because of this, &lt;code&gt;lr&lt;/code&gt; must be saved earlier in the prologue. This is what makes &lt;code&gt;compute_sum()&lt;/code&gt; a non-leaf function.&lt;/p&gt;
&lt;h3&gt;
  
  
  Epilogue — cleaning up and returning
&lt;/h3&gt;

&lt;p&gt;This sequence forms the function epilogue and restores the caller’s state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;adds r7, #24
mov  sp, r7
pop  {r7, pc}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The stack space allocated for the function is released&lt;/li&gt;
&lt;li&gt;The original frame pointer (r7) is restored&lt;/li&gt;
&lt;li&gt;The return address is loaded into the program counter (pc), returning execution to the caller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The epilogue exactly mirrors the prologue, ensuring the function exits with the CPU state unchanged.&lt;/p&gt;




&lt;h2&gt;
  
  
  Leaf vs Non-Leaf Functions
&lt;/h2&gt;

&lt;p&gt;Not all functions require the same prologue and epilogue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A leaf function&lt;/strong&gt; is a function that does not call any other function. Since it never executes a BL instruction, the Link Register (LR) is not overwritten. As a result, the compiler may omit saving LR and, in some cases, avoid creating a full stack frame altogether.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A non-leaf function&lt;/strong&gt;, on the other hand, calls one or more functions. Because a BL instruction overwrites LR, the function must save LR in its prologue and restore it in the epilogue. Non-leaf functions almost always require a stack frame to preserve state and manage local variables.&lt;/p&gt;

&lt;p&gt;Whether a function is leaf or non-leaf directly influences how much code the compiler inserts at function entry and exit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prologue and Epilogue in Interrupts and Context Switching
&lt;/h2&gt;

&lt;p&gt;On ARM Cortex-M, a similar mechanism appears in interrupt handling. When an interrupt occurs, the hardware automatically pushes an architecturally defined subset of the CPU state onto the stack and restores it on return. RTOS context switching extends this idea in software. While the mechanisms differ, the goal is the same: preserving execution context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Naked Functions: Skipping Prologue and Epilogue (When and Why)
&lt;/h2&gt;

&lt;p&gt;By default, the compiler generates a prologue and epilogue to manage the stack and preserve registers according to the AAPCS. Using &lt;code&gt;__attribute__((naked))&lt;/code&gt;, this behavior can be disabled entirely.&lt;/p&gt;

&lt;p&gt;A naked function is compiled without any automatically generated prologue or epilogue. The compiler does not save or restore registers, allocate stack space, enforce stack alignment, or generate a return sequence. All responsibility for preserving CPU state and managing the stack falls entirely on the programmer.&lt;/p&gt;

&lt;p&gt;This is only appropriate in very low-level code, such as task context switching, interrupt entry routines, or early boot initialization. Because naked functions bypass the ABI completely, the compiler does not protect register or stack state. Even small mistakes can therefore cause stack corruption or hard faults.&lt;/p&gt;

&lt;p&gt;For this reason, naked functions should not be used in normal application code. They are intended only for situations where compiler-generated prologue and epilogue code must be avoided and the programmer is prepared to manage the CPU state manually.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Function prologue and epilogue are fundamental to how ARM compilers implement safe and predictable function calls. By following the AAPCS, the compiler ensures registers, stack state, and return flow are preserved across function boundaries. Understanding how these mechanisms work especially at the assembly level makes it easier to analyze stack usage, debug low-level issues, and write reliable embedded software.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>learning</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Bit Fields in C Explained: How They Work and Why They Matter</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Sat, 10 Jan 2026 04:14:33 +0000</pubDate>
      <link>https://forem.com/amanprasad/bit-fields-in-c-explained-how-they-work-and-why-they-matter-34i9</link>
      <guid>https://forem.com/amanprasad/bit-fields-in-c-explained-how-they-work-and-why-they-matter-34i9</guid>
      <description>&lt;p&gt;We often use full integers to store simple flags that need only one bit. Bit fields in C seem like an easy way to save memory by using just the bits we need.&lt;br&gt;
But this simplicity hides compiler and hardware details that can change how the data is actually stored in memory.&lt;/p&gt;
&lt;h2&gt;
  
  
  📌Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What Are Bit Fields?&lt;/li&gt;
&lt;li&gt;Bit Fields vs Normal Structure Members&lt;/li&gt;
&lt;li&gt;How Compilers Actually Store Bit Fields&lt;/li&gt;
&lt;li&gt;Appropriate Uses of Bit Fields&lt;/li&gt;
&lt;li&gt;Rules of Thumb&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What Are Bit Fields?
&lt;/h2&gt;

&lt;p&gt;A bit field is a special &lt;code&gt;struct&lt;/code&gt; member that allows you to specify exactly how many bits a variable should occupy, rather than using the standard byte-aligned sizes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Date&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;day&lt;/span&gt;   &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// 5 bits (Range: 0-31)&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// 4 bits (Range: 0-15)&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;  &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 11 bits (Range: 0-2047)&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of allocating a full &lt;code&gt;int&lt;/code&gt; (typically &lt;code&gt;32 bits&lt;/code&gt;) for each member, the compiler may pack these fields together to reduce memory usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax and basic rules&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A bit field is defined by placing a colon &lt;code&gt;:&lt;/code&gt; after a structure member name, followed by the number of bits it should use.&lt;/li&gt;
&lt;li&gt;Bit fields can only be declared inside a &lt;code&gt;struct&lt;/code&gt;. They cannot exist as standalone variables.&lt;/li&gt;
&lt;li&gt;Bit fields are not addressable objects in C, so the address-of operator (&lt;code&gt;&amp;amp;&lt;/code&gt;) cannot be used on them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bit Fields vs Normal Structure Members
&lt;/h2&gt;

&lt;p&gt;This behavior contrasts with normal structure members.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Normal structure members are aligned to byte boundaries, so each int usually consumes 4 bytes, even if it stores only a small value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bit field members, on the other hand, can be packed into adjacent bits within a machine word, allowing multiple small values to share the same underlying storage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-off is clear: normal members offer predictable layout, while bit fields trade layout guarantees for compactness and expressiveness.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Compilers Actually Store Bit Fields
&lt;/h2&gt;

&lt;p&gt;This is where bit fields stop being simple.&lt;br&gt;
When you write a structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Flags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is natural to assume that these fields are placed in memory one after another, each occupying a single bit in order, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bit 0 -&amp;gt; a
bit 1 -&amp;gt; b
bit 2 -&amp;gt; c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the C standard does not guarantee any such layout.&lt;br&gt;
Bit fields are stored inside a larger storage unit, typically the base type used in their declaration such as &lt;code&gt;unsigned int&lt;/code&gt;. How individual bit fields are placed within that storage unit is largely decided by the compiler.&lt;/p&gt;

&lt;p&gt;In particular, the C standard does not define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the ordering of bits within a word (LSB vs MSB)&lt;/li&gt;
&lt;li&gt;how bit fields are packed across bytes&lt;/li&gt;
&lt;li&gt;alignment and padding rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two different compilers targeting the same architecture are therefore allowed to produce different memory layouts for the same bit-field structure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This does not make bit fields useless. It means they are context-sensitive and not a reliable way to control precise bit-level memory layout.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Appropriate Uses of Bit Fields
&lt;/h2&gt;

&lt;p&gt;Bit fields and manual bit masking serve different purposes, even though both operate at the bit level.&lt;/p&gt;

&lt;p&gt;Bit fields are best used to represent logical state inside your program. They improve readability, group related flags naturally, and work well when the exact memory layout does not matter outside the program. This makes them a good fit for internal flags, state machines, and configuration structures.&lt;/p&gt;

&lt;p&gt;Manual bit masking is the correct choice when exact bit positions matter. This includes hardware registers, binary protocols, and any layout defined by a datasheet or specification. Bit masks provide full control over bit positions, behave consistently across compilers, and match hardware documentation exactly.&lt;/p&gt;

&lt;p&gt;For example, when working with hardware registers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define UART_RXNE (1 &amp;lt;&amp;lt; 5)
#define UART_TC   (1 &amp;lt;&amp;lt; 6)
#define UART_TXE  (1 &amp;lt;&amp;lt; 7)
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach may look less elegant than bit fields, but it is precise, portable, and unambiguous. In embedded systems, correctness matters more than elegance.&lt;/p&gt;

&lt;p&gt;While &lt;code&gt;bool&lt;/code&gt; works for individual flags, multiple &lt;code&gt;bool&lt;/code&gt; members still consume at least one byte each and may introduce padding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rules of Thumb
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bit fields express meaning, not layout.&lt;/li&gt;
&lt;li&gt;Never use bit fields for memory-mapped hardware registers.&lt;/li&gt;
&lt;li&gt;Use bit fields for internal flags and logical program state.&lt;/li&gt;
&lt;li&gt;Use bit masks for hardware registers and binary protocols.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>discuss</category>
      <category>programming</category>
      <category>coding</category>
      <category>learning</category>
    </item>
    <item>
      <title>Why Arrays Start at Index 0: A Memory-Level Explanation</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Sun, 04 Jan 2026 14:52:04 +0000</pubDate>
      <link>https://forem.com/amanprasad/why-arrays-start-at-index-0-a-memory-level-explanation-393p</link>
      <guid>https://forem.com/amanprasad/why-arrays-start-at-index-0-a-memory-level-explanation-393p</guid>
      <description>&lt;p&gt;Have you ever wondered why arrays in C/C++ (and many other languages) start with indexing at 0 instead of 1?&lt;br&gt;
To understand this properly, we need to look at how arrays are stored in memory and how the compiler computes element addresses.&lt;/p&gt;
&lt;h2&gt;
  
  
  📌 Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Arrays as Contiguous Memory Blocks&lt;/li&gt;
&lt;li&gt;How &lt;code&gt;arr[i]&lt;/code&gt; Works: Pointer Arithmetic Explained&lt;/li&gt;
&lt;li&gt;Why This Forces Indexing to Start at 0&lt;/li&gt;
&lt;li&gt;What If Arrays Started at Index 1?&lt;/li&gt;
&lt;li&gt;Why &lt;code&gt;arr[i]&lt;/code&gt; and &lt;code&gt;i[arr]&lt;/code&gt; Mean the Same Thing&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Arrays as Contiguous Memory Blocks
&lt;/h2&gt;

&lt;p&gt;At its core, an array in C/C++ is a fixed-size collection of elements of the same type, stored in contiguous memory locations. When you declare&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compiler allocates space for &lt;code&gt;100&lt;/code&gt; consecutive integers.&lt;br&gt;
On most modern systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An int typically occupies &lt;code&gt;4 bytes&lt;/code&gt; (on 32/64-bit architectures).&lt;/li&gt;
&lt;li&gt;So, the array consumes &lt;code&gt;400 bytes&lt;/code&gt;, laid out back-to-back in memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How &lt;code&gt;arr[i]&lt;/code&gt; Works: Pointer Arithmetic Explained
&lt;/h2&gt;

&lt;p&gt;The real reason arrays start at index 0 has nothing to do with counting or convention. It comes from how the compiler rewrites array indexing into pointer arithmetic.&lt;/p&gt;

&lt;p&gt;When you write &lt;code&gt;arr[i]&lt;/code&gt; it is translated directly into &lt;code&gt;*(arr + i)&lt;/code&gt;&lt;br&gt;
This is not an implementation detail. It is how the language defines array subscripting.&lt;br&gt;
This single translation explains why array indexing starts at zero.&lt;/p&gt;

&lt;p&gt;Let’s unpack what each part in &lt;code&gt;*(arr + i)&lt;/code&gt; means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;arr&lt;/code&gt; decays to a pointer to the first element, so it supplies the base address of the array (i.e., &lt;code&gt;&amp;amp;arr[0]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;+ i&lt;/code&gt; Performs pointer arithmetic. This does not add i bytes.
It adds &lt;code&gt;i × sizeof(element_type)&lt;/code&gt; bytes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;*&lt;/code&gt; Dereferences the computed address to read or write the value.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;So &lt;code&gt;arr[i]&lt;/code&gt; literally means: Go i elements away from the start of the array, then access the value stored there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s verify this equivalence with a simple C program&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c1"&gt;// Direct array access&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"arr[1]: %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;  &lt;span class="c1"&gt;// Output: 20&lt;/span&gt;

    &lt;span class="c1"&gt;// Equivalent pointer version&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"*(arr + 1): %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;// Same: 20&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Forces Indexing to Start at 0
&lt;/h2&gt;

&lt;p&gt;Here’s the key insight: the first element isn’t one step away — it lives at the base address.&lt;br&gt;
There is zero distance and zero bytes to skip.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distance from base address = 0&lt;/li&gt;
&lt;li&gt;offset = 0&lt;/li&gt;
&lt;li&gt;index = 0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why the first element is accessed as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;arr[0] == *(arr + 0)&lt;/code&gt; — no adjustment needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each subsequent element is reached by moving forward in memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;arr[1] == *(arr + 1)&lt;/code&gt; — skip 1 element (4 bytes for &lt;code&gt;int&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;arr[2] == *(arr + 2)&lt;/code&gt; — skip 2 elements (8 bytes)&lt;/li&gt;
&lt;li&gt;and so on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each index represents how many elements to move forward from the base address.&lt;br&gt;
No additional arithmetic or correction is required.&lt;/p&gt;

&lt;p&gt;An index is an offset measured in elements.&lt;br&gt;
Offsets start at 0 because nothing can be closer than zero distance from the origin.&lt;br&gt;
This follows directly from how memory addressing and pointer arithmetic work.&lt;/p&gt;
&lt;h2&gt;
  
  
  What If Arrays Started at Index 1?
&lt;/h2&gt;

&lt;p&gt;Now that we know &lt;code&gt;arr[i]&lt;/code&gt; is just syntactic sugar for &lt;code&gt;*(arr + i)&lt;/code&gt;, let’s imagine a different design.&lt;/p&gt;

&lt;p&gt;Suppose arrays were &lt;strong&gt;1-based indexed&lt;/strong&gt;, as in some mathematical tools (for example, MATLAB), where the first element is accessed as &lt;code&gt;arr[1]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Pointer arithmetic itself does not change.&lt;br&gt;
&lt;code&gt;arr[i]&lt;/code&gt; would still translate to:&lt;br&gt;
&lt;code&gt;*(arr + i)&lt;/code&gt;&lt;br&gt;
If we applied this rule directly:&lt;br&gt;
&lt;code&gt;arr[1]&lt;/code&gt; → &lt;code&gt;*(arr + 1)&lt;/code&gt;&lt;br&gt;
this would actually point to the second element, not the first.&lt;br&gt;
To make 1-based indexing work, the compiler would need to internally rewrite every access as:&lt;br&gt;
&lt;code&gt;arr[i]&lt;/code&gt; → &lt;code&gt;*(arr + (i - 1))&lt;/code&gt;&lt;br&gt;
That subtraction is the key difference.&lt;/p&gt;

&lt;p&gt;While modern compilers can often optimize this subtraction away, one-based indexing still introduces a semantic mismatch with the hardware’s base + offset addressing model. It complicates bounds reasoning and obscures the simple “offset from base” mental model.&lt;/p&gt;

&lt;p&gt;Modern CPU addressing modes operate naturally in terms of base address plus offset, making zero-based indexing a direct and transparent match.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why &lt;code&gt;arr[i]&lt;/code&gt; and &lt;code&gt;i[arr]&lt;/code&gt; Mean the Same Thing
&lt;/h2&gt;

&lt;p&gt;Once you understand that array indexing in C is defined in terms of pointer arithmetic, an interesting (and often surprising) consequence follows.&lt;/p&gt;

&lt;p&gt;The C standard defines the subscript operator as:&lt;br&gt;
&lt;code&gt;a[b]&lt;/code&gt; = &lt;code&gt;*(a + b)&lt;/code&gt;&lt;br&gt;
This definition does not treat &lt;code&gt;a&lt;/code&gt; as “the array” and &lt;code&gt;b&lt;/code&gt; as “the index.”&lt;br&gt;
It simply means: add &lt;code&gt;b&lt;/code&gt; to &lt;code&gt;a&lt;/code&gt;, then dereference the result.&lt;/p&gt;

&lt;p&gt;Now consider the implication of this definition.&lt;br&gt;
Pointer addition is just integer addition under the hood, and addition is commutative:&lt;br&gt;
(a + b) == (b + a)&lt;/p&gt;

&lt;p&gt;Because of this, both of the following expressions compute the same address&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;*(a + b)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;*(b + a)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which means:&lt;br&gt;
a[b] == b[a]&lt;/p&gt;

&lt;p&gt;This is not a trick, a compiler hack, or undefined behavior.&lt;br&gt;
It is a direct and intentional consequence of how the C language defines array subscripting.&lt;/p&gt;

&lt;p&gt;The example below demonstrates this equivalence in practice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Declare an array with 5 elements&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="cm"&gt;/*
     * In C, array access is defined as:
     *   a[b] == *(a + b)
     *
     * Because addition is commutative:
     *   a + b == b + a
     *
     * This means:
     *   arr[3] == 3[arr]
     */&lt;/span&gt;

    &lt;span class="c1"&gt;// Normal array indexing&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"arr[3]  = %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;   &lt;span class="c1"&gt;// Output: 4&lt;/span&gt;

    &lt;span class="c1"&gt;// Equivalent but unusual indexing&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"3[arr]  = %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;   &lt;span class="c1"&gt;// Output: 4&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; While &lt;code&gt;i[arr]&lt;/code&gt; is valid C, it is rarely used in real code because it hurts readability. It exists only because array indexing is defined in terms of pointer arithmetic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In C/C++, array indexing is not about counting positions. It is about measuring offsets from a base address.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>discuss</category>
      <category>c</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Structure Padding Isn’t Wastage of Memory — It’s a Hardware Requirement</title>
      <dc:creator>Aman Prasad</dc:creator>
      <pubDate>Wed, 31 Dec 2025 17:02:49 +0000</pubDate>
      <link>https://forem.com/amanprasad/structure-padding-isnt-wastage-of-memory-its-a-hardware-requirement-2gk3</link>
      <guid>https://forem.com/amanprasad/structure-padding-isnt-wastage-of-memory-its-a-hardware-requirement-2gk3</guid>
<description>&lt;p&gt;Have you ever manually calculated the size of a struct, only to find that &lt;code&gt;sizeof&lt;/code&gt; returns a larger number? You aren't crazy, and the compiler isn't broken. In this guide, we’ll decode structure padding: why it happens, why your CPU loves it, and how to optimize it for embedded systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What Is a Structure?&lt;/li&gt;
&lt;li&gt;How Structures Are Stored in Memory&lt;/li&gt;
&lt;li&gt;The Padding Myth&lt;/li&gt;
&lt;li&gt;Why Structure Padding Is Necessary&lt;/li&gt;
&lt;li&gt;Packed Structures: When to Use Them (and When Not To)&lt;/li&gt;
&lt;li&gt;The Trade-Off: Memory vs Performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Is a Structure?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A structure in C is a user-defined data type that allows programmers to group values of different data types under a single name.&lt;/li&gt;
&lt;li&gt;The items in the structure are called its members, and they can be of any valid data type.&lt;/li&gt;
&lt;li&gt;A structure is defined using the &lt;code&gt;struct&lt;/code&gt; keyword followed by the structure’s name, with the members listed inside curly braces &lt;code&gt;{}&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Structures Are Stored in Memory
&lt;/h2&gt;

&lt;p&gt;Consider the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
                &lt;span class="c1"&gt;// 3 byte of padding inserted here&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// 4 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
                &lt;span class="c1"&gt;// 3 bytes of padding inserted here&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"size of struct = %zu bytes&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;// prints 12 bytes (instead of 6)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, this structure seems simple. It contains two &lt;code&gt;char&lt;/code&gt; fields (1 byte each) and one &lt;code&gt;int&lt;/code&gt; field (4 bytes).&lt;br&gt;
Adding the sizes manually gives 6 bytes.&lt;/p&gt;

&lt;p&gt;Yet &lt;code&gt;sizeof(struct example)&lt;/code&gt; evaluates to 12 bytes.&lt;/p&gt;

&lt;p&gt;This is not a mistake. To understand why, we need to look at how the compiler maps this structure onto actual memory addresses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Byte-level layout&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The compiler places structure members into a contiguous block of memory, assigning each field a fixed offset from the start of the structure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Offset&lt;/th&gt;
&lt;th&gt;Address&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;+0&lt;/td&gt;
&lt;td&gt;0x00001000&lt;/td&gt;
&lt;td&gt;char a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+1&lt;/td&gt;
&lt;td&gt;0x00001001&lt;/td&gt;
&lt;td&gt;Padding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+2&lt;/td&gt;
&lt;td&gt;0x00001002&lt;/td&gt;
&lt;td&gt;Padding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+3&lt;/td&gt;
&lt;td&gt;0x00001003&lt;/td&gt;
&lt;td&gt;Padding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+4&lt;/td&gt;
&lt;td&gt;0x00001004&lt;/td&gt;
&lt;td&gt;int b (LSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+5&lt;/td&gt;
&lt;td&gt;0x00001005&lt;/td&gt;
&lt;td&gt;int b&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+6&lt;/td&gt;
&lt;td&gt;0x00001006&lt;/td&gt;
&lt;td&gt;int b&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+7&lt;/td&gt;
&lt;td&gt;0x00001007&lt;/td&gt;
&lt;td&gt;int b (MSB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+8&lt;/td&gt;
&lt;td&gt;0x00001008&lt;/td&gt;
&lt;td&gt;char c&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+9&lt;/td&gt;
&lt;td&gt;0x00001009&lt;/td&gt;
&lt;td&gt;Padding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+10&lt;/td&gt;
&lt;td&gt;0x0000100A&lt;/td&gt;
&lt;td&gt;Padding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;+11&lt;/td&gt;
&lt;td&gt;0x0000100B&lt;/td&gt;
&lt;td&gt;Padding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Byte order shown assumes a little-endian system. Padding behavior is independent of endianness.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this point, we can see padding but we still haven’t explained why it exists.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Padding Myth
&lt;/h2&gt;

&lt;p&gt;To understand why padding exists at all, we need to briefly leave the C language behind and look at how CPUs actually fetch data from memory.&lt;/p&gt;

&lt;p&gt;Let’s trigger the confusion deliberately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Test&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt;  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%zu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Test&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many beginners expect this to print 5 (1 byte + 4 bytes).&lt;br&gt;
On most systems, it prints 8.&lt;/p&gt;

&lt;p&gt;At this point, you’ll often hear:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The compiler is wasting memory by inserting padding.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That conclusion is wrong.&lt;br&gt;
What actually happened is structure padding. The compiler inserted extra bytes between members and possibly at the end to satisfy the hardware’s alignment rules. The requirement comes from the hardware, not from anything peculiar to the C language.&lt;/p&gt;

&lt;p&gt;Visually, the memory layout looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg51u1qxnp3sl05i2a1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg51u1qxnp3sl05i2a1n.png" alt="Memory layout of Struct with padding in banked memory" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram illustrates &lt;strong&gt;how a C structure is laid out in memory&lt;/strong&gt; on a system with a &lt;strong&gt;32-bit data bus&lt;/strong&gt;, using banked memory to visualize the alignment constraints imposed by the hardware.&lt;/p&gt;

&lt;p&gt;At the top-left, the &lt;code&gt;struct example&lt;/code&gt; definition is shown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;char a&lt;/code&gt; → 1 byte&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;int b&lt;/code&gt; → 4 bytes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;char c&lt;/code&gt; → 1 byte&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Naively, this looks like &lt;strong&gt;6 bytes of data&lt;/strong&gt;.&lt;br&gt;
However, the memory layout above shows why the actual size becomes &lt;strong&gt;12 bytes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Byte-level memory layout&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The diagram represents memory &lt;strong&gt;byte by byte&lt;/strong&gt;, organized into four byte lanes (BANK 0 to BANK 3) of the 32-bit data bus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BANK 0&lt;/strong&gt; → D7–D0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BANK 1&lt;/strong&gt; → D15–D8&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BANK 2&lt;/strong&gt; → D23–D16&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BANK 3&lt;/strong&gt; → D31–D24&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each row represents a &lt;strong&gt;4-byte aligned word&lt;/strong&gt; in memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Placement of &lt;code&gt;char a&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;char a&lt;/code&gt; occupies &lt;strong&gt;only one byte&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;It is placed in &lt;strong&gt;BANK 0&lt;/strong&gt; at offset &lt;code&gt;+0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The remaining three byte lanes in that word are unused for data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These unused lanes are shown as &lt;strong&gt;padding bytes&lt;/strong&gt;.&lt;br&gt;
They exist so that the next field can start at a properly aligned address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alignment of &lt;code&gt;int b&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;int b&lt;/code&gt; requires a &lt;strong&gt;4-byte aligned address&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The compiler therefore starts &lt;code&gt;int b&lt;/code&gt; at offset &lt;code&gt;+4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;All four byte lanes (BANK 0–BANK 3) are used to store the integer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows the CPU to fetch &lt;code&gt;int b&lt;/code&gt; in &lt;strong&gt;one aligned 32-bit memory access&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Placement of &lt;code&gt;char c&lt;/code&gt; and tail padding
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;char c&lt;/code&gt; occupies one byte at offset &lt;code&gt;+8&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The remaining three bytes in that word are again unused&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These final padding bytes are &lt;strong&gt;tail padding&lt;/strong&gt;.&lt;br&gt;
They ensure that the &lt;strong&gt;total structure size is a multiple of 4&lt;/strong&gt;, so that arrays of this structure remain correctly aligned.&lt;/p&gt;
&lt;h3&gt;
  
  
  Final result
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Actual data: &lt;strong&gt;6 bytes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Padding inserted: &lt;strong&gt;6 bytes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Total structure size: &lt;strong&gt;12 bytes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key takeaway illustrated by this diagram is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Padding is not wasted memory — it is the cost of alignment, paid to allow efficient and safe access on real hardware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Important note&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This diagram intentionally shows &lt;strong&gt;banked memory and a 32-bit data bus&lt;/strong&gt; to emphasize that structure padding is driven by &lt;strong&gt;hardware access rules&lt;/strong&gt;. Different architectures may implement alignment differently, but the principle of alignment remains the same.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Why Structure Padding Is Necessary
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5stjdqdmm7nw3oh9h4bf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5stjdqdmm7nw3oh9h4bf.png" alt="with padding and without padding memory representation" width="512" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To understand why padding is necessary, we need to bridge the gap between how we write code (byte by byte) and how the hardware runs it (word by word).&lt;/p&gt;

&lt;p&gt;A 32-bit CPU does not access memory one byte at a time; that would be inefficient. Instead, it fetches data in 4-byte chunks known as words. The CPU runs fastest when the data it needs starts exactly at the beginning of a word boundary.&lt;/p&gt;

&lt;p&gt;Let's visualize a structure with two chars followed by an int (&lt;code&gt;char a&lt;/code&gt;, &lt;code&gt;char b&lt;/code&gt;, &lt;code&gt;int c&lt;/code&gt;) and see how the CPU handles it with and without padding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Slow Way: Without Padding (Unaligned)&lt;/strong&gt;&lt;br&gt;
Look at the left side of the diagram. If the compiler packed data tightly, here is what happens when the CPU needs to read the 4-byte integer &lt;code&gt;int c&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Because the two char fields take up the first two bytes, &lt;code&gt;int c&lt;/code&gt; starts halfway through the first 4-byte word and ends halfway through the second word. It is split across two words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Cost:&lt;/strong&gt; To get that single integer &lt;code&gt;int c&lt;/code&gt;, the CPU must perform two memory cycles. It has to fetch Word 1 to get the first half of the integer, then fetch Word 2 to get the second half, and finally stitch them together. This is slow and inefficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Fast Way: With Padding (Aligned)&lt;/strong&gt;&lt;br&gt;
Now look at the right side of the diagram. This is what the compiler actually does to help the CPU: it inserts two unused padding bytes after the &lt;code&gt;char&lt;/code&gt; fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Benefit:&lt;/strong&gt; This forces &lt;code&gt;int c&lt;/code&gt; to start exactly at the beginning of Word 2. When the CPU needs that integer, it can retrieve the entire 4-byte value in a single memory cycle (indicated by the single green arrow).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsjqukg51s40dm2q1c3s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsjqukg51s40dm2q1c3s.png" alt="why padding matters" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the bottom of the image summarizes, structure padding is a deliberate trade-off. The compiler sacrifices a small amount of memory space (the padding bytes) to gain a significant boost in execution speed by ensuring data is aligned for single-cycle CPU access.&lt;/p&gt;
&lt;h2&gt;
  
  
  Packed Structures: When to Use Them (and When Not To)
&lt;/h2&gt;

&lt;p&gt;After understanding structure padding, a natural question arises:&lt;br&gt;
If padding costs memory, why not just remove it?&lt;br&gt;
C allows you to do exactly that using packed structures, commonly through &lt;code&gt;#pragma pack&lt;/code&gt; or compiler-specific attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Packing through &lt;code&gt;#pragma pack&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#pragma pack(1)
&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// 4 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="cp"&gt;#pragma pack()
&lt;/span&gt;&lt;span class="c1"&gt;// total size = 6 bytes, no padding&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Packing through compiler-specific attributes&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;PackedExample&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="n"&gt;__attribute__&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;packed&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells the compiler to ignore natural alignment rules and place structure members back-to-back with no padding.&lt;/p&gt;

&lt;p&gt;At first glance, this looks like an optimization. In practice, it’s a trade-off, and often dangerous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What packing actually does&lt;/strong&gt;&lt;br&gt;
Packing affects only memory layout, not CPU behavior.&lt;br&gt;
When you pack a structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Padding bytes are removed&lt;/li&gt;
&lt;li&gt;Multi-byte fields may become misaligned&lt;/li&gt;
&lt;li&gt;The size reported by &lt;code&gt;sizeof&lt;/code&gt; becomes smaller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What does not change:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How the CPU fetches memory&lt;/li&gt;
&lt;li&gt;Alignment requirements of the architecture&lt;/li&gt;
&lt;li&gt;Cost of misaligned access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CPU still expects aligned data.&lt;br&gt;
Packing simply removes the compiler’s safety net.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reducing Padding by Reordering Members&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider this reordered structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;optimized&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt;  &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// 4 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// 1 byte&lt;/span&gt;
    &lt;span class="c1"&gt;// 2 bytes of padding at the end&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="c1"&gt;// total size = 8 bytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure contains the same data as the original version, but the total size is reduced from 12 bytes to 8 bytes without using packed attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this layout is better&lt;/strong&gt;&lt;br&gt;
The key change is member ordering.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;int&lt;/code&gt; field, which requires 4-byte alignment, is placed first&lt;/li&gt;
&lt;li&gt;The smaller &lt;code&gt;char&lt;/code&gt; fields follow it&lt;/li&gt;
&lt;li&gt;Padding is pushed to the end of the structure, not between members&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layout allows the compiler to satisfy alignment rules with minimal padding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this is better than packing&lt;/strong&gt;&lt;br&gt;
Compared to a packed structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All fields remain naturally aligned&lt;/li&gt;
&lt;li&gt;The CPU can access &lt;code&gt;int b&lt;/code&gt; in one aligned memory read&lt;/li&gt;
&lt;li&gt;No risk of misaligned access faults&lt;/li&gt;
&lt;li&gt;Performance and portability are preserved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the compiler-friendly way to reduce padding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The general rule&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Order structure members from largest alignment requirement to smallest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This simple rule often eliminates most padding automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-Off: Memory vs Performance
&lt;/h2&gt;

&lt;p&gt;In engineering, there is rarely a perfect solution, only trade-offs.&lt;br&gt;
Structure padding exists because software has to choose between two competing goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimize memory usage&lt;/li&gt;
&lt;li&gt;Maximize execution speed and safety&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You rarely get both at the same time.&lt;br&gt;
Padding is the compiler’s way of deliberately choosing performance and correctness over absolute memory compactness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when you minimize memory&lt;/strong&gt;&lt;br&gt;
When fields are packed tightly with no padding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structures are smaller&lt;/li&gt;
&lt;li&gt;Cache and RAM usage is reduced&lt;/li&gt;
&lt;li&gt;Memory footprints look efficient on paper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the cost is hidden:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-byte fields may become misaligned&lt;/li&gt;
&lt;li&gt;The CPU may need multiple memory reads for a single variable&lt;/li&gt;
&lt;li&gt;Extra instructions are required to assemble the value&lt;/li&gt;
&lt;li&gt;On some architectures, misaligned access can trap or crash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, you save bytes but pay in cycles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when you allow padding&lt;/strong&gt;&lt;br&gt;
When padding is introduced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structures become slightly larger&lt;/li&gt;
&lt;li&gt;Some memory appears “unused”&lt;/li&gt;
&lt;li&gt;The size reported by &lt;code&gt;sizeof&lt;/code&gt; increases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the benefits are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is naturally aligned&lt;/li&gt;
&lt;li&gt;The CPU fetches values in one memory access&lt;/li&gt;
&lt;li&gt;Code executes faster and more predictably&lt;/li&gt;
&lt;li&gt;Hardware behavior becomes simpler and safer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You spend a few bytes to save CPU time.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>lowcode</category>
      <category>computerscience</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
