<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arunabh Gupta</title>
    <description>The latest articles on Forem by Arunabh Gupta (@arundevs).</description>
    <link>https://forem.com/arundevs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3584973%2F4a1061d9-c4a2-406a-abe9-7d9cc8b0873b.png</url>
      <title>Forem: Arunabh Gupta</title>
      <link>https://forem.com/arundevs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/arundevs"/>
    <language>en</language>
    <item>
      <title>DP Isn’t Just About Big-O: How Cache Misses Killed My Knapsack</title>
      <dc:creator>Arunabh Gupta</dc:creator>
      <pubDate>Tue, 06 Jan 2026 23:07:32 +0000</pubDate>
      <link>https://forem.com/arundevs/dp-isnt-just-about-big-o-how-cache-misses-killed-my-knapsack-3751</link>
      <guid>https://forem.com/arundevs/dp-isnt-just-about-big-o-how-cache-misses-killed-my-knapsack-3751</guid>
      <description>&lt;p&gt;“I had a correct O(n·x) DP solution. Constraints were within limits. Still… TLE.”&lt;/p&gt;

&lt;p&gt;I'm sure you've said or heard this phrase plenty of times before. The same thing happened to me. I was trying to solve the &lt;a href="https://cses.fi/problemset/task/1158" rel="noopener noreferrer"&gt;Book Shop problem&lt;/a&gt; on CSES, a classic knapsack problem. I had my logic sorted out, but when I submitted the code, I kept getting TLE. I checked multiple times and tried sorting and other tricks to squeeze out better results, but nothing worked. Memory wasn't the issue either, since the problem grants more than enough of it. So what on earth happened here?&lt;/p&gt;

&lt;p&gt;I would strongly suggest trying this problem yourself before reading ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DP that should have passed
&lt;/h2&gt;

&lt;p&gt;Let's first discuss an approach for this problem. Observe that at any index of the prices array, you basically have two options: take that book or skip it. &lt;/p&gt;

&lt;p&gt;So, if we define a 2D dp array for this problem, any state dp[i][k] (where i is the index and k is the remaining budget) represents the maximum total value obtainable from books i ... n-1 with budget k. &lt;/p&gt;

&lt;p&gt;Therefore the DP transition becomes (the "take" branch applies only when k &amp;gt;= h[i]):&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dp[i][k] = max(dp[i+1][k-h[i]]+s[i], dp[i+1][k])&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Algorithmically, nothing seems to be wrong.&lt;/p&gt;
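
&lt;p&gt;As a quick sanity check, here is the recurrence traced on a tiny hand-made instance (h = [2, 8], s = [5, 12], x = 10; the numbers are mine, not from the judge's data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Base row (i = 1):    dp[1][k] = 12 for k &amp;gt;= 8, otherwise 0
Row i = 0, k = 10:   dp[0][10] = max(dp[1][10 - 2] + 5, dp[1][10])
                               = max(12 + 5, 12) = 17   (buy both books)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;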

&lt;p&gt;( If you have difficulty understanding this, I would suggest watching a few knapsack tutorials. You can refer to &lt;a href="https://www.youtube.com/watch?v=YTHThUwb-xE&amp;amp;t=902s" rel="noopener noreferrer"&gt;this&lt;/a&gt; video for a good explanation, although his approach is a bit different: he thinks in prefix terms, whereas my approach is derived from a suffix point of view. )&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistake I didn’t know was a mistake
&lt;/h2&gt;

&lt;p&gt;My initial code was something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1002&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;100002&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;solve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
        &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]){&lt;/span&gt;
                &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks fine, right?&lt;br&gt;
Let me break this down a little bit: I loop over every k from 0 -&amp;gt; x, and for each k I walk the index i from n-2 down to 0, filling dp[i][k] with the recurrence we discussed above.  &lt;/p&gt;

&lt;p&gt;The looping seems harmless, yet the submission was slow and I kept getting TLE. &lt;/p&gt;
&lt;h2&gt;
  
  
  The missing concept: memory access
&lt;/h2&gt;

&lt;p&gt;When we write code, it’s easy to think that accessing an array is a constant-time operation.&lt;br&gt;
In theory, it is.&lt;/p&gt;

&lt;p&gt;In practice, where the data lives matters more than the operation itself.&lt;/p&gt;

&lt;p&gt;Modern CPUs are extremely fast. So fast, in fact, that reading from main memory (RAM) would slow them down constantly. To avoid this, CPUs use caches — small, very fast memory layers placed between the CPU and RAM.&lt;/p&gt;

&lt;p&gt;Roughly speaking, access times look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Approximate cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU register&lt;/td&gt;
&lt;td&gt;1 cycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L1 cache&lt;/td&gt;
&lt;td&gt;~4 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2 cache&lt;/td&gt;
&lt;td&gt;~12 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3 cache&lt;/td&gt;
&lt;td&gt;~40 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;200–300 cycles&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This means that a single memory access from RAM can cost as much as hundreds of CPU instructions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Cache lines: why memory is fetched in chunks
&lt;/h3&gt;

&lt;p&gt;Another important detail:&lt;br&gt;
The CPU never fetches a single integer from memory.&lt;/p&gt;

&lt;p&gt;Instead, it fetches a cache line, typically 64 bytes at a time.&lt;/p&gt;

&lt;p&gt;For an int array, that’s:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;64 bytes = 16 integers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
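
&lt;p&gt;That 16 is not a magic number; it follows directly from the sizes involved. Here is a quick compile-time check (assuming 4-byte &lt;code&gt;int&lt;/code&gt; and 64-byte cache lines, which is typical on x86-64 but not guaranteed by the C++ standard):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;constexpr int CACHE_LINE_BYTES = 64;  // typical, but hardware-dependent
constexpr int INTS_PER_LINE = CACHE_LINE_BYTES / sizeof(int);
static_assert(INTS_PER_LINE == 16, "assumes 4-byte int");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;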



&lt;p&gt;So when you access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dp[i][k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the CPU actually loads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dp[i][k], dp[i][k+1], dp[i][k+2], ..., dp[i][k+15]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whether you use them or not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why sequential access is fast
&lt;/h3&gt;

&lt;p&gt;If your code accesses memory sequentially, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dp[i][0], dp[i][1], dp[i][2], dp[i][3], ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One cache line is loaded&lt;/li&gt;
&lt;li&gt;All 16 integers are used&lt;/li&gt;
&lt;li&gt;The next access is already in cache&lt;/li&gt;
&lt;li&gt;The CPU runs at full speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why simple loops over arrays are incredibly fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why skipping memory is slow
&lt;/h3&gt;

&lt;p&gt;Now consider this access pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dp[n-2][k]
dp[n-3][k]
dp[n-4][k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because C++ stores 2D arrays in row-major order, each of these accesses is far apart in memory—often hundreds of kilobytes.&lt;/p&gt;

&lt;p&gt;What happens in this case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cache line is loaded&lt;/li&gt;
&lt;li&gt;Only one integer from it is used&lt;/li&gt;
&lt;li&gt;The remaining 15 integers are wasted&lt;/li&gt;
&lt;li&gt;The CPU has to fetch another cache line from RAM&lt;/li&gt;
&lt;li&gt;The CPU stalls for hundreds of cycles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, I was paying the cost of memory access 16 times more often than necessary.&lt;/p&gt;

&lt;p&gt;At small input sizes, this doesn’t matter.&lt;br&gt;
At 100 million DP transitions, it absolutely does.&lt;/p&gt;

&lt;p&gt;You can also use the analogy of a book to get a better feel for this. Imagine reading word 1 from page 1, then word 1 from page 2, then word 1 from page 3, and so on. That's very inefficient, right? Instead, you'd read all of page 1, then page 2, then page 3, and so on. &lt;/p&gt;
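
&lt;p&gt;You can observe this directly with a small benchmark. The sketch below is my own illustration (not from the CSES submission): it sums the same grid twice, once row by row and once column by column, and the column-wise pass is typically several times slower purely because of cache behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;chrono&amp;gt;
#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;vector&amp;gt;

int main() {
    const int N = 4000;  // 4000 x 4000 ints = ~64 MB, far larger than L3
    std::vector&amp;lt;int&amp;gt; grid(1LL * N * N, 1);  // flat row-major storage

    auto time_sum = [&amp;amp;](bool row_major) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (int a = 0; a &amp;lt; N; a++)
            for (int b = 0; b &amp;lt; N; b++)
                // Row-major walks consecutive addresses; column-major
                // jumps N * sizeof(int) bytes on every single access.
                sum += row_major ? grid[1LL * a * N + b] : grid[1LL * b * N + a];
        long long ms = std::chrono::duration_cast&amp;lt;std::chrono::milliseconds&amp;gt;(
                           std::chrono::steady_clock::now() - start).count();
        std::printf("%s: sum = %lld, %lld ms\n",
                    row_major ? "row-major   " : "column-major", sum, ms);
        return sum;
    };

    time_sum(true);
    time_sum(false);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The exact ratio depends on the CPU and its cache sizes, but on typical hardware the column-wise pass loses by a wide margin even though both loops perform exactly the same number of additions.&lt;/p&gt;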
&lt;h2&gt;
  
  
  Why “skipping memory” is disastrous
&lt;/h2&gt;

&lt;p&gt;At this point, it's clear that memory is fetched in cache lines, not as individual elements. So the real problem appears when &lt;strong&gt;we don't completely use what we have fetched&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;We know that a cache line holds 16 integers. If my code accesses memory sequentially, all 16 integers get used. If my code jumps around in memory, only 1 integer gets used and the other 15 are wasted. &lt;/p&gt;

&lt;p&gt;This seemingly minor difference is enough to push a theoretically correct solution into TLE. &lt;/p&gt;

&lt;p&gt;Let's understand this in a bit more detail.&lt;/p&gt;
&lt;h3&gt;
  
  
  One cache miss vs many cache misses
&lt;/h3&gt;

&lt;p&gt;Consider two access patterns over an array of integers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pattern 1: Sequential access
a[0], a[1], a[2], a[3], ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One cache line is loaded&lt;/li&gt;
&lt;li&gt;16 integers become available&lt;/li&gt;
&lt;li&gt;The CPU uses all of them&lt;/li&gt;
&lt;li&gt;Cache miss every 16 accesses
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pattern 2: Skipping access
a[0], a[100000], a[200000], ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A cache line is loaded&lt;/li&gt;
&lt;li&gt;Only one integer is used&lt;/li&gt;
&lt;li&gt;The rest are wasted&lt;/li&gt;
&lt;li&gt;Cache miss every single access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both patterns perform the same number of array reads.&lt;br&gt;
But one causes 16× more cache misses, and at a large scale that difference alone can cause serious performance problems. &lt;/p&gt;
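
&lt;p&gt;A rough back-of-envelope estimate makes this concrete for our problem, using the ~200-cycle RAM latency from the table above (all numbers are order-of-magnitude only):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DP transitions:  n * x = 1000 * 100000 = 10^8
Column-major:    ~10^8 misses       * ~200 cycles ≈ 2 * 10^10 stall cycles
Row-major:       ~10^8 / 16 misses  * ~200 cycles ≈ 1.25 * 10^9 stall cycles
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At a few billion cycles per second, the first number alone blows past a typical one-second time limit, while the second fits comfortably.&lt;/p&gt;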
&lt;h2&gt;
  
  
  The fix (almost anticlimactic)
&lt;/h2&gt;

&lt;p&gt;We just need to flip the loops inside out: fix the index i and loop over all k within that row. Now we traverse the dp table in a row-first manner, so consecutive accesses land on the same cache line instead of missing on every single access. This is much more efficient than the first version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]){&lt;/span&gt;
                &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
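
&lt;p&gt;For completeness, here is the full fixed &lt;code&gt;solve&lt;/code&gt; with the base case included. It is the same logic as before; only the nesting of the loops has changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;algorithm&amp;gt;

int dp[1002][100002];

int solve(int n, int x, int* h, int* s) {
    // Base row: with only the last book available, buy it if it fits.
    for (int k = 0; k &amp;lt;= x; k++) {
        dp[n - 1][k] = k &amp;gt;= h[n - 1] ? s[n - 1] : 0;
    }
    // i is the dependency dimension, so it drives the outer loop; each
    // row is then filled left to right, staying on the same cache lines.
    for (int i = n - 2; i &amp;gt;= 0; i--) {
        for (int k = 0; k &amp;lt;= x; k++) {
            dp[i][k] = dp[i + 1][k];  // skip book i
            if (k &amp;gt;= h[i]) {          // take book i if it fits
                dp[i][k] = std::max(dp[i][k], dp[i + 1][k - h[i]] + s[i]);
            }
        }
    }
    return dp[0][x];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;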



&lt;h2&gt;
  
  
  How to identify this mistake in other problems
&lt;/h2&gt;

&lt;p&gt;By now, you might be thinking:&lt;br&gt;
 “How do I even know when to flip loops?”&lt;br&gt;
 “How do I decide which loop goes outside and which goes inside?”&lt;/p&gt;

&lt;p&gt;The answer lies in DP state dependency — not in optimization tricks.&lt;/p&gt;

&lt;p&gt;Let’s revisit the DP transition we used.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Identify the dependency in the DP formula
&lt;/h3&gt;

&lt;p&gt;From the knapsack recurrence, we need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we can summarize the dependency as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="err"&gt;←&lt;/span&gt;  &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the only dependency that matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Translate dependency into an ordering constraint
&lt;/h3&gt;

&lt;p&gt;In simple terms, this means:&lt;br&gt;
To compute values for row i, the entire row i+1 must already be computed.&lt;br&gt;
Now ask yourself a very direct question:&lt;br&gt;
"How do I compute the table so that row i+1 is already available when I compute row i?"&lt;br&gt;
There is only one correct answer:&lt;br&gt;
i must go from n-1 down to 0&lt;/p&gt;

&lt;p&gt;Which immediately gives us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This iteration direction for i is not an optimization; it is required for correctness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: What about the inner loop?
&lt;/h3&gt;

&lt;p&gt;Once we are inside a fixed row i, we are computing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now ask the next crucial question:&lt;br&gt;
Does dp[i][k] depend on dp[i][k-1]?&lt;br&gt;
The answer is no.&lt;br&gt;
It depends only on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dp[i+1][k]&lt;/li&gt;
&lt;li&gt;dp[i+1][k - h[i]]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All k values inside the same row are independent&lt;/li&gt;
&lt;li&gt;They can be computed in any order&lt;/li&gt;
&lt;li&gt;As long as row i+1 already exists&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 4: The natural loop structure
&lt;/h3&gt;

&lt;p&gt;Putting this together, the correct loop order becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;   &lt;span class="c1"&gt;// dependency loop&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// free loop&lt;/span&gt;
        &lt;span class="n"&gt;compute&lt;/span&gt; &lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop order matches the DP logic itself, not just performance concerns.&lt;/p&gt;

&lt;p&gt;Here is a crucial mental rule to avoid this confusion:&lt;br&gt;
&lt;strong&gt;The dimension that appears in the DP dependency must be the outer loop.&lt;/strong&gt;&lt;br&gt;
Look at the formula again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][...])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;i appears on the right-hand side&lt;/li&gt;
&lt;li&gt;k does not&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we can say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;i controls the correctness order&lt;/li&gt;
&lt;li&gt;k is a secondary, free dimension&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which gives us the rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for (i ...)        // dependency dimension
    for (k ...)    // free dimension
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you start thinking in these terms, “flipping loops” stops being a trick and becomes a natural consequence of the DP definition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the end, I came to a realization: Big-O notation only tells you how many operations are performed. It says nothing about how fast memory is accessed during those operations. &lt;br&gt;
Once n × x gets large enough, cache behavior matters more than anything else.&lt;/p&gt;

</description>
      <category>dsa</category>
      <category>dp</category>
      <category>memory</category>
    </item>
    <item>
      <title>Goroutines, OS Threads, and the Go Scheduler — A Deep Dive That Actually Makes Sense</title>
      <dc:creator>Arunabh Gupta</dc:creator>
      <pubDate>Sun, 30 Nov 2025 22:31:20 +0000</pubDate>
      <link>https://forem.com/arundevs/goroutines-os-threads-and-the-go-scheduler-a-deep-dive-that-actually-makes-sense-1f9f</link>
      <guid>https://forem.com/arundevs/goroutines-os-threads-and-the-go-scheduler-a-deep-dive-that-actually-makes-sense-1f9f</guid>
      <description>&lt;p&gt;If you’ve ever tried learning go routines, you’ve probably came across the line “They’re lightweight threads”. But then the questions start coming in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;“Are they real threads?”
&lt;/li&gt;
&lt;li&gt;“How can Go run millions of them?”
&lt;/li&gt;
&lt;li&gt;“What is this GMP thingy?”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I had the same questions and a lot of confusion when I started learning Go. Goroutines seemed magical, almost too good to be true. But the more research I did, the more things started to make sense.&lt;br&gt;&lt;br&gt;
So I went deeper down the rabbit hole to understand how goroutines work alongside OS threads. This article is that journey and my attempt to explain how everything works. &lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Let’s first get our basics clear. We need to know about “Processes”, “Threads”, “Context-Switching” and some other basic concepts like stack size (although that one is not too important for this article). &lt;/p&gt;

&lt;h2&gt;
  
  
  Process
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;process&lt;/strong&gt; is an independent execution environment created by the operating system.&lt;br&gt;&lt;br&gt;
 It consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A private virtual address space (VAS)&lt;/strong&gt; mapped by the OS and MMU (Memory Management Unit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executable code&lt;/strong&gt; (the program’s text segment)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A heap&lt;/strong&gt; for dynamic memory allocation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One or more stacks&lt;/strong&gt;, one per thread
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File descriptors&lt;/strong&gt; referencing kernel-managed resources (files, sockets)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment variables&lt;/strong&gt; inherited from its parent
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process control metadata&lt;/strong&gt;, including PID, scheduling priority, credentials, resource limits, and runtime statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Processes provide &lt;strong&gt;isolation&lt;/strong&gt;: each process runs with its own memory mappings and cannot directly access the memory of other processes.&lt;br&gt;&lt;br&gt;
 Context switching between processes requires switching address spaces, memory mappings, and other resource structures, which makes process switches relatively expensive.&lt;/p&gt;

&lt;p&gt;In simple terms, a process is just a running program with its own isolated memory and system resources. Two processes will never interfere with each other’s execution and stay isolated from one another. &lt;/p&gt;

&lt;h2&gt;
  
  
  Threads
&lt;/h2&gt;

&lt;p&gt;Now that we understand what a process is, the next piece is understanding &lt;strong&gt;threads&lt;/strong&gt;, because goroutines are built on this concept. &lt;/p&gt;

&lt;p&gt;A single process may contain &lt;strong&gt;one or many threads&lt;/strong&gt;, all sharing the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;virtual address space
&lt;/li&gt;
&lt;li&gt;heap
&lt;/li&gt;
&lt;li&gt;global variables
&lt;/li&gt;
&lt;li&gt;open file descriptors
&lt;/li&gt;
&lt;li&gt;code and libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, each thread has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;its own stack&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;its own program counter (PC)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;its own CPU register set&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;its own thread-local storage (TLS)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;its own kernel scheduling metadata&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Threads within the same process run &lt;strong&gt;independently&lt;/strong&gt; and may execute concurrently on different CPU cores ( and, on a single core, take turns through context switching; we’ll come to this in the next section ).&lt;/p&gt;

&lt;p&gt;Because threads share memory, they can communicate cheaply — but must also use synchronization mechanisms (mutexes, semaphores) to avoid race conditions.&lt;/p&gt;

&lt;p&gt;A race condition is a situation where two or more threads access the same shared data at the same time and at least one of them is modifying it. The final state of the data depends on the timing of those threads’ execution. Since execution timing is unpredictable, results become inconsistent every time we run the program, making the code difficult to debug. That’s why mutexes and semaphores are important for avoiding race conditions: they lock the shared data once a thread has access to it, so other threads can’t touch that data until the first thread finishes and releases the lock.  &lt;/p&gt;

&lt;p&gt;In summary, threads are the kernel’s basic unit of CPU execution inside a process. Or in simpler words, they are the most basic sequences of instructions inside a process, and each CPU core runs one thread at a time. If you have an octa-core processor ( meaning 8 cores ), then 8 threads can run simultaneously, achieving true parallelism. &lt;/p&gt;

&lt;p&gt;Even though switching between threads is cheaper than switching between processes, it’s still not free. Thread context switches involve several steps:&lt;/p&gt;

&lt;p&gt;1. Saving/loading CPU registers: Each thread has its own execution state — registers like the program counter, stack pointer, general-purpose registers, etc.&lt;br&gt;&lt;br&gt;
 When switching threads, the OS must &lt;strong&gt;save the registers of the outgoing thread&lt;/strong&gt; and &lt;strong&gt;restore the registers of the incoming thread&lt;/strong&gt;, which takes time.&lt;/p&gt;

&lt;p&gt;2. Switching stacks: Every thread has its &lt;strong&gt;own stack&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
 A context switch requires switching the &lt;strong&gt;stack pointer&lt;/strong&gt; from one thread’s stack to another’s.&lt;br&gt;&lt;br&gt;
 This means the CPU must now begin reading/writing function frames from a completely different memory region. These stacks can range from 1 MB to 8 MB, which is part of why the context switch takes time. &lt;/p&gt;

&lt;p&gt;3. Possible TLB and cache effects: &lt;strong&gt;TLB = Translation Lookaside Buffer&lt;/strong&gt;, a small high-speed cache that stores recently used virtual-to-physical memory address translations.&lt;br&gt;&lt;br&gt;
 Switching threads can cause &lt;strong&gt;TLB misses&lt;/strong&gt; and &lt;strong&gt;cache invalidations&lt;/strong&gt;, which forces the CPU to reload memory mappings or fetch data from slower memory levels, reducing performance.&lt;/p&gt;

&lt;p&gt;4. Involvement of the OS scheduler: Thread switching requires a trap into the operating system (kernel mode).&lt;br&gt;&lt;br&gt;
 The OS scheduler must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;decide which thread runs next
&lt;/li&gt;
&lt;li&gt;update scheduling metadata
&lt;/li&gt;
&lt;li&gt;manage states like runnable, waiting, or blocked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kernel-mode transition adds a lot of overhead.&lt;/p&gt;

&lt;h1&gt;
  
  
  Context Switching
&lt;/h1&gt;

&lt;p&gt;A &lt;strong&gt;context switch&lt;/strong&gt; is the act of the operating system pausing one running thread or process and resuming another.&lt;br&gt;&lt;br&gt;
Because the CPU can run only &lt;strong&gt;one thread per core&lt;/strong&gt; at a time, the OS must rapidly switch between multiple threads to provide concurrency.&lt;/p&gt;

&lt;p&gt;When the OS switches from Thread A → Thread B, it must:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Save the CPU registers&lt;/strong&gt; (program counter, stack pointer, general-purpose registers, flags) of Thread A
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Save Thread A’s kernel metadata&lt;/strong&gt; (scheduling state, priority, CPU usage stats)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load Thread B’s saved register state&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch to Thread B’s stack&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Possibly switch address spaces&lt;/strong&gt; (if switching between processes)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update scheduling queues and bookkeeping&lt;/strong&gt; (updating all the small pieces of internal data the OS keeps to track the state of each thread or process)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This entire procedure is performed by the OS scheduler and requires switching from &lt;strong&gt;user mode&lt;/strong&gt; to &lt;strong&gt;kernel mode&lt;/strong&gt;, running scheduling logic, then returning back to user mode.&lt;/p&gt;

&lt;p&gt;A context switch ensures that each runnable thread gets a fair share of CPU time but comes with performance costs due to register saving, stack switching, TLB/cache effects, and kernel involvement. Following are the most common reasons why context switching is expensive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Saving &amp;amp; loading registers: The CPU must store all registers of the old thread and restore the registers of the new one — its entire execution state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Switching stacks: Each thread has its own stack. The CPU has to stop using one thread’s stack and start using another’s.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kernel involvement: Switching threads requires entering the kernel, updating run queues and priorities, then returning to user mode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TLB (Translation Lookaside Buffer) effects: Switching between processes requires switching the page table and flushing part of the TLB, slowing down memory access.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cache disruption: Each thread often works on different memory regions. Switching threads may cause cache misses because the CPU has to load new data from memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All this makes context switching &lt;strong&gt;far from free&lt;/strong&gt;, even though modern CPUs and OSes optimize it heavily. This is where goroutines shine !!!&lt;/p&gt;

&lt;h1&gt;
  
  
  What are Goroutines and how are they different from OS threads ?
&lt;/h1&gt;

&lt;p&gt;Concurrency is one of Go’s biggest strengths, and goroutines are at the center of it. But in order to appreciate why goroutines are special, we need to understand what they are and how they differ from OS threads. &lt;/p&gt;

&lt;h2&gt;
  
  
  What are Goroutines ?
&lt;/h2&gt;

&lt;p&gt;A goroutine is a lightweight function that runs independently and concurrently within a Go program. A more technical definition would be: a goroutine is a user-space execution unit ( or user-space thread ) managed entirely by the Go runtime. It has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its own stack, starting with a very small size ( ~2KB) compared to OS threads
&lt;/li&gt;
&lt;li&gt;The ability to grow or shrink its stack size on demand
&lt;/li&gt;
&lt;li&gt;Scheduling performed by the Go runtime scheduler, not the OS scheduler
&lt;/li&gt;
&lt;li&gt;Extremely low creation and context-switching cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Goroutines are multiplexed onto OS threads using an M:N mapping:&lt;br&gt;&lt;br&gt;
M goroutines are mapped to N OS threads. This is why goroutines can scale to hundreds of thousands or even millions in a single process. &lt;/p&gt;

&lt;h2&gt;
  
  
  How are Goroutines different from OS Threads ?
&lt;/h2&gt;

&lt;p&gt;Although both represent a unit of execution ( a sequence of instructions ), goroutines and OS threads differ in almost every important way. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scheduling

&lt;ul&gt;
&lt;li&gt;Threads: scheduled by OS kernel
&lt;/li&gt;
&lt;li&gt;Goroutines: scheduled by the Go runtime ( this is why goroutine switching is way cheaper, since it never traps into the kernel )
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Stack size

&lt;ul&gt;
&lt;li&gt;OS threads: large, fixed-size stacks ( 1MB-8MB each )
&lt;/li&gt;
&lt;li&gt;Goroutines: tiny, flexible stacks ( starting at ~2KB and growing on demand; this is one of the main reasons Go can handle such massive concurrency )
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Creation Cost:

&lt;ul&gt;
&lt;li&gt;Thread creation is expensive since kernel and memory allocation is involved.
&lt;/li&gt;
&lt;li&gt;Goroutine creation is extremely cheap ( just user space allocation )
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Context switching cost

&lt;ul&gt;
&lt;li&gt;Thread switching involves saving registers, switching stacks, kernel mode transitions, scheduler logic, and cache/TLB effects.
&lt;/li&gt;
&lt;li&gt;Goroutine switching is done inside the Go runtime and requires only saving a small amount of state. Since the kernel is not involved, switching is very fast.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Memory &amp;amp; Resource Usage

&lt;ul&gt;
&lt;li&gt;A thread consumes megabytes of memory.
&lt;/li&gt;
&lt;li&gt;Goroutines consume kilobytes. This allows Go programs to use thousands of goroutines safely.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Blocking behaviour:

&lt;ul&gt;
&lt;li&gt;A blocking syscall blocks an entire OS thread
&lt;/li&gt;
&lt;li&gt;A blocking I/O operation in Go parks the goroutine, not the OS thread. The runtime efficiently assigns another goroutine to the freed OS thread, so OS threads never sit idle, giving higher performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A goroutine is a lightweight, user-space thread scheduled by the Go runtime, while an OS thread is a heavyweight execution unit scheduled by the operating system.&lt;br&gt;&lt;br&gt;
 Goroutines are cheaper, faster, and far more scalable than OS threads.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is Go runtime ?
&lt;/h1&gt;

&lt;p&gt;By now we’ve talked about processes, threads, and goroutines — but there’s a crucial piece sitting between goroutines and the operating system: the &lt;strong&gt;Go runtime&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
This runtime is what actually makes goroutines possible.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Go runtime&lt;/strong&gt; is, in effect, a mini operating system that runs &lt;em&gt;inside your Go program&lt;/em&gt;. More formally, it is a user-space runtime system bundled with every Go program.&lt;br&gt;&lt;br&gt;
 It takes care of everything the OS doesn’t handle for you, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;running and scheduling goroutines ( GMP model )
&lt;/li&gt;
&lt;li&gt;managing memory and garbage collection
&lt;/li&gt;
&lt;li&gt;handling timers
&lt;/li&gt;
&lt;li&gt;dealing with network and system calls
&lt;/li&gt;
&lt;li&gt;growing and shrinking goroutine stacks
&lt;/li&gt;
&lt;li&gt;waking goroutines when events happen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you run a Go program, you’re not just running your code — you’re also running this runtime, which works in the background and keeps the whole concurrency system running smoothly.&lt;/p&gt;

&lt;p&gt;The key thing to understand:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goroutines don’t run on the OS.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
 &lt;strong&gt;They run on top of the Go runtime, and the runtime runs on OS threads.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is what makes goroutines so efficient.&lt;/p&gt;

&lt;p&gt;You can think of the Go runtime as a middle layer between goroutines and OS threads. &lt;/p&gt;

&lt;h1&gt;
  
  
  GMP Model, M:N mapping ?
&lt;/h1&gt;

&lt;p&gt;Goroutines don’t run directly on CPU cores, and they aren’t scheduled by the operating system.&lt;br&gt;&lt;br&gt;
Instead, Go uses a custom, high-performance scheduler called the &lt;strong&gt;GMP model&lt;/strong&gt;, which is at the heart of Go’s concurrency design.&lt;/p&gt;

&lt;p&gt;Understanding GMP is crucial because it explains &lt;strong&gt;how thousands of goroutines can be multiplexed onto a small number of OS threads&lt;/strong&gt; efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the GMP model ?
&lt;/h2&gt;

&lt;p&gt;G stands for goroutine ( obviously ); in other words, a lightweight, user-space execution unit.&lt;br&gt;&lt;br&gt;
It contains its own tiny stack, its program counter, and the rest of its metadata. Note that a G cannot run on a CPU by itself. &lt;/p&gt;

&lt;p&gt;M stands for Machine ( OS thread ). Basically, it is a real operating system thread; this is what the OS scheduler actually runs on a CPU core. One M can run only one G at a time, and switching between G’s happens inside the M itself. If the M gets context-switched off the CPU, the G it was running is paused with it; once the M is scheduled back, that G’s state is restored and it resumes.  &lt;/p&gt;

&lt;p&gt;P stands for Processor. Note that I am not talking about actual CPU cores here. A P represents a logical scheduler token that owns a run queue of goroutines. Only an M holding a P is allowed to execute Go code. &lt;/p&gt;

&lt;p&gt;P1 → M1 → G1, G2, G3...&lt;/p&gt;

&lt;p&gt;P2 → M2 → G4, G5...&lt;/p&gt;

&lt;p&gt;P3 → M3 → G6...&lt;/p&gt;

&lt;p&gt;...&lt;/p&gt;

&lt;p&gt;If, say, G1 blocks on some I/O ( an API call or a file read ), then M1 parks that goroutine ( it’s actually the Go runtime that parks G1 by marking it as waiting; since Go code is running on M1, we just say that M1 parks G1 ) and picks the next available G in P1’s queue. This way the OS thread is always busy, avoiding wasted CPU time. &lt;/p&gt;

&lt;p&gt;Also note that each P’s run queue is managed by the Go runtime. &lt;/p&gt;

&lt;h2&gt;
  
  
  M:N mapping ?
&lt;/h2&gt;

&lt;p&gt;Let’s say you have 100,000 goroutines and 8 CPU cores ( an octa-core processor ); the runtime might create around 10-20 OS threads (M’s).&lt;br&gt;&lt;br&gt;
These M’s then get scheduled onto the 8 cores by the OS. Each M runs one goroutine at a time, and each P has a queue of goroutines waiting to run. When a goroutine blocks ( e.g. on a channel or syscall ), the M picks another goroutine from its P’s run queue. &lt;/p&gt;

&lt;p&gt;So M:N basically means that M goroutines are mapped to N OS threads. ( Please note that here M represents a NUMBER of goroutines, not the M (OS thread) of the GMP model. I know… the letter convention sucks ).&lt;/p&gt;

&lt;h1&gt;
  
  
  A simple example: How the go runtime schedules goroutines
&lt;/h1&gt;

&lt;p&gt;This example only touches the surface of scheduling; I won’t be discussing any scheduling algorithms here.&lt;br&gt;&lt;br&gt;
Let’s say we have one OS thread (M), one processor (P), and three goroutines (G1, G2, G3). When these goroutines are started, they are placed in P’s local run queue. &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: M begins running G1&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The OS scheduler picks &lt;strong&gt;M1&lt;/strong&gt; (an OS thread) to run on a CPU core.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;M1 owns &lt;strong&gt;P1&lt;/strong&gt;, which contains the run queue:&lt;br&gt;&lt;br&gt;
&lt;code&gt;[G1, G2, G3]&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;M1 pops &lt;strong&gt;G1&lt;/strong&gt; from the front of the queue and begins executing it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: G1 hits a blocking point&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s say G1 makes a channel receive or waits on a network read.&lt;/p&gt;

&lt;p&gt;Because this is &lt;strong&gt;Go-managed blocking&lt;/strong&gt;, the runtime notices that G1 cannot continue.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The go runtime &lt;strong&gt;parks G1&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It moves G1 to some appropriate &lt;strong&gt;wait list&lt;/strong&gt; (e.g., waiting on a channel or the network poller)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;G1 is now &lt;em&gt;blocked&lt;/em&gt;, but importantly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;M1 is NOT blocked. The OS thread stays free.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: M1 picks the next runnable goroutine&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since G1 is parked, M1 looks at P1’s run queue.&lt;/p&gt;

&lt;p&gt;Remaining goroutines:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[G2, G3]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;M1 selects &lt;strong&gt;G2&lt;/strong&gt; and begins executing it.&lt;/p&gt;

&lt;p&gt;This is a &lt;strong&gt;goroutine context switch&lt;/strong&gt;, done entirely in user space by the runtime (no OS involvement).&lt;/p&gt;

&lt;p&gt;It switches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;G1’s PC and stack pointer saved&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;G2’s PC and stack pointer restored&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is extremely fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: OS preempts the thread (OS-level context switch)
&lt;/h3&gt;

&lt;p&gt;While G2 is running, the OS timer interrupt fires.&lt;/p&gt;

&lt;p&gt;The OS scheduler says:&lt;/p&gt;

&lt;p&gt;“Time slice over. Let’s run another OS thread.”&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;M1 is paused&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OS loads another OS thread, say &lt;strong&gt;M7&lt;/strong&gt;, onto the CPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;M7 may belong to a totally different program&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is an &lt;strong&gt;OS context switch&lt;/strong&gt; — heavier and more costly.&lt;/p&gt;

&lt;p&gt;Inside the paused M1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;G2 is still waiting to resume&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;P1 is still attached to M1&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the OS eventually puts M1 back onto the CPU, Go continues running G2 from exactly where it left off.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 5: G1 becomes unblocked&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Suppose the network poller signals that data arrived.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The runtime marks G1 as &lt;strong&gt;runnable&lt;/strong&gt; again&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;G1 is placed &lt;strong&gt;back into P1’s run queue&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Queue becomes:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[G3, G1]&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 6: M1 finishes G2 and picks the next goroutine&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When G2 yields or completes, M1 picks the next goroutine in P1’s queue.&lt;/p&gt;

&lt;p&gt;Next is &lt;strong&gt;G3&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After G3 yields, M1 picks &lt;strong&gt;G1&lt;/strong&gt;, which is runnable again.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Putting it All Together&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here’s what happened:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Go runtime context switched between goroutines (G1 → G2 → G3 → G1)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Done entirely in user space&lt;/li&gt;
&lt;li&gt;Very cheap&lt;/li&gt;
&lt;li&gt;No OS involvement&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;OS scheduler context switched M1 off the CPU&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OS-level&lt;/li&gt;
&lt;li&gt;Expensive&lt;/li&gt;
&lt;li&gt;Paused whatever goroutine M1 was running&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;P’s run queue kept track of which goroutines were runnable&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Managed by the Go runtime&lt;/li&gt;
&lt;li&gt;M1 always pulled new work from P1&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Blocked goroutines didn’t block the OS thread&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Thanks to goroutine parking&lt;/li&gt;
&lt;li&gt;OS thread stayed productive, always busy&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why doesn’t spawning more goroutines help CPU-bound tasks?
&lt;/h1&gt;

&lt;p&gt;Goroutines shine when tasks involve &lt;strong&gt;waiting&lt;/strong&gt; (network I/O, disk I/O, timers, channels, etc.), because they allow other goroutines to run while one is blocked.&lt;/p&gt;

&lt;p&gt;But in &lt;strong&gt;CPU-bound tasks&lt;/strong&gt;—like computing primes, hashing, compression, physics simulation, image processing—goroutines don’t help beyond a certain point.&lt;/p&gt;

&lt;p&gt;Let’s break down the reasons&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;There are only N CPU cores, so only N goroutines can run at the same time. If your machine has &lt;strong&gt;8 CPU cores&lt;/strong&gt;, only &lt;strong&gt;8 threads/G’s can run simultaneously&lt;/strong&gt;—no matter how many goroutines you spawn. Everything else just waits in queues. So if your CPU can do 8 things at a time, spawning 100,000 CPU-bound goroutines won’t make it faster. It will only add overhead.
&lt;/li&gt;
&lt;li&gt;Extra goroutines increase scheduler overhead. Every extra runnable CPU-bound goroutine: must be queued, must be picked up by the scheduler, must eventually run, must involve G→G context switching. When the tasks are CPU-bound, &lt;strong&gt;each goroutine never blocks&lt;/strong&gt;, so the scheduler has fewer opportunities to efficiently switch them. Too many unblocked goroutines = &lt;strong&gt;too many unnecessary context switches&lt;/strong&gt;. This overhead can reduce total throughput.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don’t need more than &lt;code&gt;GOMAXPROCS&lt;/code&gt; goroutines for parallel CPU work. If &lt;code&gt;GOMAXPROCS = 8&lt;/code&gt;, spawning 8 CPU-bound goroutines achieves maximum parallelism.&lt;/p&gt;

&lt;p&gt;Anything more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;doesn’t increase speed&lt;/li&gt;
&lt;li&gt;increases scheduling overhead&lt;/li&gt;
&lt;li&gt;increases memory usage&lt;/li&gt;
&lt;li&gt;causes more context switching&lt;/li&gt;
&lt;li&gt;lowers cache locality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So ideal number of goroutines for CPU-bound tasks ≈ &lt;strong&gt;number of CPU cores&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Goroutines aren’t just lightweight threads — they’re part of a carefully designed runtime system that makes Go highly scalable. By understanding processes, threads, context switching, and the GMP model, it becomes clear why Go doesn’t rely on OS threads alone.&lt;/p&gt;

&lt;p&gt;Goroutines work so well because they use tiny, growable stacks, fast user-space scheduling, and an efficient M:N mapping to OS threads. This lets Go run thousands or even millions of concurrent tasks without the cost of creating thousands of OS threads.&lt;/p&gt;

&lt;p&gt;In the end, the message is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OS threads handle parallelism; goroutines enable massive concurrency.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
 &lt;strong&gt;Together, they give Go its power, performance, and simplicity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This brings us to the end of the blog. I’ve tried my best to explain everything I know in simple terms. AI helped a lot in shaping this article since I’m still improving my writing skills.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Your feedback would mean a lot — it helps me learn and write better.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>computerscience</category>
      <category>go</category>
    </item>
    <item>
      <title>Creating a very basic gRPC server</title>
      <dc:creator>Arunabh Gupta</dc:creator>
      <pubDate>Mon, 03 Nov 2025 19:09:46 +0000</pubDate>
      <link>https://forem.com/arundevs/creating-a-very-basic-grpc-server-29ca</link>
      <guid>https://forem.com/arundevs/creating-a-very-basic-grpc-server-29ca</guid>
      <description>&lt;p&gt;Hi everyone, I started out learning gRPC and created a very basic gRPC server. Although it doesn't do much, it still has a lot of technical stuff going on. So here is a simple breakdown of my attempt at making a gRPC server. I will try to explain everything in simple words and make everything crystal clear. &lt;/p&gt;

&lt;p&gt;First of all, let's address the question "What the hell is gRPC ???"&lt;br&gt;
In simple words, gRPC is an open-source framework for &lt;strong&gt;remote procedure calls (RPC)&lt;/strong&gt; developed by Google. It allows a client to call a function on a server as if it were a local function, making it ideal for distributed systems and microservices. It uses protocol buffers, which serialize data into a compact binary format, making it much more efficient and faster than JSON. It also uses &lt;strong&gt;HTTP/2&lt;/strong&gt; as the transport protocol, enabling faster, persistent connections and features essential for streaming.&lt;/p&gt;

&lt;p&gt;Before we move further, there are some prerequisites that you need to be familiar with. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;basics of golang &lt;/li&gt;
&lt;li&gt;pointers, structs, struct embedding, interfaces&lt;/li&gt;
&lt;li&gt;context&lt;/li&gt;
&lt;li&gt;stubs&lt;/li&gt;
&lt;li&gt;protocol buffers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you don't know these, don't worry; I'll give a small explanation of all these concepts when used.&lt;/p&gt;

&lt;p&gt;Let's first start with our file system. We will create a folder named "grpcdemo". Inside this directory, open up your terminal and run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;go mod init grpcdemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It creates a plain text file named "go.mod" in the current directory. This file is the root of your module and contains all the metadata Go needs to manage your project's dependencies.&lt;/p&gt;
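&lt;p&gt;For reference, the freshly created go.mod will look roughly like this ( the version line depends on your installed Go toolchain ):&lt;/p&gt;

```
module grpcdemo

go 1.22
```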

&lt;p&gt;Here is my folder structure. If you are following this article then make sure to keep the folder structure consistent.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;└── 📁grpcdemo
    └── 📁client
        ├── client.go
    └── 📁proto
        ├── greet.proto
    └── 📁server
        ├── server.go
    └── go.mod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We'll go over each one of them one by one starting with the &lt;code&gt;greet.proto&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="na"&gt;syntax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"proto3"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;package&lt;/span&gt; &lt;span class="nn"&gt;pb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;option&lt;/span&gt; &lt;span class="na"&gt;go_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"grpcdemo/pb"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;service&lt;/span&gt; &lt;span class="n"&gt;GreetService&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;SayHello&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HelloRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;HelloRequest&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's first discuss what a proto file is. A &lt;code&gt;.proto&lt;/code&gt; file is a simple text file that defines the structure of the data you want to serialize and transfer, commonly used in conjunction with Protocol Buffers (protobuf, for short). Protobuf is nothing but a data serialization mechanism. It's language agnostic (independent of the language we are writing our code in) and was developed to serialize data more efficiently across internal services. Normally we use REST APIs, which are fine and relatively easier to implement, but protobufs are preferred in high-performance services. &lt;/p&gt;

&lt;p&gt;The first line declares the syntax version for the proto file, since multiple versions are available.&lt;/p&gt;

&lt;p&gt;The second line specifies the &lt;strong&gt;package name&lt;/strong&gt; for the Protocol Buffer definitions contained within the file.&lt;/p&gt;

&lt;p&gt;The third line is a Go-specific instruction for the protoc compiler. Its primary purpose is to define the Go package name and import path for the generated files (greet.pb.go and greet_grpc.pb.go).&lt;/p&gt;

&lt;p&gt;Protoc is a compiler that compiles &lt;code&gt;.proto&lt;/code&gt; files and generates functional source code from the schema defined in the proto file. It uses plugins to generate the code for each target language. &lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://protobuf.dev/installation/" rel="noopener noreferrer" class="c-link"&gt;
            Protocol Buffer Compiler Installation | Protocol Buffers Documentation
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            How to install the protocol buffer compiler.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fprotobuf.dev%2Ffavicons%2Ffavicon.ico"&gt;
          protobuf.dev
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;After installing &lt;strong&gt;protoc&lt;/strong&gt;, use the following command to generate the Go files from our greet.proto file. Make sure you are in the root directory.&lt;br&gt;
&lt;/p&gt;
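&lt;p&gt;Note that &lt;strong&gt;protoc&lt;/strong&gt; alone is not enough: it delegates Go code generation to two plugins that must be installed and available on your &lt;code&gt;PATH&lt;/code&gt;. Assuming a standard Go toolchain, they can be installed with:&lt;/p&gt;

```shell
# plugin behind --go_out: generates the *.pb.go message code
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest

# plugin behind --go-grpc_out: generates the *_grpc.pb.go service code
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```

&lt;p&gt;If protoc reports that it cannot find &lt;code&gt;protoc-gen-go&lt;/code&gt;, make sure &lt;code&gt;$GOPATH/bin&lt;/code&gt; (or &lt;code&gt;$HOME/go/bin&lt;/code&gt;) is on your &lt;code&gt;PATH&lt;/code&gt;.&lt;/p&gt;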

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;protoc --go_out=../ --go-grpc_out=../ proto/greet.proto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It executes two separate code-generation passes, dictated by the two output flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;--go_out=../&lt;/code&gt;: handled by the &lt;strong&gt;protoc-gen-go&lt;/strong&gt; plugin. It generates files like greet.pb.go, which contain the Go structs (e.g., HelloRequest, HelloResponse) for your data messages, along with their serialization and deserialization methods. &lt;code&gt;../&lt;/code&gt; is the path relative to which the Go files are generated; in this case the files end up inside the &lt;code&gt;pb&lt;/code&gt; directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;--go-grpc_out=../&lt;/code&gt;: handled by the &lt;strong&gt;protoc-gen-go-grpc&lt;/strong&gt; plugin. It generates files like greet_grpc.pb.go (the name may vary), which contain the Go interfaces (GreetServiceServer) for implementing the service and the client stubs for calling it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After running the command, file system should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;└── 📁grpcdemo
    └── 📁client
        ├── client.go
    └── 📁pb
        ├── greet_grpc.pb.go
        ├── greet.pb.go
    └── 📁proto
        ├── greet.proto
    └── 📁server
        ├── server.go
    ├── go.mod
    └── go.sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that it generated a new &lt;code&gt;pb&lt;/code&gt; directory with two Go files inside. We will use these Go files in our server.go and client.go. Now let's discuss the &lt;code&gt;services&lt;/code&gt; and &lt;code&gt;messages&lt;/code&gt; in our proto file. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;messages&lt;/strong&gt; &amp;amp; &lt;strong&gt;services&lt;/strong&gt; are the most important parts of a .proto file, since they define the schema of the data we need to serialize and the functions we want to expose. &lt;/p&gt;

&lt;p&gt;In messages we define the structure of the data/payload we need to pass between server and client. &lt;/p&gt;

&lt;p&gt;Notice the numbers assigned to each field: name = 1; and message = 1;. In Protocol Buffers, these numbers (not the field names) are used to identify the field in the serialized binary data. They are known as field tags. These tags must be unique within the message and must never be changed once your service is in production, as changing them will break backward compatibility with older clients or servers.&lt;/p&gt;
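&lt;p&gt;To make the field tags concrete, here is a tiny hand-rolled sketch (plain Go, no protobuf library) of how a tag shows up on the wire: each field is prefixed with a key byte derived from the tag number and the field's wire type, and strings use wire type 2 (length-delimited).&lt;/p&gt;

```go
package main

import "fmt"

// fieldKey computes the protobuf wire-format key byte for small tag
// numbers: key = (tag shifted left by 3) OR wireType.
// Strings use wire type 2 (length-delimited).
func fieldKey(tag, wireType int) byte {
	return byte(tag<<3 | wireType)
}

func main() {
	// Hand-encode HelloRequest{name: "Billy"}: key 0x0a (tag 1, wire
	// type 2), then the length 5, then the UTF-8 bytes of "Billy".
	payload := append([]byte{fieldKey(1, 2), 5}, []byte("Billy")...)
	fmt.Printf("% x\n", payload) // 0a 05 42 69 6c 6c 79
}
```

&lt;p&gt;Notice that the field &lt;em&gt;name&lt;/em&gt; never appears in the payload, only the tag does, which is exactly why tags must stay stable once clients depend on them.&lt;/p&gt;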

&lt;p&gt;In services we define the rpc function signature. This is the function that can be called by the client, in our case it's SayHello.  &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;greet_grpc.pb.go&lt;/code&gt; and &lt;code&gt;greet.pb.go&lt;/code&gt; files should never be edited by hand, since they are automatically generated. If you want to change anything, make the changes in the &lt;code&gt;.proto&lt;/code&gt; file and re-run protoc. &lt;/p&gt;

&lt;p&gt;It's time for &lt;code&gt;server.go&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"grpcdemo/pb"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"net"&lt;/span&gt;

    &lt;span class="s"&gt;"google.golang.org/grpc"&lt;/span&gt;
    &lt;span class="s"&gt;"google.golang.org/protobuf/types/known/emptypb"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnimplementedGreetServiceServer&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SayHello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"Hello "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;", Welcome to grpc tutorial !!!"&lt;/span&gt;

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":12345"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gRPC server is running on port 12345"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed to listen : %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RegisterGreetServiceServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to server: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are a bunch of things happening in this code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UnimplementedGreetServiceServer&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The protocol buffer compiler (protoc) generated an &lt;strong&gt;interface&lt;/strong&gt; called &lt;strong&gt;GreetServiceServer&lt;/strong&gt;. This interface lists every RPC method defined in your .proto file (currently, just SayHello). To build a valid gRPC server, your Server struct must satisfy this entire interface.&lt;/p&gt;

&lt;p&gt;By embedding the UnimplementedGreetServiceServer struct (which provides a default, error-returning implementation for all methods), we accomplish two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Initial satisfaction: you instantly satisfy the entire GreetServiceServer interface without explicitly writing the SayHello implementation yet. This also ensures forward compatibility: if new RPCs are added to the .proto file later, your server still compiles. Think of it as a default implementation of the &lt;code&gt;GreetServiceServer&lt;/code&gt; interface living inside the &lt;code&gt;greet_grpc.pb.go&lt;/code&gt; file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Override: Your manually written SayHello method overrides the placeholder version, making your implementation the one that actually runs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Do check out the greet_grpc.pb.go file to find the &lt;strong&gt;GreetServiceServer&lt;/strong&gt; interface)&lt;br&gt;
&lt;/p&gt;
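&lt;p&gt;The embedding trick itself is plain Go, nothing gRPC-specific. Here is a minimal, self-contained sketch of the same pattern; the names mirror the generated code but are simplified stand-ins, not the real generated types:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in for the generated GreetServiceServer interface.
type Greeter interface {
	SayHello(name string) (string, error)
}

// Stand-in for pb.UnimplementedGreetServiceServer: a default
// implementation whose methods all return an "unimplemented" error.
type UnimplementedGreeter struct{}

func (UnimplementedGreeter) SayHello(string) (string, error) {
	return "", errors.New("method SayHello not implemented")
}

// Our server embeds the default implementation...
type Server struct {
	UnimplementedGreeter
}

// ...and overrides the one method we actually care about.
func (Server) SayHello(name string) (string, error) {
	return "Hello " + name, nil
}

func main() {
	// Compiles because embedding promotes the default methods,
	// while our own SayHello shadows the placeholder.
	var g Greeter = Server{}
	msg, err := g.SayHello("Billy")
	fmt.Println(msg, err)
}
```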

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SayHello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"Hello "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;", Welcome to grpc tutorial !!!"&lt;/span&gt;

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the implementation of the SayHello method we defined in the .proto file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":12345"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gRPC server is running on port 12345"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed to listen : %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RegisterGreetServiceServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lis&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to server: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here comes the fun part. The &lt;code&gt;net.Listen&lt;/code&gt; function creates a TCP listener (lis) on the specified port (:12345). This step effectively tells the operating system, "I want to reserve this port and listen for incoming connections." If the port is already in use or access is denied, the program exits with an error.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;s := grpc.NewServer()&lt;/code&gt; line creates a gRPC server instance, which handles all of gRPC's internal machinery (HTTP/2 transport, serialization, concurrency, and so on). &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pb.RegisterGreetServiceServer(s, &amp;amp;Server{})&lt;/code&gt; line registers our Server implementation with the gRPC server instance. In other words, it tells the gRPC server that any request targeting the "GreetService" should be routed to our &lt;code&gt;Server&lt;/code&gt; struct. &lt;/p&gt;

&lt;p&gt;Finally &lt;code&gt;s.Serve(lis)&lt;/code&gt; launches our server. It takes the network listener (lis) and begins an infinite loop, continuously accepting new client connections.&lt;/p&gt;

&lt;p&gt;Now let's move on to the client implementation. We will make a function call to the &lt;code&gt;SayHello&lt;/code&gt; as if it was a local function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"grpcdemo/pb"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"google.golang.org/grpc"&lt;/span&gt;
    &lt;span class="s"&gt;"google.golang.org/grpc/credentials/insecure"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"localhost:12345"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;grpc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTransportCredentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insecure&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewCredentials&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to connect: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;greetClient&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewGreetServiceClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;greetClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SayHello&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HelloRequest&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Billy"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to greet: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Greeting: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetMessage&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great !!! Now let's test the code. Open two terminals. In the first, start the server with &lt;code&gt;go run server/server.go&lt;/code&gt;, then run the client with &lt;code&gt;go run client/client.go&lt;/code&gt; in the second. You should see the following output in the client terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025/11/03 22:36:20 Greeting: Hello Billy, Welcome to grpc tutorial !!!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great Work !!!&lt;br&gt;
Here is a breakdown of the client.go file.  &lt;/p&gt;

&lt;p&gt;&lt;code&gt;grpc.NewClient("localhost:12345", grpc.WithTransportCredentials(insecure.NewCredentials()))&lt;/code&gt;&lt;br&gt;
This line creates a client connection to the server running on port ":12345". Since this is a simple local demo, insecure credentials are used, which disables transport security (TLS); in production you would pass proper TLS credentials instead. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;greetClient := pb.NewGreetServiceClient(conn)&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;The function pb.NewGreetServiceClient() is generated by the Protocol Buffer compiler. It takes the established network connection (conn) and returns a client stub (greetClient). All methods called on this stub (like SayHello) are handled by gRPC, which takes care of all the complicated underlying networking. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;res, err := greetClient.SayHello(ctx, &amp;amp;pb.HelloRequest{Name: "Billy"})&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Finally, greetClient is used to call the SayHello method on the client stub. This is what we call an &lt;strong&gt;RPC&lt;/strong&gt;. Print out the response variable to check if everything worked out fine or not. &lt;/p&gt;

&lt;p&gt;That's the end of this short project. I definitely got to learn a lot about Protocol Buffers and how they work with gRPC. The biggest takeaways were realizing how Protocol Buffers define the contract and how the client stub hides the network overhead. Understanding why struct embedding is needed for forward compatibility was the critical Go-specific lesson. Please let me know if I missed something or went wrong somewhere. &lt;/p&gt;

&lt;p&gt;Here's the &lt;a href="https://github.com/Arunabh-gupta/grpcdemo" rel="noopener noreferrer"&gt;source code&lt;/a&gt; for this project. You can use it as a base to play around: try creating more services, or more methods inside GreetService, and interact with the gRPC server. &lt;/p&gt;

</description>
      <category>go</category>
      <category>grpc</category>
      <category>webdev</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
