<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: compilersutra</title>
    <description>The latest articles on Forem by compilersutra (@aabhinavg).</description>
    <link>https://forem.com/aabhinavg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1030092%2Fea9dd738-34c3-4265-9f09-1782daf49d3e.jpeg</url>
      <title>Forem: compilersutra</title>
      <link>https://forem.com/aabhinavg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aabhinavg"/>
    <language>en</language>
    <item>
      <title>From 0 to 1M Impressions: Building a Niche Compiler Blog</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Tue, 14 Apr 2026 14:30:05 +0000</pubDate>
      <link>https://forem.com/aabhinavg/from-0-to-1m-impressions-building-a-niche-compiler-blog-2hk1</link>
      <guid>https://forem.com/aabhinavg/from-0-to-1m-impressions-building-a-niche-compiler-blog-2hk1</guid>
      <description>&lt;p&gt;I’ve been working on a niche site focused on compilers and systems:&lt;br&gt;
👉 &lt;a href="https://www.compilersutra.com" rel="noopener noreferrer"&gt;https://www.compilersutra.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently, I checked my performance over the last 12 months — and the results surprised me:&lt;/p&gt;

&lt;p&gt;1.09M impressions&lt;br&gt;
11K clicks&lt;br&gt;
Ranking for topics like LLVM, OpenCL, TVM&lt;/p&gt;

&lt;p&gt;All of this came purely from organic search.&lt;/p&gt;

&lt;p&gt;💡 What worked for me&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Picking a niche most people ignore&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Compilers, low-level systems, and ML compilers aren’t “mainstream” topics.&lt;/p&gt;

&lt;p&gt;But that’s exactly why they work.&lt;/p&gt;

&lt;p&gt;Less noise → more authority over time.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Writing for depth, not just keywords&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of chasing trends, I focused on:&lt;/p&gt;

&lt;p&gt;Explaining concepts deeply&lt;br&gt;
Covering real-world use cases&lt;br&gt;
Connecting theory → practical systems&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Consistency over intensity&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No crazy posting schedule.&lt;/p&gt;

&lt;p&gt;Just consistent effort over time.&lt;/p&gt;

&lt;p&gt;That’s what compounds.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Structured content &amp;gt; random blogs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I started thinking in terms of:&lt;/p&gt;

&lt;p&gt;Learning paths&lt;br&gt;
Roadmaps&lt;br&gt;
Connected topics&lt;/p&gt;

&lt;p&gt;Instead of isolated articles.&lt;/p&gt;

&lt;p&gt;📈 What I’m focusing on next&lt;br&gt;
Expanding compiler-related topics (LLVM, MLIR, TVM)&lt;br&gt;
Building structured learning tracks&lt;br&gt;
Improving user experience and engagement&lt;br&gt;
🤝 Let’s connect&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmddphelr8xmbhew9iye1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmddphelr8xmbhew9iye1.png" alt="compilersutra seo" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re interested in:&lt;/p&gt;

&lt;p&gt;Compilers&lt;br&gt;
Systems programming&lt;br&gt;
Low-level optimization&lt;/p&gt;

&lt;p&gt;Check it out 👉 &lt;a href="https://www.compilersutra.com" rel="noopener noreferrer"&gt;https://www.compilersutra.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love your feedback and suggestions!&lt;/p&gt;

</description>
      <category>buildinpublic</category>
      <category>computerscience</category>
      <category>marketing</category>
      <category>programming</category>
    </item>
    <item>
      <title>Memory Hierarchy</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:49:45 +0000</pubDate>
      <link>https://forem.com/aabhinavg/memory-hierarch-ej1</link>
      <guid>https://forem.com/aabhinavg/memory-hierarch-ej1</guid>
      <description>&lt;p&gt;If you're working on compilers, runtimes, or low-level systems…&lt;br&gt;
Stop asking “what is cache?”&lt;/p&gt;

&lt;p&gt;Start asking 👉 “what kind of miss did my code create?”&lt;/p&gt;

&lt;p&gt;💡 One bad memory access = hundreds of cycles lost&lt;br&gt;
💡 L1 → L3 → DRAM = massive slowdown&lt;br&gt;
💡 Performance = access pattern, not just instructions&lt;br&gt;
I broke it all down with real benchmarks (Ryzen 9700X) 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.compilersutra.com/docs/coa/memory-hierarchy/" rel="noopener noreferrer"&gt;https://www.compilersutra.com/docs/coa/memory-hierarchy/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚡ Learn:&lt;br&gt;
• Cache misses &amp;amp; set conflicts&lt;br&gt;
• False sharing &amp;amp; multithreading pitfalls&lt;br&gt;
• TLB &amp;amp; page-walk cost&lt;br&gt;
• Why loop tiling gives 30x speedups&lt;/p&gt;
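&lt;p&gt;The access-pattern point can be sketched even in plain Python (a hedged illustration, not the article's Ryzen 9700X benchmark; the function names and matrix size are invented for demonstration):&lt;/p&gt;

```python
# Illustrative sketch: identical work, two access patterns.
# Row-major traversal walks memory contiguously and is cache-friendly;
# column-major strides across a whole row per access and is not.
import time

N = 1024
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    total = 0
    for i in range(N):
        for j in range(N):
            total += m[i][j]   # consecutive elements of one row
    return total

def sum_col_major(m):
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]   # jumps one full row between accesses
    return total

t0 = time.perf_counter()
a = sum_row_major(matrix)
t1 = time.perf_counter()
b = sum_col_major(matrix)
t2 = time.perf_counter()

print("row-major:", t1 - t0, "col-major:", t2 - t1)
assert a == b == N * N   # same result, different access pattern
```

&lt;p&gt;In C with contiguous arrays the gap is far larger, since after one miss the next several row-major accesses are served from the same cache line.&lt;/p&gt;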

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
    <item>
      <title>AMD ML Complete Stack</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sun, 12 Apr 2026 07:02:28 +0000</pubDate>
      <link>https://forem.com/aabhinavg/amd-ml-complete-stack-3hnm</link>
      <guid>https://forem.com/aabhinavg/amd-ml-complete-stack-3hnm</guid>
      <description>&lt;p&gt;I wrote 6 lines of Triton…&lt;/p&gt;

&lt;p&gt;and it turned into thousands of GPU instructions.&lt;/p&gt;

&lt;p&gt;Python → TTIR → TTGIR → LLVM → AMDGCN → HSACO&lt;/p&gt;

&lt;p&gt;👉 a + b → buffer_load_b128&lt;/p&gt;

&lt;p&gt;👉 mask → v_cmp + conditional execution&lt;/p&gt;

&lt;p&gt;Here’s the truth:&lt;/p&gt;

&lt;p&gt;Your code is NOT what runs on the GPU.&lt;/p&gt;

&lt;p&gt;The compiler builds an entire execution pipeline in between.&lt;/p&gt;

&lt;p&gt;I dumped every stage and traced one kernel end-to-end 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.compilersutra.com/docs/ml-compilers/mlcompilerstack/" rel="noopener noreferrer"&gt;https://www.compilersutra.com/docs/ml-compilers/mlcompilerstack/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this, ML compilers don’t feel like “magic” anymore.&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>cpu</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Introduction to ML Compilers + Roadmap (MLIR, TVM, GPU Kernels)</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sat, 11 Apr 2026 11:33:31 +0000</pubDate>
      <link>https://forem.com/aabhinavg/introduction-to-ml-compilers-roadmap-mlir-tvm-gpu-kernels-24hb</link>
      <guid>https://forem.com/aabhinavg/introduction-to-ml-compilers-roadmap-mlir-tvm-gpu-kernels-24hb</guid>
      <description>&lt;p&gt;Most people think they are running Python when they train ML models.&lt;/p&gt;

&lt;p&gt;They are not.&lt;/p&gt;

&lt;p&gt;Python is only the interface.&lt;/p&gt;

&lt;p&gt;The real execution happens somewhere completely different — inside an ML compiler stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 What actually happens?
&lt;/h2&gt;

&lt;p&gt;When you write something like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;matmul → add → relu&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It looks simple.&lt;/p&gt;

&lt;p&gt;But internally, the system transforms it into multiple layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python (model definition)&lt;/li&gt;
&lt;li&gt;Graph (tensor operations)&lt;/li&gt;
&lt;li&gt;Execution plan (optimized structure)&lt;/li&gt;
&lt;li&gt;Kernels (GPU/CPU instructions)&lt;/li&gt;
&lt;li&gt;Hardware execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At no point does the GPU “run Python”.&lt;/p&gt;

&lt;p&gt;It runs &lt;strong&gt;compiled kernels&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ Why ML Compilers exist
&lt;/h2&gt;

&lt;p&gt;Because raw model code is inefficient for hardware.&lt;/p&gt;

&lt;p&gt;Without a compiler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too many kernel launches&lt;/li&gt;
&lt;li&gt;Unnecessary memory transfers&lt;/li&gt;
&lt;li&gt;No operator fusion&lt;/li&gt;
&lt;li&gt;Poor GPU utilization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a compiler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operations are fused&lt;/li&gt;
&lt;li&gt;Memory movement is reduced&lt;/li&gt;
&lt;li&gt;Execution is optimized for hardware&lt;/li&gt;
&lt;/ul&gt;
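&lt;p&gt;The fusion bullet can be sketched with NumPy (an illustrative toy, not the output of any real ML compiler; the function names here are invented):&lt;/p&gt;

```python
# Toy illustration of operator fusion: unfused, each op materializes a
# full intermediate tensor; "fused", one expression produces the result
# directly, which a real compiler would lower to a single kernel.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
w = rng.standard_normal((64, 64))
b = rng.standard_normal(64)

def unfused(x, w, b):
    t1 = x @ w                    # matmul  -> intermediate buffer
    t2 = t1 + b                   # add     -> another intermediate buffer
    return np.maximum(t2, 0.0)    # relu    -> final buffer

def fused(x, w, b):
    # no named intermediates kept alive; less memory traffic
    return np.maximum(x @ w + b, 0.0)

assert np.allclose(unfused(x, w, b), fused(x, w, b))
```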

&lt;h2&gt;
  
  
  🔥 Key concepts covered
&lt;/h2&gt;

&lt;p&gt;This article builds the foundation for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MLIR (multi-level IR systems)&lt;/li&gt;
&lt;li&gt;TVM (end-to-end ML compiler stack)&lt;/li&gt;
&lt;li&gt;GPU kernel execution model&lt;/li&gt;
&lt;li&gt;Operator fusion &amp;amp; memory planning&lt;/li&gt;
&lt;li&gt;Compilation pipeline design&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧭 Roadmap (what you’ll learn)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Tensors, shapes, memory layout&lt;/li&gt;
&lt;li&gt;CPU vs GPU execution model&lt;/li&gt;
&lt;li&gt;Compiler basics (IR, lowering, passes)&lt;/li&gt;
&lt;li&gt;ML compiler optimizations&lt;/li&gt;
&lt;li&gt;Real systems (TVM, MLIR, XLA)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  📘 Full Article
&lt;/h2&gt;

&lt;p&gt;👉 &lt;a href="https://www.compilersutra.com/docs/ml-compile" rel="noopener noreferrer"&gt;https://www.compilersutra.com/docs/ml-compile&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>deeplearning</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
    <item>
      <title>Building CompilerSutra</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 04:09:59 +0000</pubDate>
      <link>https://forem.com/aabhinavg/building-compilersutra-26a9</link>
      <guid>https://forem.com/aabhinavg/building-compilersutra-26a9</guid>
      <description>&lt;p&gt;🚀 Building practical content on compilers, LLVM, MLIR, and performance.&lt;/p&gt;

&lt;p&gt;If this sounds interesting, you can join here:&lt;br&gt;
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSebP1JfLFDp0ckTxOhODKPNVeI1e21rUqMJ0fbBwJoaa-i4Yw/viewform" rel="noopener noreferrer"&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to know what topics you’d like covered.&lt;/p&gt;

</description>
      <category>llvm</category>
      <category>computerscience</category>
      <category>college</category>
    </item>
    <item>
      <title>How a CPU Actually Executes Your Code (Most Developers Get This Wrong)</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sun, 29 Mar 2026 08:27:37 +0000</pubDate>
      <link>https://forem.com/aabhinavg/how-a-cpu-actually-executes-your-code-most-developers-get-this-wrong-4dci</link>
      <guid>https://forem.com/aabhinavg/how-a-cpu-actually-executes-your-code-most-developers-get-this-wrong-4dci</guid>
      <description>&lt;p&gt;Read full blog at &lt;a href="https://www.compilersutra.com/docs/coa/" rel="noopener noreferrer"&gt;https://www.compilersutra.com/docs/coa/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most developers think the CPU “runs code”.&lt;/p&gt;

&lt;p&gt;It doesn’t.&lt;/p&gt;

&lt;p&gt;It executes &lt;strong&gt;raw bytes&lt;/strong&gt; — billions of times per second — using a tightly optimized loop called the &lt;strong&gt;instruction cycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Understanding this is the difference between writing code… and writing &lt;strong&gt;fast code&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 The Reality
&lt;/h2&gt;

&lt;p&gt;When your program runs, the CPU does NOT see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;variables&lt;/li&gt;
&lt;li&gt;loops&lt;/li&gt;
&lt;li&gt;functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It only sees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;instruction bytes&lt;/li&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;registers&lt;/li&gt;
&lt;li&gt;a pointer to the next instruction (PC)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else is already gone.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ The Instruction Cycle (Simplified)
&lt;/h2&gt;

&lt;p&gt;Every instruction goes through this loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fetch&lt;/strong&gt; → Get instruction from memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decode&lt;/strong&gt; → Understand what it means&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt; → Perform the operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writeback&lt;/strong&gt; → Store the result&lt;/li&gt;
&lt;/ol&gt;
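&lt;p&gt;The four stages can be mimicked with a toy interpreter loop (purely illustrative; real hardware decodes instruction bytes, not Python tuples):&lt;/p&gt;

```python
# Toy fetch-decode-execute-writeback loop: the "CPU" sees only
# instructions, registers, and a program counter (PC), never the
# variables or loops you wrote.
memory = [
    ("LOAD", 0, 5),     # r0 = 5
    ("LOAD", 1, 7),     # r1 = 7
    ("ADD",  2, 0, 1),  # r2 = r0 + r1
    ("HALT",),
]
regs = [0, 0, 0, 0]
pc = 0

while True:
    inst = memory[pc]   # Fetch: read the instruction at PC
    op = inst[0]        # Decode: figure out what it means
    pc += 1
    if op == "HALT":
        break
    if op == "LOAD":    # Execute, then Writeback to a register
        regs[inst[1]] = inst[2]
    elif op == "ADD":
        regs[inst[1]] = regs[inst[2]] + regs[inst[3]]

print(regs)  # regs[2] holds 12
```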

&lt;p&gt;This happens &lt;strong&gt;billions of times per second&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Why This Matters
&lt;/h2&gt;

&lt;p&gt;Two pieces of code can look similar…&lt;/p&gt;

&lt;p&gt;…but run VERY differently.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because performance depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory access (cache vs RAM)&lt;/li&gt;
&lt;li&gt;instruction dependencies&lt;/li&gt;
&lt;li&gt;pipeline behavior inside the CPU&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚨 Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mov eax, [rbx]
add ecx, eax
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;[rbx]&lt;/code&gt; hits in cache → fast&lt;br&gt;
If it goes to RAM → 200+ cycles stall&lt;/p&gt;

&lt;p&gt;👉 The CPU isn’t slow.&lt;br&gt;
👉 Memory is.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔥 The Real Trick: Pipelining
&lt;/h2&gt;

&lt;p&gt;Modern CPUs don’t wait for one instruction to finish.&lt;/p&gt;

&lt;p&gt;They overlap them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one instruction in Fetch&lt;/li&gt;
&lt;li&gt;one in Decode&lt;/li&gt;
&lt;li&gt;one in Execute&lt;/li&gt;
&lt;li&gt;one in Writeback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This is called a &lt;strong&gt;pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s how CPUs stay fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Insight
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Performance is NOT just about instructions.&lt;br&gt;
It’s about how the CPU &lt;strong&gt;feeds and executes them&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Full Interactive Breakdown
&lt;/h2&gt;

&lt;p&gt;I built a full version with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pipeline animations&lt;/li&gt;
&lt;li&gt;cache stall visualizations&lt;/li&gt;
&lt;li&gt;real execution flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;a href="https://www.compilersutra.com/docs/coa/" rel="noopener noreferrer"&gt;https://www.compilersutra.com/docs/coa/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is part of my deep-dive series on compilers, LLVM, and CPU performance.&lt;/p&gt;

</description>
      <category>development</category>
      <category>developer</category>
      <category>systemdesign</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Adding MCQs for LLVM &amp; Systems Learning on CompilerSutra</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sat, 28 Mar 2026 02:18:37 +0000</pubDate>
      <link>https://forem.com/aabhinavg/adding-mcqs-for-llvm-systems-learning-on-compilersutra-1f2g</link>
      <guid>https://forem.com/aabhinavg/adding-mcqs-for-llvm-systems-learning-on-compilersutra-1f2g</guid>
      <description>&lt;p&gt;*&lt;em&gt;I just added an MCQ section for Compiler &amp;amp; LLVM learners!&lt;br&gt;
*&lt;/em&gt;&lt;br&gt;
If you're preparing for compiler interviews or want to strengthen your fundamentals, this might help 👇&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://www.compilersutra.com/docs/mcq/" rel="noopener noreferrer"&gt;https://www.compilersutra.com/docs/mcq/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡 What you'll find:&lt;br&gt;
• Compiler design MCQs&lt;br&gt;
• LLVM-focused questions&lt;br&gt;
• Concept-based learning (not just memorization)&lt;br&gt;
• Helpful for interviews + self-assessment&lt;/p&gt;

&lt;p&gt;This is an early-stage feature, so feedback is super welcome 🙌&lt;/p&gt;

&lt;p&gt;Next planned improvements:&lt;br&gt;
• Difficulty levels&lt;br&gt;
• Topic-wise segregation&lt;br&gt;
• Explanations for every question&lt;/p&gt;

&lt;p&gt;If you're into compilers, low-level systems, or LLVM, I'd love your thoughts!&lt;/p&gt;

</description>
      <category>mcq</category>
      <category>cpp</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>GCC vs Clang: Same Instructions, Different Performance (AGU Insight)</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:01:58 +0000</pubDate>
      <link>https://forem.com/aabhinavg/gcc-vs-clang-same-instructions-different-performance-agu-insight-1pae</link>
      <guid>https://forem.com/aabhinavg/gcc-vs-clang-same-instructions-different-performance-agu-insight-1pae</guid>
      <description>&lt;p&gt;*&lt;em&gt;I noticed something interesting while running a GCC vs Clang benchmark.&lt;br&gt;
*&lt;/em&gt;&lt;br&gt;
Same code. Same machine.&lt;br&gt;
Both loops are scalar (no vectorization).&lt;/p&gt;

&lt;p&gt;Yet… &lt;em&gt;GCC consistently used fewer CPU cycles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At first, this doesn’t make sense.&lt;/p&gt;

&lt;p&gt;If both:&lt;/p&gt;

&lt;p&gt;execute roughly the same instructions&lt;br&gt;
are not vectorized&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is there a performance gap?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🔍 The Missing Piece: It’s Not Just Instructions&lt;br&gt;
Most people focus on:&lt;br&gt;
instruction count&lt;br&gt;
vectorization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But in this case, that’s not the full story.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What actually matters more is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how address computations are structured&lt;/li&gt;
&lt;li&gt;how instructions are scheduled&lt;/li&gt;
&lt;li&gt;how well latency is hidden&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmoirlsj1ngynh3ey7eo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsmoirlsj1ngynh3ey7eo.png" alt="GCC VS CLANG" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚙️ AGU Pressure (Address Generation Units)&lt;/p&gt;

&lt;p&gt;On x86 CPUs, memory instructions rely on AGUs (Address Generation Units).&lt;/p&gt;

&lt;p&gt;Complex addressing patterns like:&lt;/p&gt;

&lt;p&gt;base + index * scale + offset&lt;/p&gt;

&lt;p&gt;&lt;em&gt;👉 increase AGU pressure&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Whereas simpler patterns like:&lt;br&gt;
pointer++&lt;br&gt;
👉 are cheaper and easier for the CPU to execute efficiently&lt;/p&gt;

&lt;p&gt;🧪 What I Observed&lt;br&gt;
GCC:&lt;br&gt;
Generates simpler addressing patterns&lt;br&gt;
Reduces AGU contention&lt;br&gt;
Keeps execution more consistent&lt;br&gt;
Clang:&lt;br&gt;
Shows higher AGU pressure&lt;br&gt;
More stalls&lt;br&gt;
Less efficient scheduling (in this case)&lt;/p&gt;

&lt;p&gt;⚡ Key Takeaway&lt;br&gt;
It’s not just about what instructions exist.&lt;/p&gt;

&lt;p&gt;It’s about:&lt;br&gt;
How efficiently the compiler feeds the CPU pipeline&lt;/p&gt;

&lt;p&gt;Same instruction count ≠ same performance.&lt;/p&gt;

&lt;p&gt;📊 Why This Matters&lt;/p&gt;

&lt;p&gt;In tight loops:&lt;/p&gt;

&lt;p&gt;AGU pressure&lt;br&gt;
addressing patterns&lt;br&gt;
instruction scheduling&lt;/p&gt;

&lt;p&gt;👉 can matter as much as (or more than) vectorization&lt;/p&gt;

&lt;p&gt;🔗 Want to Dive Deeper?&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://compilersutra.com/docs/articles/gcc_vs_clang_assembly_part2a/" rel="noopener noreferrer"&gt;Full benchmark + assembly breakdown:&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://compilersutra.com/docs/articles/gcc_vs_clang_real_benchmarks_2026_reporter/&amp;lt;br&amp;gt;%0A![CLI%20COMMAND%20USED](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f4rnzz1p4mcutwhreb8j.png)" rel="noopener noreferrer"&gt;Complete analysis article:&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💬 Discussion&lt;/p&gt;

&lt;p&gt;Have you seen cases where:&lt;/p&gt;

&lt;p&gt;similar assembly&lt;br&gt;
same instruction count&lt;/p&gt;

&lt;p&gt;👉 still results in very different performance?&lt;/p&gt;

&lt;p&gt;Would love to hear your observations.&lt;/p&gt;

</description>
      <category>gcc</category>
      <category>computerscience</category>
      <category>performance</category>
      <category>ai</category>
    </item>
    <item>
      <title>C++ Tips for Performance</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sun, 13 Apr 2025 07:37:14 +0000</pubDate>
      <link>https://forem.com/aabhinavg/cpp-tip-for-the-performance-kfm</link>
      <guid>https://forem.com/aabhinavg/cpp-tip-for-the-performance-kfm</guid>
      <description>&lt;p&gt;C++ Tip # 1: &lt;br&gt;
&lt;a href="https://lnkd.in/gZ6mqHyW" rel="noopener noreferrer"&gt;https://lnkd.in/gZ6mqHyW&lt;/a&gt;&lt;br&gt;
C++Tip #2:&lt;br&gt;
 &lt;a href="https://lnkd.in/gPyaC7B6" rel="noopener noreferrer"&gt;https://lnkd.in/gPyaC7B6&lt;/a&gt;&lt;br&gt;
C++Tip #3: &lt;a href="https://lnkd.in/gjDQE9Je" rel="noopener noreferrer"&gt;https://lnkd.in/gjDQE9Je&lt;/a&gt;&lt;br&gt;
C++ Tip #4: &lt;a href="https://lnkd.in/gR4iYWSx" rel="noopener noreferrer"&gt;https://lnkd.in/gR4iYWSx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;C++ Tip #5:&lt;/strong&gt;&lt;br&gt;
Prefer nullptr over NULL or 0 — Type-Safe and Modern&lt;/p&gt;

&lt;p&gt;🔒 &lt;em&gt;nullptr was introduced in C++11 and is the clear, type-safe way to represent a null pointer.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;💥 &lt;strong&gt;Why avoid NULL or 0?&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;NULL is just a macro for 0 (or ((void*)0) in C), which can accidentally match overloads or templates not meant for pointers.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;0 is ambiguous — is it an integer or a null pointer?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Better approach:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Use nullptr — it has type std::nullptr_t, so it only converts to pointer types.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🔐 &lt;strong&gt;Modern C++ is about expressiveness + safety.&lt;/strong&gt;&lt;br&gt;
Don’t write like it’s 1998. Upgrade to the features C++11 and beyond offer!&lt;/p&gt;

&lt;p&gt;Follow CompilerSutra for more such tips and subscribe 👉 &lt;a href="https://compilersutra.com" rel="noopener noreferrer"&gt;https://compilersutra.com&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1e3w3fvdse45nhx3mc6m.png)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




</description>
      <category>cpp</category>
      <category>tutorial</category>
      <category>performance</category>
      <category>ai</category>
    </item>
    <item>
      <title>Learning Compiler and Parallel Programming in 2025</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sun, 06 Apr 2025 01:10:31 +0000</pubDate>
      <link>https://forem.com/aabhinavg/learning-compiler-and-parallel-programming-in-2025-1mfg</link>
      <guid>https://forem.com/aabhinavg/learning-compiler-and-parallel-programming-in-2025-1mfg</guid>
      <description></description>
      <category>cpp</category>
      <category>learning</category>
      <category>programming</category>
      <category>gnu</category>
    </item>
    <item>
      <title>Introduction to Parallel Programming: Unlocking the Power of GPUs(Part 1)</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Sun, 06 Apr 2025 01:04:07 +0000</pubDate>
      <link>https://forem.com/aabhinavg/introduction-to-parallel-programming-unlocking-the-power-of-gpuspart-1-h2h</link>
      <guid>https://forem.com/aabhinavg/introduction-to-parallel-programming-unlocking-the-power-of-gpuspart-1-h2h</guid>
      <description>&lt;p&gt;Parallel programming is a powerful technique that allows us to take full advantage of the capabilities of modern computing systems, particularly GPUs. By breaking down a task into smaller sub-tasks and running them concurrently, we can achieve higher performance and solve complex problems more efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t8r9agig1yo43rhxigi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t8r9agig1yo43rhxigi.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
For More &lt;a href="https://www.compilersutra.com/docs/gpu/parallel_programming/intro_to_parallel_programming/" rel="noopener noreferrer"&gt;visit&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In this post, we’ll explore the basics of parallel programming, its importance in modern computing, and how you can get started with GPU programming to accelerate your applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Parallel Programming?&lt;/strong&gt;&lt;br&gt;
In the world of computing, many tasks can be parallelized, meaning that they can be broken into smaller pieces that can be processed simultaneously. This is especially true for applications requiring massive computational power, like machine learning, simulations, image processing, and scientific computing.&lt;/p&gt;

&lt;p&gt;Before GPUs, most computations were done on a single CPU core, which had limitations in processing speed. With parallel computing, multiple processors (cores) can work together to solve different parts of a problem simultaneously, greatly improving performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a GPU?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Graphics Processing Unit (GPU) is a highly parallel processor designed to handle tasks related to graphics rendering. However, it’s not limited to just graphical applications. Over the years, GPUs have become essential for accelerating non-graphical tasks, particularly in fields like machine learning, data science, and scientific computing.&lt;/p&gt;

&lt;p&gt;Unlike traditional CPUs, which are optimized for single-threaded performance, GPUs are designed to handle thousands of threads simultaneously, making them ideal for parallel tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concepts in Parallel Programming&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Concurrency vs. Parallelism:&lt;/em&gt;&lt;br&gt;
Concurrency refers to the concept of multiple tasks being executed in overlapping periods but not necessarily simultaneously.&lt;/p&gt;

&lt;p&gt;Parallelism, on the other hand, is about performing tasks simultaneously using multiple processors or cores.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Threads:&lt;/em&gt;&lt;br&gt;
A thread is the smallest unit of execution in a process. In parallel programming, you typically create multiple threads to handle different parts of the computation simultaneously.&lt;/p&gt;

&lt;p&gt;GPUs can execute thousands of threads in parallel, making them much faster for certain types of problems.&lt;/p&gt;

&lt;p&gt;Synchronization:&lt;/p&gt;

&lt;p&gt;When multiple threads are running simultaneously, it's crucial to synchronize them to avoid conflicts, such as multiple threads trying to access the same data at the same time.&lt;/p&gt;
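&lt;p&gt;A minimal sketch of that synchronization idea using Python's threading module (illustrative only; GPU code relies on barriers and atomic operations instead, and the names below are invented):&lt;/p&gt;

```python
# Without the lock, the concurrent read-modify-write on `counter`
# can lose updates; holding the lock makes each increment atomic.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # only one thread mutates at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: no increments lost
```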

&lt;p&gt;Memory Management:&lt;/p&gt;

&lt;p&gt;Efficient use of memory is key to parallel programming. GPUs have a different memory architecture compared to CPUs, and understanding how to optimize memory access can drastically improve performance.&lt;/p&gt;

&lt;p&gt;Getting Started with GPU Parallel Programming&lt;br&gt;
Now that we have a basic understanding of parallel programming, let's see how to get started with GPU programming. Here are a few tools and frameworks that make it easier:&lt;/p&gt;

&lt;p&gt;CUDA (Compute Unified Device Architecture):&lt;/p&gt;

&lt;p&gt;CUDA is a programming model and API created by NVIDIA that allows you to use GPUs for general-purpose computing. It supports C, C++, and Python and provides a rich set of libraries and tools to accelerate your programs.&lt;/p&gt;

&lt;p&gt;OpenCL:&lt;/p&gt;

&lt;p&gt;OpenCL (Open Computing Language) is an open standard for parallel programming across heterogeneous systems, including CPUs, GPUs, and other processors. It supports multiple programming languages, including C and C++.&lt;/p&gt;

&lt;p&gt;TensorFlow &amp;amp; PyTorch:&lt;/p&gt;

&lt;p&gt;Both TensorFlow and PyTorch support GPU acceleration out of the box. These frameworks are especially popular in the machine learning and data science communities for training deep learning models.&lt;/p&gt;

&lt;p&gt;NVIDIA cuDNN:&lt;/p&gt;

&lt;p&gt;cuDNN is a GPU-accelerated library for deep neural networks. It is optimized for deep learning operations and is commonly used with frameworks like TensorFlow, Keras, and PyTorch.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;br&gt;
Parallel programming is essential for taking full advantage of modern computing power, and GPUs are an incredible tool for speeding up computation. By learning parallel programming concepts and tools like CUDA and OpenCL, you can harness the power of GPUs to accelerate your applications in fields like machine learning, simulation, and more.&lt;/p&gt;

&lt;p&gt;Want to learn more about GPU programming? Check out the full guide on CompilerSutra for more in-depth explanations, code examples, and best practices.&lt;br&gt;
 let's unlock the true power of parallel computing!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>gpu</category>
      <category>opensource</category>
      <category>mojo</category>
    </item>
    <item>
      <title>LLVM vs. GCC: A Comprehensive Comparison</title>
      <dc:creator>compilersutra</dc:creator>
      <pubDate>Fri, 14 Mar 2025 16:39:01 +0000</pubDate>
      <link>https://forem.com/aabhinavg/llvm-vs-gcc-a-comprehensive-comparison-33hj</link>
      <guid>https://forem.com/aabhinavg/llvm-vs-gcc-a-comprehensive-comparison-33hj</guid>
      <description>&lt;p&gt;For a deeper dive, check out the full article on &lt;strong&gt;CompilerSutra&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://www.compilersutra.com/docs/compilers/gcc_vs_llvm/" rel="noopener noreferrer"&gt;LLVM vs. GCC: A Detailed Comparison&lt;/a&gt;  &lt;/p&gt;

&lt;h1&gt;
  
  
  LLVM vs. GCC: A Comprehensive Comparison
&lt;/h1&gt;

&lt;p&gt;When it comes to compiler toolchains, &lt;strong&gt;LLVM&lt;/strong&gt; and &lt;strong&gt;GCC&lt;/strong&gt; are the two most widely used and debated options. Each has its strengths, trade-offs, and use cases, making it essential to understand their differences before choosing one for your project.  &lt;/p&gt;

&lt;h2&gt;
  
  
  🔹 &lt;strong&gt;What is LLVM?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;LLVM is a modern, modular, and reusable compiler infrastructure. It is designed to support multiple languages and architectures while providing powerful optimization capabilities.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Features of LLVM:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Modular and reusable design
&lt;/li&gt;
&lt;li&gt;Better optimization for modern architectures
&lt;/li&gt;
&lt;li&gt;Intermediate representation (LLVM IR) allows advanced transformations
&lt;/li&gt;
&lt;li&gt;Clang frontend provides faster compilation and better diagnostics
&lt;/li&gt;
&lt;li&gt;Supports Just-In-Time (JIT) compilation
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔹 &lt;strong&gt;What is GCC?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;GCC (GNU Compiler Collection) is a mature and widely used open-source compiler that supports multiple programming languages, including C, C++, Fortran, and more.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Features of GCC:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Mature and well-tested over decades
&lt;/li&gt;
&lt;li&gt;Supports a wide range of architectures
&lt;/li&gt;
&lt;li&gt;Strong optimization capabilities
&lt;/li&gt;
&lt;li&gt;Rich debugging and profiling tools
&lt;/li&gt;
&lt;li&gt;Strict adherence to language standards
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🆚 &lt;strong&gt;LLVM vs. GCC: Key Differences&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;LLVM&lt;/th&gt;
&lt;th&gt;GCC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compilation Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Faster (due to Clang frontend)&lt;/td&gt;
&lt;td&gt;Slower compared to LLVM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More aggressive optimizations via LLVM IR&lt;/td&gt;
&lt;td&gt;Strong optimizations but less modular&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging &amp;amp; Errors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Better error messages &amp;amp; diagnostics&lt;/td&gt;
&lt;td&gt;Standard error reporting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Modularity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly modular (can be used as a library)&lt;/td&gt;
&lt;td&gt;Monolithic design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JIT Compilation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports JIT compilation&lt;/td&gt;
&lt;td&gt;No built-in JIT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports many languages via Clang&lt;/td&gt;
&lt;td&gt;Broad language support, including legacy ones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adoption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Used in modern projects like Swift, Rust, and Android&lt;/td&gt;
&lt;td&gt;Used in Linux kernel, embedded systems, and legacy projects&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  📌 &lt;strong&gt;Which One Should You Choose?&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;LLVM/Clang&lt;/strong&gt; if you need &lt;strong&gt;faster compilation, better diagnostics, JIT capabilities, and modularity&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;GCC&lt;/strong&gt; if you need &lt;strong&gt;strong compatibility, strict standards adherence, and support for legacy architectures&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a deeper dive, check out the full article on &lt;strong&gt;CompilerSutra&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://www.compilersutra.com/docs/compilers/gcc_vs_llvm/" rel="noopener noreferrer"&gt;LLVM vs. GCC: A Detailed Comparison&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>opensource</category>
      <category>programming</category>
      <category>gnu</category>
    </item>
  </channel>
</rss>
