<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: LumGenLab</title>
    <description>The latest articles on Forem by LumGenLab (@lumgenlab).</description>
    <link>https://forem.com/lumgenlab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3309744%2F9a9d015b-720f-4dcd-aeb0-08390998ea55.jpg</url>
      <title>Forem: LumGenLab</title>
      <link>https://forem.com/lumgenlab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lumgenlab"/>
    <language>en</language>
    <item>
      <title>I Built a GPT Model from Scratch in C++ (Runs on 2GB RAM!)</title>
      <dc:creator>LumGenLab</dc:creator>
      <pubDate>Mon, 01 Sep 2025 17:05:47 +0000</pubDate>
      <link>https://forem.com/lumgenlab/i-built-a-gpt-model-from-scratch-in-c-runs-on-2gb-ram-16nj</link>
      <guid>https://forem.com/lumgenlab/i-built-a-gpt-model-from-scratch-in-c-runs-on-2gb-ram-16nj</guid>
      <description>&lt;p&gt;Ever wondered what it takes to build a transformer from absolute scratch? No PyTorch, no TensorFlow, just raw C++ and mathematical determination.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Most GPT implementations today rely on heavyweight frameworks that abstract away the core mechanics. I wanted to understand every matrix multiplication, every gradient calculation, and every optimization step. So I built LumGPT - a complete GPT implementation in pure C++.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Constraints
&lt;/h2&gt;

&lt;p&gt;My hardware isn't exactly cutting edge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AMD Phenom Triple-Core @ 2.4GHz (2008 era)&lt;/li&gt;
&lt;li&gt;2GB DDR2 RAM with only 700MB free&lt;/li&gt;
&lt;li&gt;No GPU (GTX 210 doesn't count)&lt;/li&gt;
&lt;li&gt;Regular HDD storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question was: can you train a transformer on hardware that predates the transformer paper?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;LumGPT includes everything you'd expect from a production transformer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-head attention with causal masking&lt;/li&gt;
&lt;li&gt;Layer normalization (pre-LN like GPT-2)&lt;/li&gt;
&lt;li&gt;Feed-forward networks with GELU activation&lt;/li&gt;
&lt;li&gt;AdamW optimizer with weight decay&lt;/li&gt;
&lt;li&gt;Advanced sampling (temperature + top-k)&lt;/li&gt;
&lt;li&gt;Custom tensor operations optimized for cache efficiency&lt;/li&gt;
&lt;/ul&gt;
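&lt;p&gt;To give a flavor of that last sampling bullet, here's a minimal sketch of temperature scaling plus top-k filtering. This is illustrative only; names like &lt;code&gt;top_k_probs&lt;/code&gt; are mine, not LumGPT's actual API:&lt;/p&gt;

```cpp
// Illustrative sketch (not the actual LumGPT source): temperature scaling
// followed by top-k filtering over a vector of logits.
#include <algorithm>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Convert logits to probabilities, keeping only the k largest entries.
// Assumes 1 <= k <= logits.size(); ties at the cutoff may keep a few extra.
std::vector<double> top_k_probs(std::vector<double> logits,
                                std::size_t k, double temperature) {
    for (double& l : logits) l /= temperature;      // sharpen or flatten
    std::vector<double> sorted = logits;
    std::nth_element(sorted.begin(), sorted.begin() + (k - 1), sorted.end(),
                     std::greater<double>());
    double kth = sorted[k - 1];                     // k-th largest logit
    double max_l = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    std::vector<double> probs(logits.size(), 0.0);
    for (std::size_t i = 0; i < logits.size(); ++i) {
        if (logits[i] >= kth) {                     // keep the top-k only
            probs[i] = std::exp(logits[i] - max_l); // shifted for stability
            sum += probs[i];
        }
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// Draw one token index from the filtered distribution.
int sample(const std::vector<double>& probs, std::mt19937& rng) {
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}
```

&lt;p&gt;Dividing the logits by the temperature sharpens the distribution when it is below 1 and flattens it when it is above 1, before the top-k cut discards the tail.&lt;/p&gt;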

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Memory footprint: 32MB&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;CPU usage: 45%&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Training time: 8 minutes per 200 iterations&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Loss progression:
Step 0: 4.5875
Step 200: 3.1597
Step 2000: 3.2377
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model converged reasonably well on TinyShakespeare, but here's where it gets interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dataset Experiment
&lt;/h2&gt;

&lt;p&gt;TinyShakespeare has 40,000 lines but only 65 unique characters. I tried something different: a custom dataset of 202 modern jokes (Nasiruddin collection) with only 3,000 lines but 82 unique characters.&lt;/p&gt;

&lt;p&gt;The smaller dataset with richer vocabulary actually showed better learning characteristics. Sometimes data quality beats quantity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Memory Management
&lt;/h3&gt;

&lt;p&gt;Every tensor operation is optimized for cache locality. No dynamic allocations during training loops. Thread-local RNG for reproducibility without global state.&lt;/p&gt;
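&lt;p&gt;The thread-local RNG idea can be sketched like this (illustrative, with assumed names such as &lt;code&gt;local_rng&lt;/code&gt;, not the engine's actual code): each thread owns one generator, seeded deterministically, so runs are reproducible without any locks or globals.&lt;/p&gt;

```cpp
// Sketch of "thread-local RNG, no global state" (illustrative, not the
// actual LumGPT source). Each thread gets its own generator, seeded from
// a base seed plus its thread id on first use, then reused thereafter.
#include <cstdint>
#include <random>

std::mt19937& local_rng(std::uint32_t base_seed, std::uint32_t thread_id) {
    // Constructed once per thread on first call; later calls return the
    // same generator (the arguments only matter the first time).
    thread_local std::mt19937 rng(base_seed + thread_id);
    return rng;
}

double uniform01(std::mt19937& rng) {
    return std::uniform_real_distribution<double>(0.0, 1.0)(rng);
}
```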

&lt;h3&gt;
  
  
  Mathematical Precision
&lt;/h3&gt;

&lt;p&gt;All gradients computed from first principles. Layer norm backward pass implements the full mathematical derivation, not approximations. Combined softmax-cross entropy gradients for numerical stability.&lt;/p&gt;
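&lt;p&gt;The fused softmax-cross-entropy trick deserves a quick sketch (again illustrative, not the repo's code): the gradient with respect to the logits collapses to &lt;code&gt;softmax(logits) - one_hot(target)&lt;/code&gt;, so you never divide by tiny probabilities.&lt;/p&gt;

```cpp
// Illustrative sketch of the combined softmax + cross-entropy backward
// pass: d(loss)/d(logits) = softmax(logits) - one_hot(target).
#include <algorithm>
#include <cmath>
#include <vector>

std::vector<double> softmax_ce_grad(const std::vector<double>& logits,
                                    std::size_t target) {
    double max_l = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    std::vector<double> grad(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        grad[i] = std::exp(logits[i] - max_l);   // shifted for stability
        sum += grad[i];
    }
    for (double& g : grad) g /= sum;             // grad now holds softmax
    grad[target] -= 1.0;                         // subtract one-hot label
    return grad;
}
```

&lt;p&gt;Note that the components of the gradient always sum to zero: raising one logit must lower the relative probability of the others.&lt;/p&gt;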

&lt;h3&gt;
  
  
  The Attention Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Scaled dot-product with causal masking&lt;/span&gt;
&lt;span class="n"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Q_head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K_head_T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;scale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Apply causal mask&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NEG_INF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No shortcuts. Every operation follows the mathematical definitions exactly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This is just version 1. The next iteration will include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4-bit quantization with QAT&lt;/li&gt;
&lt;li&gt;RoPE positional embeddings&lt;/li&gt;
&lt;li&gt;ALiBi attention bias&lt;/li&gt;
&lt;li&gt;Eigen 3.4.0 integration&lt;/li&gt;
&lt;li&gt;Custom inference optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Framework abstractions are useful, but they can hide fundamental understanding. Building from scratch taught me why certain architectural choices matter, how gradients actually flow through transformers, and where the computational bottlenecks really are.&lt;/p&gt;

&lt;p&gt;Plus, proving that meaningful AI can run on decade-old hardware opens possibilities for edge deployment and democratized access.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;The complete implementation is open source on &lt;a href="https://github.com/LumGenLab/LumGPT"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Building your own transformer is challenging but incredibly rewarding. You gain intuition that no amount of framework usage can provide.&lt;/p&gt;




&lt;p&gt;What's your experience with implementing models from scratch? Have you tried building transformers without frameworks?&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>opensource</category>
      <category>cpp</category>
    </item>
    <item>
      <title>I Built a Ray Tracer That Runs on a Dinosaur PC and It's Cooking RTX 4090s</title>
      <dc:creator>LumGenLab</dc:creator>
      <pubDate>Wed, 20 Aug 2025 15:25:40 +0000</pubDate>
      <link>https://forem.com/lumgenlab/i-built-a-ray-tracer-that-runs-on-a-dinosaur-pc-and-its-cooking-rtx-4090s-18ma</link>
      <guid>https://forem.com/lumgenlab/i-built-a-ray-tracer-that-runs-on-a-dinosaur-pc-and-its-cooking-rtx-4090s-18ma</guid>
      <description>&lt;p&gt;So... I just spent 30 minutes building a ray tracer from scratch in pure C++17, and the results are honestly blowing my mind. Not because it's some groundbreaking new algorithm, but because it's running on hardware that's older than some of the developers reading this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup (Prepare to Laugh)
&lt;/h2&gt;

&lt;p&gt;Here's what I'm working with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU&lt;/strong&gt;: AMD Phenom™ Triple-Core 2.40 GHz (yes, triple-core was a thing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM&lt;/strong&gt;: 2GB DDR2 (with 564MB actually free)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU&lt;/strong&gt;: GTX 210 (which doesn't even show up as available, just outputs to monitor)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: 149GB HDD (the clicking kind)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is literally a PC from the Pentium dual-core era that someone slapped an AMD sticker on. I'm pretty sure my phone has more computing power.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Did This
&lt;/h2&gt;

&lt;p&gt;I was getting frustrated with modern rendering engines. Want to do some ray tracing in Blender? Better have 16GB+ RAM. Unreal Engine 5 with Lumen? Hope you've got that RTX 4090 ready. &lt;/p&gt;

&lt;p&gt;But here's the thing - ray tracing is just math. Really beautiful, elegant math. So I thought: what if I stripped away all the bloat and just focused on the core algorithms?&lt;/p&gt;

&lt;p&gt;Mainstream tools like Blender, Maya, Unreal Engine, Unity, and even the comparatively lightweight Godot Engine won't run on my PC, so I decided to build an advanced ray-tracing engine that would, and that could still hold its own against their software renderers. No OpenGL, no external libraries, no fancy frameworks. Just pure C++17 and the Windows API for display. One file. One simple, beautiful file.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results Speak for Themselves
&lt;/h2&gt;

&lt;p&gt;Rendered Scene 1&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bi6xk764tw37fbyz754.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bi6xk764tw37fbyz754.jpg" alt="Rendered Scene 1" width="785" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rendered Scene 2&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rt6aedhfy1ft1wzaqe5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rt6aedhfy1ft1wzaqe5.jpg" alt="Rendered Scene 2" width="784" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These renders took about &lt;strong&gt;40 seconds each&lt;/strong&gt; at 800x600 with 200 samples per pixel and max_depth of 50. On my dinosaur rig. While using only &lt;strong&gt;11.4MB of RAM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let that sink in for a moment.&lt;/p&gt;
&lt;h2&gt;
  
  
  What's Under the Hood
&lt;/h2&gt;

&lt;p&gt;The engine includes everything you'd expect from a modern ray tracer:&lt;/p&gt;
&lt;h3&gt;
  
  
  🎨 Physically Based Materials
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lambertian diffuse surfaces (that red sphere)&lt;/li&gt;
&lt;li&gt;Metals with configurable roughness &lt;/li&gt;
&lt;li&gt;Glass with proper Fresnel reflectance (look at those refractions!)&lt;/li&gt;
&lt;li&gt;Emissive materials for light sources&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🔬 Advanced Math
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Full 3D vector operations with reflection/refraction&lt;/li&gt;
&lt;li&gt;4x4 transformation matrices&lt;/li&gt;
&lt;li&gt;Monte Carlo integration for path tracing&lt;/li&gt;
&lt;li&gt;Importance sampling to reduce noise&lt;/li&gt;
&lt;/ul&gt;
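&lt;p&gt;The reflection part of that vector math is short enough to show in full. This is a sketch with assumed names (&lt;code&gt;Vec3&lt;/code&gt;, &lt;code&gt;reflect&lt;/code&gt;), not the engine's actual class: mirror an incident direction about a unit surface normal with r = v - 2·dot(v,n)·n.&lt;/p&gt;

```cpp
// Minimal sketch (assumed names, not the engine's actual Vec3 class) of
// the mirror-reflection formula: r = v - 2 * dot(v, n) * n.
struct Vec3 {
    double x, y, z;
};

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 operator-(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 operator*(double s, const Vec3& v) {
    return {s * v.x, s * v.y, s * v.z};
}

// Reflect incident direction v about the unit normal n.
Vec3 reflect(const Vec3& v, const Vec3& n) {
    return v - 2.0 * dot(v, n) * n;
}
```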
&lt;h3&gt;
  
  
  ⚡ Performance Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Multi-threaded rendering (maxes out all 3 cores)&lt;/li&gt;
&lt;li&gt;AABB bounding volume acceleration&lt;/li&gt;
&lt;li&gt;Efficient memory management&lt;/li&gt;
&lt;li&gt;Lock-free progress tracking&lt;/li&gt;
&lt;/ul&gt;
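&lt;p&gt;For the curious, the AABB acceleration bullet boils down to the classic slab method. Here's a sketch under assumed names (&lt;code&gt;hit_aabb&lt;/code&gt; is mine, not the engine's API): intersect the ray against each axis-aligned slab and keep the running overlap interval.&lt;/p&gt;

```cpp
// Sketch of the slab-method ray/AABB test (illustrative; names are
// assumptions, not the engine's actual API).
#include <algorithm>
#include <utility>

// Intersect the ray with each axis-aligned slab and shrink the overlap
// interval [t_min, t_max]; an empty interval means the ray misses the box.
bool hit_aabb(const double orig[3], const double dir[3],
              const double box_min[3], const double box_max[3],
              double t_min, double t_max) {
    for (int a = 0; a < 3; ++a) {
        double inv = 1.0 / dir[a];   // +/- infinity for axis-parallel rays
        double t0 = (box_min[a] - orig[a]) * inv;
        double t1 = (box_max[a] - orig[a]) * inv;
        if (inv < 0.0) std::swap(t0, t1);   // ray points the other way
        t_min = std::max(t_min, t0);
        t_max = std::min(t_max, t1);
        if (t_max <= t_min) return false;   // slabs no longer overlap
    }
    return true;
}
```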
&lt;h3&gt;
  
  
  🎯 Geometric Primitives
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Analytical sphere intersections&lt;/li&gt;
&lt;li&gt;Triangle meshes (the blue cube is made of 12 triangles)&lt;/li&gt;
&lt;li&gt;Automatic normal calculation&lt;/li&gt;
&lt;li&gt;UV coordinate generation&lt;/li&gt;
&lt;/ul&gt;
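&lt;p&gt;The analytical sphere intersection above is just the quadratic formula in disguise. A sketch, with illustrative names rather than the engine's actual signatures: substitute the ray p(t) = origin + t·dir into |p - center|² = r² and solve for t.&lt;/p&gt;

```cpp
// Sketch of an analytical ray-sphere intersection (illustrative names).
// Substituting the ray into |p - center|^2 = r^2 gives a quadratic in t.
#include <cmath>

// Returns the nearest positive hit distance t, or -1.0 on a miss.
double hit_sphere(const double orig[3], const double dir[3],
                  const double center[3], double radius) {
    double oc[3] = {orig[0] - center[0], orig[1] - center[1],
                    orig[2] - center[2]};
    double a = dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2];
    double half_b = oc[0]*dir[0] + oc[1]*dir[1] + oc[2]*dir[2];
    double c = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - radius * radius;
    double disc = half_b * half_b - a * c;       // quarter discriminant
    if (disc < 0.0) return -1.0;                 // ray misses the sphere
    double t = (-half_b - std::sqrt(disc)) / a;  // nearer root first
    return (t > 0.0) ? t : -1.0;
}
```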
&lt;h2&gt;
  
  
  The Code Philosophy
&lt;/h2&gt;

&lt;p&gt;I kept everything in &lt;strong&gt;one file&lt;/strong&gt;. Yes, one single &lt;code&gt;rayengine.cpp&lt;/code&gt; file. Here's why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No dependency hell&lt;/strong&gt; - Download, compile, run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to understand&lt;/strong&gt; - Everything's right there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast compilation&lt;/strong&gt; - Builds in seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum portability&lt;/strong&gt; - Works anywhere C++17 does
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The entire camera ray generation in ~20 lines&lt;/span&gt;
&lt;span class="n"&gt;Ray&lt;/span&gt; &lt;span class="nf"&gt;get_ray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mt19937&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Vec3&lt;/span&gt; &lt;span class="n"&gt;rd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lens_radius&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;random_in_unit_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Vec3&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;lower_left_corner&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;horizontal&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;vertical&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Clean, readable, and it just works.&lt;/p&gt;
&lt;h2&gt;
  
  
  Performance That Makes You Think
&lt;/h2&gt;

&lt;p&gt;Here's what really gets me excited about this project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: 11.4MB vs Blender's 500MB+&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependencies&lt;/strong&gt;: 0 vs thousands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build time&lt;/strong&gt;: 3 seconds vs hours of CMake hell&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt;: Physically based results that hold up against reference path tracers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you see glass that refracts properly, metals that reflect accurately, and shadows that feel real, all coming from a PC that predates YouTube HD, it makes you question why modern engines are so bloated.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Technical Wins
&lt;/h2&gt;

&lt;p&gt;The threading implementation is something I'm particularly proud of. Instead of fighting with complex synchronization, I just split the image into rows and let each thread work independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simple but effective parallel rendering&lt;/span&gt;
&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;start_row&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;end_row&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;render_row&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_height&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result? 100% CPU utilization with zero system lag. The progress bar updates smoothly, the mouse still works, and all three cores are sweating to give me those pixels.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Proves
&lt;/h2&gt;

&lt;p&gt;Modern graphics programming has lost its way. We've become so dependent on massive engines and external libraries that we've forgotten the fundamentals still work amazingly well.&lt;/p&gt;

&lt;p&gt;You don't need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ A $1,600 graphics card&lt;/li&gt;
&lt;li&gt;❌ 32GB of RAM
&lt;/li&gt;
&lt;li&gt;❌ Gigabytes of engine downloads&lt;/li&gt;
&lt;li&gt;❌ Complex build systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You just need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Good algorithms&lt;/li&gt;
&lt;li&gt;✅ Clean math&lt;/li&gt;
&lt;li&gt;✅ Efficient code&lt;/li&gt;
&lt;li&gt;✅ Understanding of the physics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The entire engine is open source on GitHub:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔗 &lt;a href="https://github.com/LumGenLab/Ray-Engine" rel="noopener noreferrer"&gt;Ray Engine Repository&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building is stupid simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;g++ &lt;span class="nt"&gt;-std&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;c++17 &lt;span class="nt"&gt;-O3&lt;/span&gt; &lt;span class="nt"&gt;-m64&lt;/span&gt; &lt;span class="nt"&gt;-flto&lt;/span&gt; &lt;span class="nt"&gt;-pthread&lt;/span&gt; &lt;span class="nt"&gt;-mwindows&lt;/span&gt; &lt;span class="nt"&gt;-static-libgcc&lt;/span&gt; &lt;span class="nt"&gt;-static-libstdc&lt;/span&gt;++ &lt;span class="nt"&gt;-o&lt;/span&gt; ray_engine rayengine.cpp &lt;span class="nt"&gt;-lgdi32&lt;/span&gt; &lt;span class="nt"&gt;-luser32&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No CMake, no vcpkg, no package managers. Just compile and watch the magic happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;I'm thinking about adding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BVH acceleration structures for complex scenes&lt;/li&gt;
&lt;li&gt;Volumetric rendering (fog, smoke, god rays)&lt;/li&gt;
&lt;li&gt;Mesh loading (OBJ files)&lt;/li&gt;
&lt;li&gt;Maybe even some GPU compute shaders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But honestly? Part of me wants to see how far I can push this single-file approach. There's something beautiful about keeping it simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges For You
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Write an advanced ray tracer from scratch using only the C++17 standard library, and make it run on my PC, not just on a Core i9.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OR&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Build 3D modelling software from scratch in C++17 using only the standard library: super efficient, ultra powerful, and able to run on that dinosaur PC.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;Sometimes the best solution isn't the most complex one. Sometimes it's just good math, clean code, and a refusal to accept that "this is how things are done."&lt;/p&gt;

&lt;p&gt;If a ray tracer can look this good on hardware from 2008, maybe we should stop assuming we need cutting-edge everything to create something beautiful.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Comment: what should I code next?&lt;/strong&gt; I'm thinking either a software rasterizer, a dive into fluid simulation, or even new 3D modelling software. What would you like to see built from scratch? 🚀&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with ❤️ and way too much coffee on the world's most patient dinosaur PC&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>pathtracing</category>
      <category>cpp</category>
      <category>rendering</category>
    </item>
  </channel>
</rss>
