<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rijul Rajesh</title>
    <description>The latest articles on Forem by Rijul Rajesh (@rijultp).</description>
    <link>https://forem.com/rijultp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1207862%2Ff06197aa-d585-4225-94a6-86243238376f.png</url>
      <title>Forem: Rijul Rajesh</title>
      <link>https://forem.com/rijultp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rijultp"/>
    <language>en</language>
    <item>
      <title>Understanding Reinforcement Learning with Neural Networks Part 5: Connecting Reward, Derivative, and Step Size</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Fri, 15 May 2026 20:33:05 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-5-connecting-reward-derivative-2dk</link>
      <guid>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-5-connecting-reward-derivative-2dk</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-reinforcement-learning-with-neural-networks-part-4-positive-and-negative-rewards-23h0"&gt;previous article&lt;/a&gt;, we explored the reward system in reinforcement learning&lt;/p&gt;

&lt;p&gt;In this article, we will begin calculating the &lt;strong&gt;step size&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Update
&lt;/h2&gt;

&lt;p&gt;In this example, the learning rate is &lt;strong&gt;1.0&lt;/strong&gt;. Multiplying the derivative by the learning rate gives the step size:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gaf2bcc6qnlfoo97obt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gaf2bcc6qnlfoo97obt.png" alt=" " width="800" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, the step size is &lt;strong&gt;0.5&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Next, we update the bias by subtracting the step size from the old bias value &lt;strong&gt;0.0&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzivto11xex0mqmywf4a5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzivto11xex0mqmywf4a5.png" alt=" " width="595" height="162"&gt;&lt;/a&gt;&lt;/p&gt;
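&lt;p&gt;The update above can be sketched in a few lines of Python. The derivative value of 0.5 is implied by the article's step size of 0.5 at a learning rate of 1.0:&lt;/p&gt;

```python
learning_rate = 1.0
derivative = 0.5        # implied: step size 0.5 at learning rate 1.0
old_bias = 0.0

step_size = learning_rate * derivative   # 0.5
new_bias = old_bias - step_size          # 0.0 - 0.5 = -0.5
print(new_bias)                          # -0.5
```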




&lt;h2&gt;
  
  
  After the Update
&lt;/h2&gt;

&lt;p&gt;Now that the bias has been updated, we run the model again.&lt;/p&gt;

&lt;p&gt;The new probability of going to &lt;strong&gt;Place B&lt;/strong&gt; becomes &lt;strong&gt;0.4&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu35h5j6ol9yrzupnf2l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu35h5j6ol9yrzupnf2l.png" alt=" " width="644" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means the probability of going to &lt;strong&gt;Place A&lt;/strong&gt; is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxr65rqf5vyhy634ii90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxr65rqf5vyhy634ii90.png" alt=" " width="391" height="88"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb7teytxtd16uruvt201.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb7teytxtd16uruvt201.png" alt=" " width="657" height="175"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing Again
&lt;/h2&gt;

&lt;p&gt;We now pick a random number between 0 and 1, and get &lt;strong&gt;0.9&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd024wbn97kuxo2q5jr16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd024wbn97kuxo2q5jr16.png" alt=" " width="672" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since &lt;strong&gt;0.9&lt;/strong&gt; falls in the region representing &lt;strong&gt;Place B&lt;/strong&gt;, we choose Place B.&lt;/p&gt;
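&lt;p&gt;This selection step can be sketched in Python. The probabilities and the draw of 0.9 come from the article; using &lt;code&gt;bisect&lt;/code&gt; to locate the draw on the 0-to-1 line is just one convenient way to write it:&lt;/p&gt;

```python
import bisect

p_b = 0.4                       # updated probability of Place B
p_a = 1.0 - p_b                 # Place A occupies the segment from 0 to 0.6
draw = 0.9                      # the random number picked in the article

# bisect reports which segment of the 0-to-1 line the draw falls into
places = ["Place A", "Place B"]
choice = places[bisect.bisect([p_a], draw)]
print(choice)                   # Place B
```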

&lt;h2&gt;
  
  
  Computing the Gradient Again
&lt;/h2&gt;

&lt;p&gt;To update the bias, we again compute the derivative.&lt;/p&gt;

&lt;p&gt;First, we assume that choosing &lt;strong&gt;Place B&lt;/strong&gt; was the correct action.&lt;/p&gt;

&lt;p&gt;So ideally:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6ugtinbby8aquykebvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6ugtinbby8aquykebvq.png" alt=" " width="328" height="93"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we compute the difference between the ideal value &lt;strong&gt;1.0&lt;/strong&gt; and the actual value &lt;strong&gt;0.4&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Using this, we calculate the derivative with respect to the bias, which gives:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a4sdax4nf8qdpfaovud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a4sdax4nf8qdpfaovud.png" alt=" " width="479" height="69"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmm7f8258xmctg9gt30h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmm7f8258xmctg9gt30h.png" alt=" " width="579" height="365"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Checking the Reward
&lt;/h2&gt;

&lt;p&gt;Now we check whether this was actually a good decision.&lt;/p&gt;

&lt;p&gt;Place B gives a large portion of fries, but our hunger input is &lt;strong&gt;0.0&lt;/strong&gt;, meaning we are not very hungry.&lt;/p&gt;

&lt;p&gt;So this was not a good choice.&lt;/p&gt;

&lt;p&gt;Therefore, the reward is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reward = -1&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Updating with Reward
&lt;/h2&gt;

&lt;p&gt;We multiply the derivative by the reward:&lt;/p&gt;

&lt;p&gt;-0.6 × -1 = 0.6&lt;/p&gt;

&lt;p&gt;So the updated derivative becomes &lt;strong&gt;0.6&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Second Step Update
&lt;/h2&gt;

&lt;p&gt;Now we calculate the step size again:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rbgqh9tj72k0szf15b1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rbgqh9tj72k0szf15b1.png" alt=" " width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Result
&lt;/h2&gt;

&lt;p&gt;We plug the new bias back into the neural network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3ifab68esfkfjvpn7zm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3ifab68esfkfjvpn7zm.png" alt=" " width="628" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now the probability of going to &lt;strong&gt;Place B&lt;/strong&gt; has decreased.&lt;/p&gt;

&lt;p&gt;This means that when hunger is low, the model is more likely to choose &lt;strong&gt;Place A&lt;/strong&gt;, which is the correct behavior.&lt;/p&gt;

&lt;p&gt;This shows that the reinforcement learning algorithm, specifically &lt;strong&gt;policy gradients&lt;/strong&gt;, is working as expected.&lt;/p&gt;




&lt;p&gt;In the next article, we will explore how to further train the model using different input values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/ipm" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/ipm/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Reinforcement Learning with Neural Networks Part 4: Positive and Negative Rewards</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 13 May 2026 20:46:18 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-4-positive-and-negative-rewards-23h0</link>
      <guid>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-4-positive-and-negative-rewards-23h0</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-reinforcement-learning-with-neural-networks-part-3-guessing-the-ideal-output-3m47"&gt;previous article&lt;/a&gt;, we began the process of guessing the ideal output.&lt;/p&gt;

&lt;p&gt;Let us continue with the same example.&lt;/p&gt;

&lt;p&gt;Suppose we receive a &lt;strong&gt;small number of fries&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since our hunger level is &lt;strong&gt;0&lt;/strong&gt;, this is actually a good outcome.&lt;/p&gt;

&lt;p&gt;In this case, we should assign a &lt;strong&gt;reward of 1&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;Now consider the opposite situation.&lt;/p&gt;

&lt;p&gt;Suppose we receive a &lt;strong&gt;large order of fries&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since we are not hungry enough to eat all the fries, this means we made a poor decision.&lt;/p&gt;

&lt;p&gt;In that case, we assign a &lt;strong&gt;reward of -1&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;In general:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any &lt;strong&gt;positive reward&lt;/strong&gt; indicates a good decision&lt;/li&gt;
&lt;li&gt;Any &lt;strong&gt;negative reward&lt;/strong&gt; indicates a bad decision&lt;/li&gt;
&lt;/ul&gt;
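
&lt;p&gt;The fries scenario above can be sketched as a small reward function. The function name, inputs, and exact rule here are hypothetical, chosen only to mirror the example:&lt;/p&gt;

```python
def reward(hunger, portion):
    # A small portion suits low hunger; a large portion suits high hunger.
    good = (portion == "small" and hunger == 0.0) or (portion == "large" and hunger == 1.0)
    return 1 if good else -1

print(reward(0.0, "small"))   # 1: not hungry, small fries, a good outcome
print(reward(0.0, "large"))   # -1: not hungry, too many fries, a bad outcome
```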




&lt;h2&gt;
  
  
  Updating the Derivative with the Reward
&lt;/h2&gt;

&lt;p&gt;We now use this reward to update the derivative.&lt;/p&gt;

&lt;p&gt;To do this, we simply multiply the derivative by the reward.&lt;/p&gt;




&lt;h3&gt;
  
  
  Case 1: Correct Decision
&lt;/h3&gt;

&lt;p&gt;If the reward is &lt;strong&gt;1&lt;/strong&gt;, then:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqooiniacv6ovw8q9i3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqooiniacv6ovw8q9i3e.png" alt=" " width="360" height="93"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The derivative remains unchanged.&lt;/p&gt;

&lt;p&gt;This means the derivative is already pointing in the correct direction.&lt;/p&gt;




&lt;h3&gt;
  
  
  Case 2: Incorrect Decision
&lt;/h3&gt;

&lt;p&gt;If the reward is &lt;strong&gt;-1&lt;/strong&gt;, then:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld54pyb01v6hd9qdjetd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld54pyb01v6hd9qdjetd.png" alt=" " width="376" height="101"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now the derivative changes sign.&lt;/p&gt;

&lt;p&gt;This causes the optimization process to move the bias in the opposite direction.&lt;/p&gt;

&lt;p&gt;In other words, the negative reward flips the direction of the update so the neural network can learn from the bad decision.&lt;/p&gt;
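&lt;p&gt;Both cases boil down to one multiplication. A minimal sketch, using a hypothetical derivative value of 0.5:&lt;/p&gt;

```python
derivative = 0.5               # hypothetical derivative value

# Case 1: a reward of 1 leaves the derivative unchanged
print(derivative * 1)          # 0.5

# Case 2: a reward of -1 flips its sign, reversing the update direction
print(derivative * -1)         # -0.5
```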

&lt;p&gt;In the next article, we will explore how to calculate the &lt;strong&gt;step size&lt;/strong&gt; for updating the parameters.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Reinforcement Learning with Neural Networks Part 3: Guessing the Ideal Output</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Mon, 11 May 2026 18:48:10 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-3-guessing-the-ideal-output-3m47</link>
      <guid>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-3-guessing-the-ideal-output-3m47</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-reinforcement-learning-with-neural-networks-part-2-why-backpropagation-is-not-enough-2el2"&gt;previous article&lt;/a&gt;, we explored the limitations of backpropagation and why it is not ideal when the correct output values are unknown.&lt;/p&gt;

&lt;p&gt;In this article, we will begin exploring the core ideas behind reinforcement learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Example
&lt;/h2&gt;

&lt;p&gt;Let us begin by assuming that we are &lt;strong&gt;not hungry&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will feed the value &lt;strong&gt;0.0&lt;/strong&gt; into the neural network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbrdbq7cqde9q2bzo575.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbrdbq7cqde9q2bzo575.png" alt=" " width="800" height="175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The neural network outputs a probability of &lt;strong&gt;0.5&lt;/strong&gt; for going to &lt;strong&gt;Place B&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Probability of going to Place B = &lt;strong&gt;p(B) = 0.5&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Probability of going to Place A = &lt;strong&gt;1 - p(B) = 0.5&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Visualizing the Probabilities
&lt;/h2&gt;

&lt;p&gt;We can represent these probabilities using a line.&lt;/p&gt;

&lt;p&gt;First, we draw a line segment with length &lt;strong&gt;0.5&lt;/strong&gt; to represent the probability of going to &lt;strong&gt;Place A&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Then, we append another line segment to represent the probability of going to &lt;strong&gt;Place B&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67ez8dzdjhuk5z06ik5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67ez8dzdjhuk5z06ik5z.png" alt=" " width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Together, these form a line ranging from &lt;strong&gt;0 to 1&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing an Action
&lt;/h2&gt;

&lt;p&gt;To decide which place to go for a snack, we randomly pick a number between &lt;strong&gt;0 and 1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let us pick &lt;strong&gt;0.2&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ckpjtnj21fhvfgecxti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ckpjtnj21fhvfgecxti.png" alt=" " width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since &lt;strong&gt;0.2&lt;/strong&gt; falls inside the region representing &lt;strong&gt;Place A&lt;/strong&gt;, we choose to go to Place A.&lt;/p&gt;
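&lt;p&gt;This sampling step can be sketched in Python. The probabilities and the draw of 0.2 come from the example; &lt;code&gt;bisect&lt;/code&gt; simply locates the draw on the 0-to-1 line:&lt;/p&gt;

```python
import bisect

p_a = 0.5                       # Place A occupies the segment from 0 to 0.5
draw = 0.2                      # the number picked in the example

places = ["Place A", "Place B"]
choice = places[bisect.bisect([p_a], draw)]
print(choice)                   # Place A
```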




&lt;h2&gt;
  
  
  Making a Guess About the Correct Action
&lt;/h2&gt;

&lt;p&gt;Now, let us assume that going to &lt;strong&gt;Place A&lt;/strong&gt; when hunger = 0 was the correct decision.&lt;/p&gt;

&lt;p&gt;Ideally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The probability of going to Place A, &lt;strong&gt;p(A)&lt;/strong&gt;, should be &lt;strong&gt;1&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The probability of going to Place B, &lt;strong&gt;p(B)&lt;/strong&gt;, should be &lt;strong&gt;0&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These ideal values are based on our guess about what the correct action should have been.&lt;/p&gt;




&lt;h2&gt;
  
  
  Moving Toward Optimization
&lt;/h2&gt;

&lt;p&gt;Using these guessed ideal values, we can calculate the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the ideal probability for &lt;strong&gt;p(A)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the actual probability produced by the neural network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows us to calculate the derivative of the difference with respect to the bias we want to optimize.&lt;/p&gt;




&lt;p&gt;In the next article, we will continue exploring how this optimization process works in reinforcement learning.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Reinforcement Learning with Neural Networks Part 2: Why Backpropagation Is Not Enough</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sun, 10 May 2026 19:53:16 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-2-why-backpropagation-is-not-enough-2el2</link>
      <guid>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-2-why-backpropagation-is-not-enough-2el2</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-reinforcement-learning-with-neural-networks-part-1-learning-without-correct-answers-47ld"&gt;previous article&lt;/a&gt;, we explored an example where reinforcement learning is required and standard methods do not work.&lt;/p&gt;

&lt;p&gt;In this article, we will understand why &lt;strong&gt;policy gradients&lt;/strong&gt; are needed, and why the standard &lt;strong&gt;backpropagation&lt;/strong&gt; method does not work in certain situations.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Backpropagation Normally Works
&lt;/h2&gt;

&lt;p&gt;Assume we have the following training data, where the desired outputs are already known:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input (Hunger)&lt;/th&gt;
&lt;th&gt;Output p(B)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.9&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With this data, we can feed the input values into the neural network one at a time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8scwonpe9136c8nap1pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8scwonpe9136c8nap1pi.png" alt=" " width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The neural network produces an output, and we compare it with the &lt;strong&gt;ideal output value&lt;/strong&gt; from the training data.&lt;/p&gt;

&lt;p&gt;Using this difference, we can measure how wrong the network is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using Derivatives to Update the Bias
&lt;/h2&gt;

&lt;p&gt;We can calculate these differences for different values of the bias and visualize how the error changes as the bias changes.&lt;/p&gt;

&lt;p&gt;From this graph, we can calculate the &lt;strong&gt;derivative&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the derivative is &lt;strong&gt;negative&lt;/strong&gt;, we shift the bias to the right&lt;/li&gt;
&lt;li&gt;If the derivative is &lt;strong&gt;positive&lt;/strong&gt;, we shift the bias to the left&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The derivative correctly tells us which direction to move because the training data already contains the ideal output values.&lt;/p&gt;

&lt;p&gt;This is the basic idea behind &lt;strong&gt;backpropagation&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem in Reinforcement Learning
&lt;/h2&gt;

&lt;p&gt;However, in reinforcement learning, we do not know the ideal output values in advance.&lt;/p&gt;

&lt;p&gt;For example, we do not already know whether choosing Place A or Place B is the correct action.&lt;/p&gt;

&lt;p&gt;Because of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we cannot calculate the difference between the neural network’s output and the ideal output&lt;/li&gt;
&lt;li&gt;without these differences, we cannot calculate derivatives in the normal way&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Different Approach
&lt;/h2&gt;

&lt;p&gt;Instead, we can &lt;strong&gt;guess&lt;/strong&gt; what the ideal outputs should be and use those guesses to estimate the derivatives.&lt;/p&gt;

&lt;p&gt;This idea forms the foundation of &lt;strong&gt;policy gradients&lt;/strong&gt; in reinforcement learning.&lt;/p&gt;

&lt;p&gt;In the next article, we will explore how reinforcement learning and policy gradients help us solve this problem.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Reinforcement Learning with Neural Networks Part 1: Learning Without Correct Answers</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Fri, 08 May 2026 18:50:27 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-1-learning-without-correct-answers-47ld</link>
      <guid>https://forem.com/rijultp/understanding-reinforcement-learning-with-neural-networks-part-1-learning-without-correct-answers-47ld</guid>
      <description>&lt;p&gt;In this article, we will explore &lt;strong&gt;reinforcement learning with neural networks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let’s start with a simple example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing Between Two Snack Places
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazbdd5bnvfcgb508jaai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazbdd5bnvfcgb508jaai.png" alt=" " width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Suppose it is snack time, and you have to choose between &lt;strong&gt;Place A&lt;/strong&gt; and &lt;strong&gt;Place B&lt;/strong&gt; for fries.&lt;/p&gt;

&lt;p&gt;To make a good decision, we also need to consider &lt;strong&gt;how hungry we are&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Some days we may be very hungry, while on other days we may only want a small snack.&lt;/p&gt;

&lt;p&gt;We also need to consider how many fries each place might serve.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Place B might give a &lt;strong&gt;large quantity of fries&lt;/strong&gt;, which would be great if we were very hungry&lt;/li&gt;
&lt;li&gt;But if we were not that hungry, getting too many fries might not be ideal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Getting a small amount of fries would not be good if we were extremely hungry&lt;/li&gt;
&lt;li&gt;But it could be perfectly fine if we only wanted a light snack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, it would be useful to have a system that helps decide which place to choose based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;our hunger level&lt;/li&gt;
&lt;li&gt;the possible quantity of fries we might receive&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using a Neural Network
&lt;/h2&gt;

&lt;p&gt;To solve this problem, we will use a &lt;strong&gt;neural network&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The neural network takes our &lt;strong&gt;hunger level&lt;/strong&gt; as the input and outputs the probability of choosing &lt;strong&gt;Place B&lt;/strong&gt;, written as &lt;strong&gt;p(B)&lt;/strong&gt;.&lt;/p&gt;
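&lt;p&gt;As a minimal sketch, such a network can be a single neuron with a sigmoid output. The weight and bias values below are invented for illustration; a real network would learn them during training.&lt;/p&gt;

```python
import math

# A toy "network": one input (hunger level between 0 and 1), one neuron,
# and a sigmoid that squashes the result into a probability p(B).
def p_B(hunger, weight=2.0, bias=-1.0):
    # weight and bias are invented for illustration; training would learn them
    z = weight * hunger + bias
    return 1 / (1 + math.exp(-z))

print(p_B(0.1))  # low hunger  -> lower probability of choosing Place B
print(p_B(0.9))  # high hunger -> higher probability of choosing Place B
```

With a positive weight, hungrier days push the output toward Place B, which serves larger portions.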

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Normally, when training a neural network, we start with a training dataset that contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input values&lt;/li&gt;
&lt;li&gt;correct output values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using this data, we can train the network with standard &lt;strong&gt;backpropagation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, in this example, we do not know in advance whether Place A or Place B will serve a large or small quantity of fries.&lt;/p&gt;

&lt;p&gt;Because of this, we do not know what the correct output values should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reinforcement Learning
&lt;/h2&gt;

&lt;p&gt;In situations where we do not have known output values, we can still train a model using &lt;strong&gt;reinforcement learning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of learning from correct answers, the model learns by trying actions and receiving feedback based on how good the outcome was.&lt;/p&gt;
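&lt;p&gt;The feedback idea can be sketched as a toy update rule on a single probability p(B). This is only an illustration of learning from rewards, with a hand-picked learning rate, not a real reinforcement learning algorithm:&lt;/p&gt;

```python
def update(p_b, action, reward, lr=0.1):
    # Move p(B) toward actions that earned a positive reward and away
    # from actions that earned a negative one, then clamp to [0, 1].
    direction = 1.0 if action == "B" else -1.0
    p_b += lr * reward * direction
    return min(max(p_b, 0.0), 1.0)

p = 0.5                                  # start undecided
p = update(p, action="B", reward=1.0)    # B served a good portion -> p(B) rises
p = update(p, action="A", reward=-1.0)   # A disappointed -> p(B) rises again
print(p)
```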

&lt;p&gt;In the next article, we will explore a reinforcement learning algorithm called &lt;strong&gt;policy gradients&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/ipm" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/ipm/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Encoder-Only Transformers: The Foundation of BERT and RAG Retrieval</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Thu, 07 May 2026 18:50:05 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-encoder-only-transformers-the-foundation-of-bert-and-rag-retrieval-4bk8</link>
      <guid>https://forem.com/rijultp/understanding-encoder-only-transformers-the-foundation-of-bert-and-rag-retrieval-4bk8</guid>
      <description>&lt;p&gt;Back in 2017, the first transformer architecture introduced two main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an &lt;strong&gt;encoder&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;decoder&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These two parts were connected so they could work together.&lt;/p&gt;

&lt;p&gt;This original design is known as an &lt;strong&gt;encoder–decoder transformer&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decoders Can Work on Their Own
&lt;/h2&gt;

&lt;p&gt;Over time, researchers realized that the decoder alone was powerful enough for many tasks.&lt;/p&gt;

&lt;p&gt;Using only a decoder, models could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generate text&lt;/li&gt;
&lt;li&gt;continue sentences&lt;/li&gt;
&lt;li&gt;perform translation and other language tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As we discussed in the article on &lt;a href="https://dev.to/rijultp/understanding-decoder-only-transformers-part-1-masked-self-attention-mf8"&gt;decoder only transformers&lt;/a&gt;, these models form the foundation of systems like ChatGPT.&lt;/p&gt;

&lt;p&gt;These are called &lt;strong&gt;decoder-only transformers&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Encoders Can Also Work Independently
&lt;/h2&gt;

&lt;p&gt;In a similar way, encoder-based models are also very useful on their own.&lt;/p&gt;

&lt;p&gt;This idea forms the foundation of models like &lt;strong&gt;BERT&lt;/strong&gt; and many others.&lt;/p&gt;

&lt;p&gt;These are called &lt;strong&gt;encoder-only transformers&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Blocks of Encoder-Only Transformers
&lt;/h2&gt;

&lt;p&gt;Encoder-only transformers use the same core components we explored earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Word embeddings&lt;/strong&gt; convert words into numbers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positional encoding&lt;/strong&gt; keeps track of word order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-attention&lt;/strong&gt; helps establish relationships between words&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wqriy365gvonj5c3isu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wqriy365gvonj5c3isu.png" alt=" " width="549" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When these layers are combined, they create a new representation for each token that captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;meaning&lt;/li&gt;
&lt;li&gt;position&lt;/li&gt;
&lt;li&gt;relationships with other words&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These representations are called &lt;strong&gt;context-aware embeddings&lt;/strong&gt; or &lt;strong&gt;contextualized embeddings&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Context-Aware Embeddings Are Useful
&lt;/h2&gt;

&lt;p&gt;Context-aware embeddings can help group together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;similar sentences&lt;/li&gt;
&lt;li&gt;similar paragraphs&lt;/li&gt;
&lt;li&gt;similar documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This capability is one of the foundations of &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;RAG works by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Breaking documents into smaller chunks of text&lt;/li&gt;
&lt;li&gt;Using an encoder-only transformer to generate embeddings for each chunk&lt;/li&gt;
&lt;li&gt;Comparing embeddings to find the most relevant information&lt;/li&gt;
&lt;/ol&gt;
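&lt;p&gt;Step 3 can be sketched with cosine similarity over hypothetical embeddings. The chunk names and 3-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions:&lt;/p&gt;

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented 3-dimensional embeddings for two stored chunks and one query.
chunks = {
    "chunk about pizza ovens": [0.9, 0.1, 0.0],
    "chunk about tax law":     [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Retrieval = pick the chunk whose embedding is closest to the query's.
best = max(chunks, key=lambda name: cosine(chunks[name], query))
print(best)
```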




&lt;h2&gt;
  
  
  Other Uses of Encoder-Only Transformers
&lt;/h2&gt;

&lt;p&gt;Context-aware embeddings can also be used as inputs for machine learning models.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;neural networks can use them for &lt;strong&gt;sentiment classification&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;logistic regression models can also use them for classification tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That wraps up encoder-only transformers.&lt;/p&gt;

&lt;p&gt;In the next article, we will explore &lt;strong&gt;reinforcement learning in neural networks&lt;/strong&gt;.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Decoder-Only Transformers Part 2: Decoder-Only vs Regular Transformers</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 06 May 2026 19:44:50 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-decoder-only-transformers-part-2-decoder-only-vs-regular-transformers-3200</link>
      <guid>https://forem.com/rijultp/understanding-decoder-only-transformers-part-2-decoder-only-vs-regular-transformers-3200</guid>
      <description>&lt;p&gt;In this article, we will look at the differences between a &lt;strong&gt;decoder-only transformer&lt;/strong&gt; and a &lt;strong&gt;standard (encoder–decoder) transformer&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Decoder-Only Transformers Work
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F248ixjxedsy2xwj0sd2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F248ixjxedsy2xwj0sd2g.png" alt=" " width="659" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A decoder-only transformer uses the &lt;strong&gt;same components&lt;/strong&gt; to process the input prompt and to generate the output.&lt;/p&gt;

&lt;p&gt;It relies on &lt;strong&gt;masked self-attention&lt;/strong&gt;, which considers only the &lt;strong&gt;current word and the words that came before it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Masked self-attention is applied to both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;input prompt&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;generated output&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means the entire process is handled by a single stack of decoder layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Regular Transformers Work
&lt;/h2&gt;

&lt;p&gt;A regular transformer has two separate parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an &lt;strong&gt;encoder&lt;/strong&gt; to process the input&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;decoder&lt;/strong&gt; to generate the output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When encoding the input, it uses &lt;strong&gt;self-attention&lt;/strong&gt;, not masked self-attention.&lt;br&gt;
This allows each word to attend to &lt;strong&gt;all other words in the input&lt;/strong&gt;, not just the previous ones.&lt;/p&gt;

&lt;p&gt;The decoder then uses &lt;strong&gt;encoder–decoder attention&lt;/strong&gt; to stay connected to the input.&lt;/p&gt;

&lt;p&gt;In this mechanism:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;queries&lt;/strong&gt; come from the decoder&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;keys and values&lt;/strong&gt; come from the encoder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps the decoder focus on the most important parts of the input while generating output.&lt;/p&gt;
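&lt;p&gt;A minimal sketch of this mechanism, using tiny invented 2-dimensional vectors for one decoder token and two encoder tokens:&lt;/p&gt;

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Invented 2-dimensional vectors standing in for learned projections.
query  = [1.0, 0.0]                  # query comes from the decoder
keys   = [[1.0, 0.2], [0.0, 1.0]]    # keys come from the encoder
values = [[0.5, 0.5], [0.9, 0.1]]    # values come from the encoder

# Score each input word against the decoder's query, then mix the values.
scores   = [sum(q * k for q, k in zip(query, key)) for key in keys]
weights  = softmax(scores)
attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(2)]
print(weights)
print(attended)
```

The weights show how strongly the decoder attends to each input word while generating output.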
&lt;h2&gt;
  
  
  What Really Changes Between Them
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Decoder-only transformers use &lt;strong&gt;masked self-attention everywhere&lt;/strong&gt; (for both input and output)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Standard transformers use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;self-attention&lt;/strong&gt; in the encoder&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;masked self-attention&lt;/strong&gt; in the decoder&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;encoder–decoder attention&lt;/strong&gt; to connect input and output&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
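&lt;p&gt;The masking difference can be visualized as two attention masks for a 3-word sequence (a simplified illustration that ignores the actual attention scores):&lt;/p&gt;

```python
n = 3  # a 3-word sequence

# Encoder-style self-attention: every word may attend to every word.
full_mask = [[True] * n for _ in range(n)]

# Decoder-style masked self-attention: word i may attend only to
# itself and the words before it (positions j <= i).
causal_mask = [[j <= i for j in range(n)] for i in range(n)]

for row in causal_mask:
    print(row)
```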

&lt;p&gt;That wraps up decoder-only transformers.&lt;/p&gt;

&lt;p&gt;In the next article, we will explore &lt;strong&gt;encoder-only transformers&lt;/strong&gt;.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Decoder-Only Transformers Part 1: Masked Self-Attention</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Tue, 05 May 2026 19:25:51 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-decoder-only-transformers-part-1-masked-self-attention-mf8</link>
      <guid>https://forem.com/rijultp/understanding-decoder-only-transformers-part-1-masked-self-attention-mf8</guid>
      <description>&lt;h2&gt;
  
  
  Decoder-Only Transformers
&lt;/h2&gt;

&lt;p&gt;In this article, we will explore &lt;strong&gt;decoder-only transformers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Decoder-only transformers are a specific type of transformer architecture used in systems like ChatGPT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Masked Self-Attention
&lt;/h2&gt;

&lt;p&gt;Decoder-only transformers use a mechanism called &lt;strong&gt;masked self-attention&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Masked self-attention works by measuring how similar each word is to itself and to the words that come &lt;strong&gt;before it&lt;/strong&gt; in the sentence.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“The pizza came out of the oven and it tasted good.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When processing the word &lt;strong&gt;“pizza”&lt;/strong&gt;, masked self-attention considers only &lt;strong&gt;“pizza”&lt;/strong&gt; itself and the preceding word &lt;strong&gt;“The”&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Difference
&lt;/h2&gt;

&lt;p&gt;Unlike standard self-attention, masked self-attention &lt;strong&gt;does not allow a word to look at future words&lt;/strong&gt;. It can only attend to the current word and the words that come before it.&lt;/p&gt;

&lt;p&gt;Because of this, it is also called an &lt;strong&gt;auto-regressive method&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An auto-regressive method is a way of predicting values step by step, where each prediction depends on the previous outputs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model uses its past predictions as input to generate the next output&lt;/li&gt;
&lt;li&gt;It builds the final result one step at a time&lt;/li&gt;
&lt;li&gt;Each step depends on what was generated before it, not what comes after&lt;/li&gt;
&lt;/ul&gt;
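&lt;p&gt;The auto-regressive steps above can be sketched with a made-up lookup table standing in for the model's next-word prediction:&lt;/p&gt;

```python
# A made-up lookup table standing in for the model's next-word prediction.
next_word = {
    "<SOS>": "The",
    "The": "pizza",
    "pizza": "tasted",
    "tasted": "good",
    "good": "<EOS>",
}

tokens = ["<SOS>"]
while tokens[-1] != "<EOS>":
    # Each step depends only on what was generated before it.
    tokens.append(next_word[tokens[-1]])

print(" ".join(tokens[1:-1]))
```

Each new word is appended to the sequence and fed back in, so no step ever depends on words that come after it.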




&lt;p&gt;We will explore this concept in more detail in the next article.&lt;/p&gt;






</description>
      <category>deeplearning</category>
      <category>llm</category>
      <category>nlp</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Understanding Transformers Part 18: Completing the Decoding Process</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Mon, 04 May 2026 17:50:45 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-18-completing-the-decoding-process-p1n</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-18-completing-the-decoding-process-p1n</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-17-generating-the-output-word-35ol"&gt;previous article&lt;/a&gt;, we generated the first output word from the transformer.&lt;/p&gt;

&lt;p&gt;So far, the translation is correct. However, the decoder does not stop until it produces an &lt;strong&gt;&lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feeding the Output Back into the Decoder
&lt;/h2&gt;

&lt;p&gt;Now, we take the translated word &lt;strong&gt;“vamos”&lt;/strong&gt; and feed it back into a copy of the decoder’s embedding layer to continue the process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1moagua4keypus1tzko7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1moagua4keypus1tzko7.png" alt=" " width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like before, we repeat the same steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get the &lt;strong&gt;word embeddings&lt;/strong&gt; for &lt;em&gt;vamos&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;positional encoding&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Calculate &lt;strong&gt;self-attention values&lt;/strong&gt; using the same weights used for the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;residual connections&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Compute &lt;strong&gt;encoder–decoder attention&lt;/strong&gt; using the same set of weights&lt;/li&gt;
&lt;li&gt;Add another set of &lt;strong&gt;residual connections&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
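&lt;p&gt;The overall loop can be sketched as follows, with a made-up lookup table standing in for the full decoder pass; the tokens mirror this example's tiny vocabulary:&lt;/p&gt;

```python
def decoder_step(last_token):
    # Stand-in for one full decoder pass: embeddings, positional encoding,
    # attention, residual connections, fully connected layer, softmax.
    predictions = {"<EOS>": "vamos", "vamos": "<EOS>"}  # toy values for this example
    return predictions[last_token]

output = []
token = "<EOS>"                 # decoding starts from the <EOS> token
while True:
    token = decoder_step(token)
    if token == "<EOS>":        # a generated <EOS> ends the sentence
        break
    output.append(token)

print(output)
```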

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsis14fpo4qmcpbvkupv2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsis14fpo4qmcpbvkupv2.png" alt=" " width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Generating the Next Word
&lt;/h2&gt;

&lt;p&gt;Next, we pass the values representing &lt;strong&gt;“vamos”&lt;/strong&gt; through the same &lt;strong&gt;fully connected layer&lt;/strong&gt; and &lt;strong&gt;softmax function&lt;/strong&gt; that we used earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw4pv2pn2u9oww0jeujc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw4pv2pn2u9oww0jeujc.png" alt=" " width="614" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time, the decoder outputs the &lt;strong&gt;&lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token&lt;/strong&gt;, which signals the end of the sentence.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Output
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8stgpq826vrem4jw60e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8stgpq826vrem4jw60e.png" alt=" " width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, the decoding process is complete.&lt;/p&gt;

&lt;p&gt;We have successfully translated the input phrase using the transformer.&lt;/p&gt;

&lt;p&gt;So, just to recap, the transformer works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Word embeddings&lt;/strong&gt; convert words into numerical representations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positional encoding&lt;/strong&gt; keeps track of word order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-attention&lt;/strong&gt; captures relationships within the input and output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encoder–decoder attention&lt;/strong&gt; connects input and output, ensuring important information is preserved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Residual connections&lt;/strong&gt; help different components focus on specific tasks and improve training&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;In the next article, we will start exploring &lt;strong&gt;decoder-only transformers&lt;/strong&gt;.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Multi-Head Attention in Transformers</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sun, 03 May 2026 20:08:43 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-multi-head-attention-in-transformers-gj1</link>
      <guid>https://forem.com/rijultp/understanding-multi-head-attention-in-transformers-gj1</guid>
      <description>&lt;p&gt;Self-attention already helps a transformer understand relationships between words using Query, Key, and Value. But there’s a problem.&lt;/p&gt;

&lt;p&gt;One attention mechanism usually ends up focusing on a limited kind of relationship at a time.&lt;/p&gt;

&lt;p&gt;Language doesn’t work like that. A sentence can have structure, meaning, and long-range links all at once.&lt;/p&gt;

&lt;p&gt;That’s why transformers use &lt;strong&gt;multi-head attention&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happens in multi-head attention
&lt;/h2&gt;

&lt;p&gt;Instead of doing attention once, the model does it multiple times in parallel.&lt;/p&gt;

&lt;p&gt;Each run is called a head, and each head has its own learned weights for Query, Key, and Value.&lt;/p&gt;

&lt;p&gt;So every head looks at the same sentence, but in its own way.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it flows
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The input embeddings are first prepared&lt;/li&gt;
&lt;li&gt;They are split into multiple heads using linear projections&lt;/li&gt;
&lt;li&gt;Each head runs its own self-attention&lt;/li&gt;
&lt;li&gt;Each head produces its own output&lt;/li&gt;
&lt;li&gt;All outputs are joined back together&lt;/li&gt;
&lt;li&gt;A final layer mixes them into one result&lt;/li&gt;
&lt;/ul&gt;
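&lt;p&gt;The flow above can be sketched with plain lists. The embedding values are invented, and a simple scaling stands in for each head's attention and linear projections:&lt;/p&gt;

```python
# An invented 4-dimensional embedding split across 2 heads of size 2.
embedding = [0.1, 0.7, 0.3, 0.9]
n_heads = 2
head_size = len(embedding) // n_heads

# Split: each head gets its own slice of the (projected) embedding.
heads = [embedding[i * head_size:(i + 1) * head_size] for i in range(n_heads)]

# Each head would run its own self-attention; plain scaling stands in here.
head_outputs = [[x * 2 for x in head] for head in heads]

# Join: concatenate the head outputs back into one vector, ready for the
# final linear layer that mixes them into one result.
combined = [x for head in head_outputs for x in head]
print(combined)
```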

&lt;h2&gt;
  
  
  Why this works better than the previous approach
&lt;/h2&gt;

&lt;p&gt;Different heads naturally pick up different things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;word order and grammar&lt;/li&gt;
&lt;li&gt;nearby word relationships&lt;/li&gt;
&lt;li&gt;long-distance links&lt;/li&gt;
&lt;li&gt;meaning-based connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of forcing one attention mechanism to do everything, the model spreads the job across multiple perspectives.&lt;/p&gt;

&lt;p&gt;One head is like reading a sentence with one focus.&lt;/p&gt;

&lt;p&gt;Multiple heads is like reading it several times, each time noticing something different, then combining those notes.&lt;/p&gt;

&lt;p&gt;Multi-head attention doesn’t change the idea of self-attention. It just runs it multiple times in parallel so the model can understand language from different angles at once.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 17: Generating the Output Word</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Fri, 01 May 2026 20:53:08 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-17-generating-the-output-word-35ol</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-17-generating-the-output-word-35ol</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-16-preparing-for-output-prediction-with-residual-connections-1n06"&gt;previous article&lt;/a&gt;, we set up the residual connections to get the final output values from the decoder.&lt;/p&gt;

&lt;p&gt;In this article, we begin by passing these two output values through a fully connected layer.&lt;/p&gt;

&lt;p&gt;This layer has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One input for each value representing the current token
(in this case, 2 inputs)&lt;/li&gt;
&lt;li&gt;One output for each word in the output vocabulary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since our vocabulary has 4 tokens, this gives us 4 output values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch2kyaj0ttufavtu4qe2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch2kyaj0ttufavtu4qe2.png" alt=" " width="680" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Selecting the Output Word
&lt;/h4&gt;

&lt;p&gt;Next, we pass these 4 output values through a softmax function.&lt;/p&gt;

&lt;p&gt;This allows us to select the most likely output word, which in this case is “vamos”.&lt;/p&gt;
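&lt;p&gt;This selection step can be sketched as follows; the logit values are invented for illustration, chosen so that “vamos” wins:&lt;/p&gt;

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["ir", "vamos", "y", "<EOS>"]
logits = [1.2, 3.5, 0.3, 0.8]   # invented fully-connected-layer outputs

# Softmax turns the raw outputs into probabilities; pick the largest.
probs = softmax(logits)
predicted = vocab[probs.index(max(probs))]
print(predicted)
```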

&lt;p&gt;So far, the translation is correct. However, the process does not stop here.&lt;/p&gt;

&lt;h4&gt;
  
  
  Continuing the Decoding Process
&lt;/h4&gt;

&lt;p&gt;The decoder continues generating words until it produces an &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token, which indicates the end of the sentence.&lt;/p&gt;

&lt;p&gt;To generate the next word, we feed the predicted word back into the decoder.&lt;/p&gt;

&lt;p&gt;We will explore this step in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers – Part 16: Preparing for Output Prediction with Residual Connections</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 29 Apr 2026 21:26:20 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-16-preparing-for-output-prediction-with-residual-connections-1n06</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-16-preparing-for-output-prediction-with-residual-connections-1n06</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-15-scaling-and-combining-values-in-encoder-decoder-attention-4dfm"&gt;previous article&lt;/a&gt;, we handled values in encoder-decoder attention, now we will simplify the diagram a bit add another set of residual connections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2o38bbt86p8t40cvkss9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2o38bbt86p8t40cvkss9.png" alt=" " width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows the encoder–decoder attention to focus on the relationships between the output words and the input words, without needing to preserve the self-attention and positional encoding from earlier.&lt;/p&gt;

&lt;p&gt;Lastly, we need a way to take these two values that represent the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token in the decoder and select one of the four output tokens: ir, vamos, y, or &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4u4mu2hg41fyec6edqc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4u4mu2hg41fyec6edqc.png" alt=" " width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To do this, we pass these two values through a fully connected layer.&lt;/p&gt;
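&lt;p&gt;A minimal sketch of that fully connected layer in plain Python: two input values go in, and one logit comes out per output token. The weights, biases, and inputs below are made-up numbers, not the article's actual values:&lt;/p&gt;

```python
# Fully connected layer: 2 decoder values mapped to 4 token logits.
def fully_connected(x, weights, biases):
    # One output per token: dot product of the inputs with that token's weights
    return [sum(xi * wi for xi, wi in zip(x, col)) + b
            for col, b in zip(weights, biases)]

x = [0.6, -1.2]                      # the two values from the decoder
weights = [[1.0, 0.5], [-0.3, 2.0],  # one weight pair per output token
           [0.7, -0.1], [0.2, 0.4]]
biases = [0.0, 0.1, -0.2, 0.05]

logits = fully_connected(x, weights, biases)
print(len(logits))  # 4 logits: one each for ir, vamos, y, and EOS
```

&lt;p&gt;The four resulting logits are what a softmax would then turn into probabilities over the output vocabulary.&lt;/p&gt;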

&lt;p&gt;We will explore this further in the next article.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/ipm" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/ipm/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
