<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rijul Rajesh</title>
    <description>The latest articles on Forem by Rijul Rajesh (@rijultp).</description>
    <link>https://forem.com/rijultp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1207862%2F2d1456e5-ef74-42a1-ac31-d0e6d6bc547f.webp</url>
      <title>Forem: Rijul Rajesh</title>
      <link>https://forem.com/rijultp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rijultp"/>
    <language>en</language>
    <item>
      <title>Understanding Transformers Part 12: Building the Decoder Layers</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Thu, 23 Apr 2026 19:23:30 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-12-building-the-decoder-layers-36j</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-12-building-the-decoder-layers-36j</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-11-how-decoding-begins-4dal"&gt;previous article&lt;/a&gt;, we just began with the concept of decoders in a transformer.&lt;/p&gt;

&lt;p&gt;Now we will start adding the positional encoding.&lt;/p&gt;

&lt;h2&gt;Adding Positional Encoding in the Decoder&lt;/h2&gt;

&lt;p&gt;Now, for the decoder, let’s add &lt;strong&gt;positional encoding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just like before, we use the same sine and cosine curves to get positional values based on the embedding positions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz99r6qdaguisfybdjun.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvz99r6qdaguisfybdjun.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are the &lt;strong&gt;same curves&lt;/strong&gt; that were used earlier when encoding the input.&lt;/p&gt;

&lt;h2&gt;Applying Positional Values&lt;/h2&gt;

&lt;p&gt;Since the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token is in the &lt;strong&gt;first position&lt;/strong&gt; and has &lt;strong&gt;two embedding values&lt;/strong&gt;, we take the corresponding positional values from the curves.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the &lt;strong&gt;first embedding&lt;/strong&gt;, the sine curve gives &lt;strong&gt;0&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For the &lt;strong&gt;second embedding&lt;/strong&gt;, the cosine curve gives &lt;strong&gt;1&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, we add these positional values to the embedding:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F898xly58c48hkjtmjgka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F898xly58c48hkjtmjgka.png" alt=" " width="643" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result, we get &lt;strong&gt;2.70 and -0.34&lt;/strong&gt;, which represent the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token after adding positional encoding.&lt;/p&gt;
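The lookup-and-add step above can be sketched in a few lines of NumPy. This is a minimal sketch of the standard sine/cosine positional-encoding formula; the 2-value embedding for the EOS token comes from the article, and everything else is assumed.

```python
import numpy as np

# 2-dimensional embedding for the EOS token (values from the article).
eos_embedding = np.array([2.70, -1.34])

def positional_encoding(pos, d_model=2):
    """Standard sine/cosine positional encoding."""
    pe = np.zeros(d_model)
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i] = np.sin(angle)      # even index: sine curve
        pe[i + 1] = np.cos(angle)  # odd index: cosine curve
    return pe

pe = positional_encoding(0)   # first position: sin(0) = 0, cos(0) = 1
encoded = eos_embedding + pe  # [2.70 + 0, -1.34 + 1] = [2.70, -0.34]
print(encoded)
```

For position 0 the sine curve contributes 0 and the cosine curve contributes 1, which is exactly how the article arrives at 2.70 and -0.34.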

&lt;h2&gt;Adding Self-Attention&lt;/h2&gt;

&lt;p&gt;Next, we add the &lt;strong&gt;self-attention layer&lt;/strong&gt; so the decoder can keep track of relationships between output words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61el0t66d4wu1puzmxfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61el0t66d4wu1puzmxfk.png" alt=" " width="577" height="824"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The self-attention values for the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token are &lt;strong&gt;-2.8 and -2.3&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Note that the &lt;strong&gt;weights used in the decoder’s self-attention&lt;/strong&gt; (for queries, keys, and values) are &lt;strong&gt;different from those used in the encoder&lt;/strong&gt;.&lt;/p&gt;
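A minimal sketch of what the decoder's self-attention layer computes for the single EOS token. The query, key, and value weight matrices below are randomly initialised stand-ins, not the article's trained weights, so the output will not match the article's numbers.

```python
import numpy as np

np.random.seed(0)

# Position-encoded EOS vector from the article: shape (1 token, 2 dims).
x = np.array([[2.70, -0.34]])

# The decoder learns its OWN query/key/value weights,
# separate from the encoder's. These are illustrative placeholders.
W_q, W_k, W_v = (np.random.randn(2, 2) for _ in range(3))

q, k, v = x @ W_q, x @ W_k, x @ W_v

scores = q @ k.T  # similarity of each query with each key
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
attention = weights @ v  # weighted sum of the values
print(attention)
```

With only one token in the output so far, the softmax weight is 1, so the attention output equals that token's value vector.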

&lt;h2&gt;Adding Residual Connections&lt;/h2&gt;

&lt;p&gt;Now, we add &lt;strong&gt;residual connections&lt;/strong&gt;, just like we did in the encoder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyuupdtfpl297azfp3h0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyuupdtfpl297azfp3h0.png" alt=" " width="549" height="765"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What’s Next?&lt;/h2&gt;

&lt;p&gt;So far, we have seen how self-attention helps the transformer understand relationships &lt;strong&gt;within the output sentence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, for tasks like translation, the model also needs to understand relationships &lt;strong&gt;between the input sentence and the output sentence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will explore this in the next article.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Looking for an easier way to install tools, libraries, or entire repositories?&lt;/strong&gt;&lt;br&gt;
Try &lt;strong&gt;Installerpedia&lt;/strong&gt;: a &lt;strong&gt;community-driven, structured installation platform&lt;/strong&gt; that lets you install almost anything with &lt;strong&gt;minimal hassle&lt;/strong&gt; and &lt;strong&gt;clear, reliable guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ipm &lt;span class="nb"&gt;install &lt;/span&gt;repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;… and you’re done! 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hexmos.com/ipm" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2s3mzj8pfcq94a1y4at.png" alt="Installerpedia Screenshot" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://hexmos.com/ipm/" rel="noopener noreferrer"&gt;&lt;strong&gt;Explore Installerpedia here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 11: How Decoding Begins</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 22 Apr 2026 19:31:56 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-11-how-decoding-begins-4dal</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-11-how-decoding-begins-4dal</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-10-final-step-in-encoding-4f55"&gt;previous article&lt;/a&gt; we wrapped up the encoder part,  In this article, we will start building the second part of the transformer: the &lt;strong&gt;decoder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just like the encoder, the decoder also begins with &lt;strong&gt;word embeddings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, this time the embeddings are created for the &lt;strong&gt;output vocabulary&lt;/strong&gt;, which consists of Spanish words such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;ir&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;vamos&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;y&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foteuuzkbucm5wvcgblq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foteuuzkbucm5wvcgblq1.png" alt=" " width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Starting the Decoding Process&lt;/h2&gt;

&lt;p&gt;To begin decoding, we use the &lt;strong&gt;&lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token&lt;/strong&gt; as the input.&lt;/p&gt;

&lt;p&gt;This is a common way to initialize the decoding process for an encoded sentence.&lt;/p&gt;

&lt;p&gt;In some cases, people use a &lt;strong&gt;&lt;code&gt;&amp;lt;SOS&amp;gt;&lt;/code&gt; (Start of Sentence)&lt;/strong&gt; token instead.&lt;/p&gt;




&lt;h2&gt;Creating the Initial Input&lt;/h2&gt;

&lt;p&gt;We represent the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token as a vector by assigning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1&lt;/strong&gt; to &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; to all other words in the vocabulary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbr0cpk2e2ogeo78ini0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbr0cpk2e2ogeo78ini0d.png" alt=" " width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From this, we can see that &lt;strong&gt;2.70 and -1.34&lt;/strong&gt; are the embedding values that represent the &lt;code&gt;&amp;lt;EOS&amp;gt;&lt;/code&gt; token.&lt;/p&gt;
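The one-hot lookup described above can be sketched as a matrix product. Only the EOS row (2.70 and -1.34) comes from the article; the other rows of the embedding matrix are made-up placeholders.

```python
import numpy as np

vocab = ["ir", "vamos", "y", "EOS"]

# Hypothetical embedding matrix: one 2-value row per vocabulary word.
# Only the EOS row is from the article; the rest are illustrative.
W_embed = np.array([
    [ 1.10, -0.20],  # ir
    [-0.50,  0.75],  # vamos
    [ 0.30,  0.40],  # y
    [ 2.70, -1.34],  # EOS
])

# Assign 1 to EOS and 0 to every other word.
one_hot = np.array([1.0 if w == "EOS" else 0.0 for w in vocab])

embedding = one_hot @ W_embed  # selects the EOS row
print(embedding)
```

Multiplying a one-hot vector by the embedding matrix simply picks out that word's row, which is why assigning 1 to the token and 0 elsewhere works as a lookup.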

&lt;p&gt;Now that we have the initial input for the decoder, the next step is to &lt;strong&gt;add positional encoding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will explore this in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 10: Final Step in Encoding</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Tue, 21 Apr 2026 19:36:28 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-10-final-step-in-encoding-4f55</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-10-final-step-in-encoding-4f55</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-9-stacking-self-attention-layers-3gg3"&gt;previous article&lt;/a&gt;, we explored the use of self-attention layers, now we will dive into the final step of encoding and start moving into decoders&lt;/p&gt;

&lt;p&gt;As the final step, we take the &lt;strong&gt;positional encoded values&lt;/strong&gt; and add them to the &lt;strong&gt;self-attention values&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These connections are called &lt;strong&gt;residual connections&lt;/strong&gt;. They make it easier to train complex neural networks by allowing the self-attention layer to focus on learning relationships between words, without needing to preserve the original word embedding and positional information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88f0hdawj6e44254gmb2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88f0hdawj6e44254gmb2.png" alt=" " width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;
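As a rough sketch, a residual connection is just an element-wise addition of a layer's input to its output. The numbers below are illustrative, not the article's actual values.

```python
import numpy as np

# Illustrative position-encoded values and self-attention outputs
# for a two-word input (made-up numbers).
pos_encoded = np.array([[1.87,  0.09],
                        [0.27,  1.66]])
self_attention = np.array([[2.5, -2.1],
                           [2.2, -1.8]])

# Residual connection: add the layer's input back to its output, so the
# attention layer only has to learn relationships between words rather
# than also preserving the embedding and positional information.
residual_output = pos_encoded + self_attention
print(residual_output)
```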

&lt;p&gt;At this point, we have everything needed to encode the input for this simple transformer.&lt;/p&gt;

&lt;p&gt;These four components work together to convert words into meaningful numerical representations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Word embedding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Positional encoding&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-attention&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Residual connections&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we have encoded the English input phrase &lt;strong&gt;“Let’s go”&lt;/strong&gt;, the next step is to &lt;strong&gt;decode it into Spanish&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To do this, we need to build a &lt;strong&gt;decoder&lt;/strong&gt;, which we will explore in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 9: Stacking Self-Attention Layers</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Fri, 17 Apr 2026 20:50:25 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-9-stacking-self-attention-layers-3gg3</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-9-stacking-self-attention-layers-3gg3</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-8-shared-weights-in-self-attention-2pbe"&gt;previous article&lt;/a&gt;, we explored how the weights are shared in self-attention.&lt;/p&gt;

&lt;p&gt;Now we will see why we use these self-attention values instead of the initial positional encoding values.&lt;/p&gt;

&lt;h2&gt;Using Self-Attention Values&lt;/h2&gt;

&lt;p&gt;We now use the &lt;strong&gt;self-attention values&lt;/strong&gt; instead of the original positional encoded values.&lt;/p&gt;

&lt;p&gt;This is because the self-attention values for each word include information from all the other words in the sentence. This helps give each word &lt;strong&gt;context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It also helps establish how each word in the input is related to the others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrjmoknigxi9rs0n743q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrjmoknigxi9rs0n743q.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we think of this unit, along with its weights for calculating queries, keys, and values, as a &lt;strong&gt;self-attention cell&lt;/strong&gt;, then we can extend this idea further.&lt;/p&gt;

&lt;p&gt;To correctly capture relationships in more complex sentences and paragraphs, we can &lt;strong&gt;stack multiple self-attention cells&lt;/strong&gt;, each with its own set of weights. These layers are applied to the position-encoded values of each word, allowing the model to learn different types of relationships.&lt;/p&gt;
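The stacking idea can be sketched as follows, assuming each self-attention cell gets its own randomly initialised weights (the dimensions and values are illustrative, not the article's).

```python
import numpy as np

np.random.seed(1)

# Two position-encoded words, 2 values each (illustrative input).
seq = np.random.randn(2, 2)

def attention_cell(x, d=2):
    """One self-attention cell with its own independent weights."""
    W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
    scores = (x @ W_q) @ (x @ W_k).T
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ (x @ W_v)

# Stack several cells, each with its own weights, so each can learn a
# different kind of relationship; then combine their outputs.
heads = [attention_cell(seq) for _ in range(4)]
stacked = np.concatenate(heads, axis=-1)
print(stacked.shape)
```

Each call to `attention_cell` draws fresh weights, mirroring how stacked cells learn different relationships from the same position-encoded input.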

&lt;p&gt;Going back to our example, there is one more step required to fully encode the input. We will explore that in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 8: Shared Weights in Self-Attention</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:08:46 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-8-shared-weights-in-self-attention-2pbe</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-8-shared-weights-in-self-attention-2pbe</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-7-from-similarity-scores-to-self-attention-3noo"&gt;previous article&lt;/a&gt;, we started calculating the self-attention values.&lt;/p&gt;

&lt;p&gt;Let’s now calculate the self-attention values for the word &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We do not need to recalculate the &lt;strong&gt;keys&lt;/strong&gt; and &lt;strong&gt;values&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead, we only need to create the &lt;strong&gt;query&lt;/strong&gt; that represents the word &lt;strong&gt;“go”&lt;/strong&gt;, and then perform the same calculations as before.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9f7405sfueefr9nix6p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9f7405sfueefr9nix6p.png" alt=" " width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After completing the calculations, we get the self-attention values for &lt;strong&gt;“go”&lt;/strong&gt; as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.5 and -2.1&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Key Observations About Self-Attention&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;weights used to calculate queries&lt;/strong&gt; are the same for both &lt;strong&gt;“Let’s”&lt;/strong&gt; and &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This means that regardless of the number of words, we use &lt;strong&gt;one shared set of weights&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Similarly, the same sets of weights are reused to calculate &lt;strong&gt;keys&lt;/strong&gt; and &lt;strong&gt;values&lt;/strong&gt; for every input word.&lt;/li&gt;
&lt;li&gt;No matter how many words are given as input, the transformer reuses the same weights for queries, keys, and values.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;We do not need to compute queries, keys, and values &lt;strong&gt;sequentially&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All of them can be computed &lt;strong&gt;at the same time&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;This allows transformers to take advantage of &lt;strong&gt;parallel computation&lt;/strong&gt;, making them very efficient.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
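Both observations above can be sketched with one matrix multiplication per projection, assuming illustrative input numbers: a single shared weight matrix produces the queries for every word at once, with no sequential loop.

```python
import numpy as np

np.random.seed(2)

# Position-encoded vectors for "Let's" and "go" (illustrative numbers).
X = np.array([[1.87, 0.09],
              [0.27, 1.66]])

# ONE shared set of weights each for queries, keys, and values.
W_q, W_k, W_v = (np.random.randn(2, 2) for _ in range(3))

# A single matrix multiplication produces the queries for EVERY word at
# once; likewise for keys and values. No per-word loop, no sequential
# dependency, which is what makes parallel computation possible.
Q = X @ W_q
K = X @ W_k
V = X @ W_v
print(Q.shape, K.shape, V.shape)
```

Row 1 of `Q` is exactly the query for "go" computed with the same `W_q` used for "Let's", which is the weight sharing the article describes.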

&lt;p&gt;We will continue building our transformer step by step in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 7: From Similarity Scores to Self-Attention</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 15 Apr 2026 02:24:37 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-7-from-similarity-scores-to-self-attention-3noo</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-7-from-similarity-scores-to-self-attention-3noo</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-6-calculating-similarity-between-queries-and-keys-25o7"&gt;previous article&lt;/a&gt;, we calculated the similarities between Queries and Keys.&lt;/p&gt;

&lt;p&gt;We can use the output of the &lt;strong&gt;softmax function&lt;/strong&gt; to determine how much each input word should contribute when encoding the word &lt;strong&gt;“Let’s”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a14whe27xi9e8q4scau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2a14whe27xi9e8q4scau.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Interpreting the Weights&lt;/h3&gt;

&lt;p&gt;In this case, &lt;strong&gt;“Let’s”&lt;/strong&gt; is much more similar to itself than to &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So after applying softmax:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;“Let’s” gets a weight close to 1 (100%)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;“go” gets a weight close to 0 (0%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Let’s” contributes almost entirely to its own encoding&lt;/li&gt;
&lt;li&gt;“go” contributes very little&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Creating Value Representations&lt;/h2&gt;

&lt;p&gt;To apply these weights, we create another set of values for each word.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we create &lt;strong&gt;two values to represent “Let’s”&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8p1v8o7r4foc4bcigdq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8p1v8o7r4foc4bcigdq.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Then, we &lt;strong&gt;scale these values by 1&lt;/strong&gt; (since its weight is 100%)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next, we create &lt;strong&gt;two values to represent “go”&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb1ezeuhgw7y8c7l6uhu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb1ezeuhgw7y8c7l6uhu.png" alt=" " width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These values are &lt;strong&gt;scaled by 0&lt;/strong&gt; (since its weight is 0%)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Combining the Values&lt;/h2&gt;

&lt;p&gt;Finally, we add the scaled values together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81a3fi42bbj5ny5lnver.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81a3fi42bbj5ny5lnver.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result is a new set of values that represent the word &lt;strong&gt;“Let’s”&lt;/strong&gt;, now enriched by its relationship with all input words.&lt;/p&gt;

&lt;p&gt;These final values are called the &lt;strong&gt;self-attention values&lt;/strong&gt; for &lt;strong&gt;“Let’s”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;They combine information from all words in the sentence, weighted by how relevant each word is to &lt;strong&gt;“Let’s”&lt;/strong&gt;.&lt;/p&gt;
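The weighting-and-summing steps above can be sketched end to end. The similarity scores 11.7 and -2.6 are the article's; the value vectors are hypothetical placeholders.

```python
import numpy as np

# Similarity scores for "Let's" vs itself and vs "go" (from the article).
scores = np.array([11.7, -2.6])
weights = np.exp(scores) / np.exp(scores).sum()  # softmax: roughly [1, 0]

# Hypothetical value vectors for each word.
values = np.array([[1.0,  2.5],   # values for "Let's"
                   [0.6, -0.8]])  # values for "go"

# Scale each word's values by its softmax weight, then add them together.
self_attention = weights[0] * values[0] + weights[1] * values[1]
print(self_attention)
```

Because the weight for "Let's" is nearly 1 and the weight for "go" is nearly 0, the result is almost identical to the values for "Let's", exactly as described above.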

&lt;p&gt;We can now repeat the same process for the word &lt;strong&gt;“go”&lt;/strong&gt;, which we will explore in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 6: Calculating Similarity Between Queries and Keys</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Mon, 13 Apr 2026 20:02:45 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-6-calculating-similarity-between-queries-and-keys-25o7</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-6-calculating-similarity-between-queries-and-keys-25o7</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-5-queries-keys-and-similarity-3o7k"&gt;previous article&lt;/a&gt;, we explored the concepts of Queries and Keys. Now we will see how to calculate the similiarities&lt;/p&gt;

&lt;h2&gt;Calculating Similarity Using Dot Product&lt;/h2&gt;

&lt;p&gt;One way to calculate the similarity between a query and the keys is by using the &lt;strong&gt;dot product&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;Query vs Key for “Let’s”&lt;/h3&gt;

&lt;p&gt;Let’s first compute the dot product between the &lt;strong&gt;query&lt;/strong&gt; and &lt;strong&gt;key&lt;/strong&gt; for the word &lt;strong&gt;“Let’s”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We multiply each pair of values and then add the results. This gives us a similarity score of &lt;strong&gt;11.7&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro89laupezhobbzcdj92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro89laupezhobbzcdj92.png" alt=" " width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Query for “Let’s” vs Key for “go”
&lt;/h3&gt;

&lt;p&gt;Now, let’s compute the dot product between the &lt;strong&gt;query for “Let’s”&lt;/strong&gt; and the &lt;strong&gt;key for “go”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0ai0537c51twun02fsz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0ai0537c51twun02fsz.png" alt=" " width="783" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives us a similarity score of &lt;strong&gt;-2.6&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the Result
&lt;/h3&gt;

&lt;p&gt;The similarity score for &lt;strong&gt;“Let’s” with itself&lt;/strong&gt; (11.7) is much higher than its similarity with &lt;strong&gt;“go”&lt;/strong&gt; (-2.6).&lt;/p&gt;

&lt;p&gt;This tells us that &lt;strong&gt;“Let’s” is much more similar to itself than it is to “go”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As a result, when encoding the word &lt;strong&gt;“Let’s”&lt;/strong&gt;, it should be influenced more by itself and less by &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;
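
&lt;p&gt;The calculation above is easy to sketch in code. Here is a minimal Python example with made-up 2-value vectors (the first pair is chosen so the self-similarity comes out to 11.7, as in the figure; the model's actual values would differ):&lt;/p&gt;

```python
# Toy dot-product similarity between a query and two keys.
# These 2-value vectors are invented for illustration.

def dot(a, b):
    """Multiply each pair of values, then add the results."""
    return sum(x * y for x, y in zip(a, b))

q_lets = [3.0, 2.1]    # hypothetical query values for "Let's"
k_lets = [2.5, 2.0]    # hypothetical key values for "Let's"
k_go = [-2.0, 1.0]     # hypothetical key values for "go"

score_self = dot(q_lets, k_lets)  # 3.0*2.5 + 2.1*2.0 = 11.7
score_go = dot(q_lets, k_go)      # 3.0*-2.0 + 2.1*1.0 = -3.9

assert score_self > score_go      # "Let's" is more similar to itself
```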

&lt;h3&gt;
  
  
  What’s Next?
&lt;/h3&gt;

&lt;p&gt;To turn these similarity scores into meaningful weights, we pass them through a &lt;strong&gt;softmax function&lt;/strong&gt;.&lt;/p&gt;
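
&lt;p&gt;As a quick preview, here is a minimal sketch of what softmax does to the two scores above:&lt;/p&gt;

```python
import math

# Softmax: exponentiate each score, then divide by the total, giving
# positive weights that add up to 1.
scores = [11.7, -2.6]

exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]

# "Let's" gets almost all of the weight, because 11.7 is much larger
# than -2.6 and the exponential amplifies the gap.
assert round(sum(weights), 6) == 1.0
assert weights[0] > 0.99
```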

&lt;p&gt;We will explore how softmax works in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 5: Queries, Keys, and Similarity</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sat, 11 Apr 2026 19:26:53 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-5-queries-keys-and-similarity-3o7k</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-5-queries-keys-and-similarity-3o7k</guid>
<description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-4-introduction-to-self-attention-45bg"&gt;previous article&lt;/a&gt;, we explored the concept of self-attention in transformers. In this article, we will go deeper into how the comparisons are performed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Query and Key Values
&lt;/h2&gt;

&lt;p&gt;Let’s go back to our example.&lt;/p&gt;

&lt;p&gt;We have already added positional encoding to the words &lt;strong&gt;“Let’s”&lt;/strong&gt; and &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating Query Values
&lt;/h3&gt;

&lt;p&gt;The first step is to multiply the position-encoded values for the word &lt;strong&gt;“Let’s”&lt;/strong&gt; by a set of weights and add the results, which produces a single value.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6vfpw160kp4vkcmnaov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6vfpw160kp4vkcmnaov.png" alt=" " width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we repeat the same process using a &lt;strong&gt;different set of weights&lt;/strong&gt;, which gives us another value (for example, 3.7).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk820jaw399uoa7nm3fjv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk820jaw399uoa7nm3fjv.png" alt=" " width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We do this twice because we started with two position-encoded values representing the word &lt;strong&gt;“Let’s”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These resulting values together represent &lt;strong&gt;“Let’s”&lt;/strong&gt; in a new form.&lt;/p&gt;

&lt;p&gt;In transformer terminology, these are called &lt;strong&gt;query values&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating Key Values
&lt;/h3&gt;

&lt;p&gt;Now, we use these query values to measure similarity with other words, such as &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To do this, we first create a new set of values for each word, similar to how we created the query values.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We generate &lt;strong&gt;two values for “Let’s”&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgytaf80guh4eh0c32wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgytaf80guh4eh0c32wn.png" alt=" " width="800" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;And &lt;strong&gt;two values for “go”&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2cojqdqay6gtcdy6dx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2cojqdqay6gtcdy6dx6.png" alt=" " width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These new values are called &lt;strong&gt;key values&lt;/strong&gt;.&lt;/p&gt;
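
&lt;p&gt;The whole process can be sketched in a few lines of Python. Every number below is invented for illustration; in a real transformer, the weight matrices are learned during training:&lt;/p&gt;

```python
# Producing query and key values: multiply the position-encoded values
# by weight matrices. Each weight row yields one output value.

def project(vec, weight_rows):
    """For each row of weights: multiply pairwise with vec and sum."""
    return [sum(v * w for v, w in zip(vec, row)) for row in weight_rows]

x_lets = [1.87, 0.65]   # hypothetical position-encoded "Let's"
x_go = [-0.30, 1.20]    # hypothetical position-encoded "go"

W_query = [[2.2, 1.1], [0.5, 3.0]]  # two rows, so two query values
W_key = [[1.9, -0.8], [0.7, 1.4]]   # two rows, so two key values

q_lets = project(x_lets, W_query)   # query values for "Let's"
k_lets = project(x_lets, W_key)     # key values for "Let's"
k_go = project(x_go, W_key)         # key values for "go"

assert len(q_lets) == len(k_lets) == len(k_go) == 2
```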

&lt;h3&gt;
  
  
  What’s Next?
&lt;/h3&gt;

&lt;p&gt;We will use these key values along with the query values to calculate how similar &lt;strong&gt;“Let’s”&lt;/strong&gt; is to &lt;strong&gt;“go”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will explore how this similarity is calculated in the next article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb8nzk8qawsqs2i2imd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb8nzk8qawsqs2i2imd3.png" alt=" " width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 4: Introduction to Self-Attention</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Thu, 09 Apr 2026 18:56:06 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-4-introduction-to-self-attention-45bg</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-4-introduction-to-self-attention-45bg</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-3-how-transformers-combine-meaning-and-position-4gan"&gt;previous article&lt;/a&gt;, we learned how word embeddings and positional encoding are combined to represent both meaning and position.&lt;/p&gt;

&lt;p&gt;Now let’s go back to our example where we translate the English sentence &lt;strong&gt;“Let’s go”&lt;/strong&gt;, and add positional values to the word embeddings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi4pw8f1mnd3vyqsg1zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi4pw8f1mnd3vyqsg1zk.png" alt=" " width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s get the positional encoding for both words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrowa79gjfc1p8o30rtg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrowa79gjfc1p8o30rtg.png" alt=" " width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Relationships Between Words
&lt;/h2&gt;

&lt;p&gt;Now let’s explore how a transformer keeps track of relationships between words.&lt;/p&gt;

&lt;p&gt;Consider the sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“The pizza came out of the oven and it tasted good.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The word &lt;strong&gt;“it”&lt;/strong&gt; could refer to &lt;em&gt;pizza&lt;/em&gt;, or it could potentially refer to &lt;em&gt;oven&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It is important that the transformer correctly associates &lt;strong&gt;“it”&lt;/strong&gt; with &lt;strong&gt;“pizza”&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-Attention
&lt;/h2&gt;

&lt;p&gt;Transformers use a mechanism called &lt;strong&gt;self-attention&lt;/strong&gt; to handle this.&lt;/p&gt;

&lt;p&gt;Self-attention helps the model determine how each word relates to every other word in the sentence, including itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gb5w28622441ggfnc4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gb5w28622441ggfnc4p.png" alt=" " width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once these relationships are calculated, they are used to determine how each word is represented.&lt;/p&gt;

&lt;p&gt;For example, if &lt;strong&gt;“it”&lt;/strong&gt; is more strongly associated with &lt;strong&gt;“pizza”&lt;/strong&gt;, then the similarity score for &lt;em&gt;pizza&lt;/em&gt; will have a larger impact on how &lt;strong&gt;“it”&lt;/strong&gt; is encoded by the transformer.&lt;/p&gt;
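
&lt;p&gt;Here is a toy numeric sketch of that weighting idea, with invented attention weights and 2-value word vectors (a real model computes the weights from queries and keys, as later parts of this series show):&lt;/p&gt;

```python
# Attention weights decide how much each word contributes to the
# encoding of "it". All numbers below are invented for illustration.

v_pizza = [0.9, 0.3]
v_oven = [0.2, 0.8]
v_it = [0.1, 0.1]

# Suppose self-attention gave "it" these weights over the three words:
w = {"pizza": 0.7, "oven": 0.1, "it": 0.2}

encoding_of_it = [
    w["pizza"] * v_pizza[d] + w["oven"] * v_oven[d] + w["it"] * v_it[d]
    for d in range(2)
]

# Because "pizza" carries the largest weight, the encoding of "it"
# ends up closest to the pizza vector.
print(encoding_of_it)
```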

&lt;p&gt;We have now covered the basic idea of self-attention. We will explore it in more detail in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 3: How Transformers Combine Meaning and Position</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Wed, 08 Apr 2026 21:18:57 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-3-how-transformers-combine-meaning-and-position-4gan</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-3-how-transformers-combine-meaning-and-position-4gan</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-2-positional-encoding-with-sine-and-cosine-6lh"&gt;previous article&lt;/a&gt;, we learned how positional encoding is generated using sine and cosine waves. Now we will apply those values to each word in the sentence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying Positional Encoding to All Words
&lt;/h2&gt;

&lt;p&gt;To get the positional values for the second word, we read the y-axis value of each curve at the x-axis position of the second word.&lt;/p&gt;

&lt;p&gt;We follow the same process for the third word.&lt;/p&gt;

&lt;h3&gt;
  
  
  Positional Values for Each Word
&lt;/h3&gt;

&lt;p&gt;By doing this for every word, we get a set of positional values for each one:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17m0wgx21ghe89p23i46.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17m0wgx21ghe89p23i46.png" alt=" " width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each word now has its own unique sequence of positional values.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining Embeddings with Positional Encoding
&lt;/h2&gt;

&lt;p&gt;The next step is to add these positional values to the word embeddings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnazjeln095ieo4gcsazw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnazjeln095ieo4gcsazw.png" alt=" " width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this addition, each word embedding now contains both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic meaning (from embeddings)&lt;/li&gt;
&lt;li&gt;positional information (from positional encoding)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So for the sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Jack eats burger"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;we now have embeddings that also capture word order.&lt;/p&gt;
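
&lt;p&gt;The addition itself is simple elementwise arithmetic. A minimal sketch with a made-up embedding for “Jack” (the positional values follow the [0, 1, 0, 1] pattern for the first position):&lt;/p&gt;

```python
# Elementwise addition of a word embedding and its positional values.
# The embedding numbers are invented for illustration.

embedding_jack = [0.5, -0.2, 0.8, 0.1]  # hypothetical embedding for "Jack"
position_1 = [0.0, 1.0, 0.0, 1.0]       # positional values for position 1

encoded_jack = [e + p for e, p in zip(embedding_jack, position_1)]
print(encoded_jack)  # [0.5, 0.8, 0.8, 1.1]
```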




&lt;h2&gt;
  
  
  What Happens When We Change Word Order?
&lt;/h2&gt;

&lt;p&gt;Let us reverse the sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Burger eats Jack"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The embeddings for the first and third words get swapped.&lt;/li&gt;
&lt;li&gt;However, the positional values for positions 1, 2, and 3 remain the same.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When we add the positional values to the embeddings again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The final vectors for the first and third words become different from before.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how positional encoding helps transformers understand word order.&lt;/p&gt;

&lt;p&gt;Even if the same words are used, changing their positions results in different final representations.&lt;/p&gt;
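
&lt;p&gt;A small sketch of this effect, using invented embeddings and positional values (the positional values per slot are illustrative, not the exact sine/cosine outputs):&lt;/p&gt;

```python
# Same words, different order: the positional values stay fixed per
# slot, so swapping "Jack" and "burger" changes the final vectors.

emb = {
    "Jack": [0.5, -0.2, 0.8, 0.1],
    "eats": [0.3, 0.9, -0.4, 0.6],
    "burger": [-0.7, 0.4, 0.2, 0.5],
}
positions = [
    [0.00, 1.00, 0.00, 1.00],   # slot 1
    [0.84, 0.54, 0.01, 1.00],   # slot 2
    [0.91, -0.42, 0.02, 1.00],  # slot 3
]

def encode(sentence):
    return [[e + p for e, p in zip(emb[w], pos)]
            for w, pos in zip(sentence, positions)]

original = encode(["Jack", "eats", "burger"])
flipped = encode(["burger", "eats", "Jack"])

assert original[0] != flipped[0]  # slot 1 now holds a different word
assert original[1] == flipped[1]  # "eats" stays put, so it is unchanged
```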

&lt;p&gt;We will explore further in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 2: Positional Encoding with Sine and Cosine</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:51:19 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-2-positional-encoding-with-sine-and-cosine-6lh</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-2-positional-encoding-with-sine-and-cosine-6lh</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/rijultp/understanding-transformers-part-1-how-transformers-understand-word-order-eea"&gt;previous article&lt;/a&gt;, we converted words into embeddings. Now let’s see how transformers add position to those numbers.&lt;/p&gt;

&lt;p&gt;The numbers that represent word order in a transformer come from a family of sine and cosine waves, each with a different wavelength.&lt;/p&gt;

&lt;p&gt;Each curve is responsible for generating position values for a specific dimension of the word embedding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the Idea
&lt;/h3&gt;

&lt;p&gt;Think of each embedding dimension as getting its value from a different wave.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;green curve&lt;/strong&gt; provides the positional values for the &lt;strong&gt;first embedding dimension&lt;/strong&gt; of every word.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8o3wkqdfy13o33gjq4vx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8o3wkqdfy13o33gjq4vx.png" alt=" " width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the first word in the sentence, which lies at the far left of the graph (position 0 on the x-axis):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The value taken from the green curve is &lt;strong&gt;0&lt;/strong&gt; (the y-axis value at that position).&lt;/li&gt;
&lt;/ul&gt;




&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;orange curve&lt;/strong&gt; provides the positional values for the &lt;strong&gt;second embedding dimension&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjosmzrg8r6chqrofdigi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjosmzrg8r6chqrofdigi.png" alt=" " width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the same position (first word):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The value from the orange curve is &lt;strong&gt;1&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;blue curve&lt;/strong&gt; provides the positional values for the &lt;strong&gt;third embedding dimension&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05bh3i081zrglxikepfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05bh3i081zrglxikepfm.png" alt=" " width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the first word:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The value is &lt;strong&gt;0&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;red curve&lt;/strong&gt; provides the positional values for the &lt;strong&gt;fourth embedding dimension&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9b6z716lh4qz8rku5zz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9b6z716lh4qz8rku5zz.png" alt=" " width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the first word:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The value is &lt;strong&gt;1&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Positional Encoding for the First Word
&lt;/h3&gt;

&lt;p&gt;By combining the values from all four curves, we get the positional encoding vector for the first word:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej7d4p9saeysez0f3ols.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej7d4p9saeysez0f3ols.png" alt=" " width="551" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will apply the same process to the remaining words in the next article.&lt;/p&gt;
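
&lt;p&gt;For reference, the curves described above follow the standard sine/cosine formula from the original transformer paper. Here is a minimal sketch (the four embedding dimensions map to the four curves; the exact wavelengths in the figures are illustrative):&lt;/p&gt;

```python
import math

# Standard sinusoidal positional encoding: even dimensions come from
# sine waves, odd dimensions from cosine waves, with each pair of
# dimensions using a longer wavelength than the last.

def positional_encoding(pos, d_model=4):
    values = []
    for i in range(d_model):
        wavelength_factor = 10000 ** (2 * (i // 2) / d_model)
        angle = pos / wavelength_factor
        values.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return values

# Position 0 (the first word) gives [0, 1, 0, 1]: every sine curve
# starts at 0 and every cosine curve starts at 1.
print(positional_encoding(0))  # [0.0, 1.0, 0.0, 1.0]
```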





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Understanding Transformers Part 1: How Transformers Understand Word Order</title>
      <dc:creator>Rijul Rajesh</dc:creator>
      <pubDate>Sun, 05 Apr 2026 18:42:00 +0000</pubDate>
      <link>https://forem.com/rijultp/understanding-transformers-part-1-how-transformers-understand-word-order-eea</link>
      <guid>https://forem.com/rijultp/understanding-transformers-part-1-how-transformers-understand-word-order-eea</guid>
      <description>&lt;p&gt;In this article, we will explore transformers.&lt;/p&gt;

&lt;p&gt;We will work on the same problem as before: translating a simple English sentence into Spanish using a transformer-based neural network.&lt;/p&gt;

&lt;p&gt;Since a transformer is a type of neural network, and neural networks operate on numerical data, the first step is to convert words into numbers. Neural networks cannot directly process text, so we need a way to represent words in a numerical form.&lt;/p&gt;

&lt;p&gt;There are several ways to convert words into numbers, but the most commonly used method in modern neural networks is &lt;strong&gt;word embedding&lt;/strong&gt;. Word embeddings allow us to represent each word as a vector of numbers, capturing meaning and relationships between words.&lt;/p&gt;

&lt;p&gt;Before going deeper into the transformer architecture, let us first understand &lt;strong&gt;positional encoding&lt;/strong&gt;. This is a technique used by transformers to keep track of the order of words in a sentence.&lt;/p&gt;

&lt;p&gt;Unlike recurrent models, which read words one at a time, transformers process all the words in a sentence in parallel. Because of this, they need an additional way to understand word order, and positional encoding solves this problem.&lt;/p&gt;

&lt;p&gt;Let us take a simple example sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Jack eats burger"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first step is to convert each word in this sentence into its corresponding word embedding.&lt;/p&gt;

&lt;p&gt;For simplicity, we will represent each word using a vector of 4 numerical values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtchyahynw1t8iwopxhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtchyahynw1t8iwopxhd.png" alt=" " width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In real-world applications, embeddings are much larger, often containing hundreds or even thousands of values per word, which helps the model capture more detailed meaning.&lt;/p&gt;
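
&lt;p&gt;Conceptually, an embedding layer is a learned lookup table from words to vectors. A minimal sketch with invented 4-value vectors:&lt;/p&gt;

```python
# A word embedding layer maps each word to a vector of numbers.
# The vectors below are invented for illustration; real models learn
# them during training, and they are far longer in practice.

embeddings = {
    "Jack": [0.5, -0.2, 0.8, 0.1],
    "eats": [0.3, 0.9, -0.4, 0.6],
    "burger": [-0.7, 0.4, 0.2, 0.5],
}

sentence = ["Jack", "eats", "burger"]
vectors = [embeddings[word] for word in sentence]

assert all(len(v) == 4 for v in vectors)  # one 4-value vector per word
```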

&lt;p&gt;We will continue building on this example in the next article.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
