<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ajith Kumar</title>
    <description>The latest articles on Forem by Ajith Kumar (@ajith_kumar_593bb762c09ce).</description>
    <link>https://forem.com/ajith_kumar_593bb762c09ce</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3597028%2F46a7961a-3692-48ac-ab35-55f10635be5a.png</url>
      <title>Forem: Ajith Kumar</title>
      <link>https://forem.com/ajith_kumar_593bb762c09ce</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ajith_kumar_593bb762c09ce"/>
    <language>en</language>
    <item>
      <title>PEFT (LoRA) – Fine-Tuning LLMs Without Big GPUs</title>
      <dc:creator>Ajith Kumar</dc:creator>
      <pubDate>Wed, 05 Nov 2025 10:51:26 +0000</pubDate>
      <link>https://forem.com/ajith_kumar_593bb762c09ce/peft-lora-fine-tuning-llms-without-big-gpus-368h</link>
      <guid>https://forem.com/ajith_kumar_593bb762c09ce/peft-lora-fine-tuning-llms-without-big-gpus-368h</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) can have billions of parameters. &lt;br&gt;
Fine-tuning them usually requires high-end GPUs and large memory. &lt;br&gt;
&lt;strong&gt;Parameter-Efficient Fine-Tuning (PEFT)&lt;/strong&gt; offers a solution to adapt such models using fewer resources.&lt;/p&gt;

&lt;h3&gt;What is LoRA?&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LoRA (Low-Rank Adaptation)&lt;/strong&gt; is a PEFT technique in which, instead of updating the full model, &lt;br&gt;
we train only small, low-rank matrices inserted into the model's layers; the original weights stay frozen.&lt;/p&gt;
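
&lt;p&gt;The idea can be sketched in a few lines of NumPy (a toy illustration, not a real library API; the dimensions, rank &lt;code&gt;r&lt;/code&gt;, and scaling factor &lt;code&gt;alpha&lt;/code&gt; are made-up assumptions):&lt;/p&gt;

```python
import numpy as np

# Frozen pretrained weight (d_in x d_out) -- never updated during fine-tuning
d_in, d_out, r = 64, 64, 4
W = np.random.randn(d_in, d_out)

# Trainable low-rank factors: only these small matrices are updated
A = np.random.randn(d_in, r) * 0.01  # down-projection (d_in x r)
B = np.zeros((r, d_out))             # up-projection, initialized to zero

alpha = 8  # scaling factor applied to the LoRA update

def lora_forward(x):
    # Output = frozen path + scaled low-rank update path
    return x @ W + (alpha / r) * (x @ A @ B)

x = np.random.randn(1, d_in)
y = lora_forward(x)
print(y.shape)  # (1, 64)
```

&lt;p&gt;Because &lt;code&gt;B&lt;/code&gt; starts at zero, the model initially behaves exactly like the frozen base model, and training only nudges the small update path.&lt;/p&gt;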

&lt;h3&gt;Why Does This Work?&lt;/h3&gt;

&lt;p&gt;Most weight matrices in large models are highly redundant, so the updates needed for a new task tend to have low effective rank. &lt;br&gt;
LoRA exploits this by approximating the weight update as a product of two much smaller matrices, drastically reducing the number of trainable parameters.&lt;/p&gt;
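
&lt;p&gt;A quick NumPy experiment shows the intuition (matrix sizes and rank here are arbitrary): a matrix built from a few factors is recovered almost exactly by a truncated SVD of low rank.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 512x512 "update" matrix that secretly has rank 8
delta_W = rng.standard_normal((512, 8)) @ rng.standard_normal((8, 512))

# Truncated SVD: keep only the top r singular values
U, S, Vt = np.linalg.svd(delta_W)
r = 8
approx = (U[:, :r] * S[:r]) @ Vt[:r, :]

# The rank-8 reconstruction matches the full 512x512 matrix almost exactly
err = np.linalg.norm(delta_W - approx) / np.linalg.norm(delta_W)
print(f"relative error: {err:.2e}")
```

&lt;p&gt;If the true update really is (close to) low rank, storing two small factors loses almost nothing compared to storing the full matrix.&lt;/p&gt;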

&lt;h3&gt;Key Benefits&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires much less GPU memory&lt;/li&gt;
&lt;li&gt;Faster training&lt;/li&gt;
&lt;li&gt;Can store multiple task adapters without duplicating full models&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Example Comparison&lt;/h3&gt;

&lt;p&gt;If a model has 10 billion parameters, traditional fine-tuning updates all 10B of them. &lt;br&gt;
LoRA might train only around 10–50 million parameters, making it extremely resource-efficient.&lt;/p&gt;
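
&lt;p&gt;Back-of-the-envelope arithmetic shows where the savings come from (the layer count, hidden size, and rank below are made-up round numbers, not a real model's configuration):&lt;/p&gt;

```python
# Hypothetical model: 100 layers, hidden size 5000,
# LoRA applied to two 5000x5000 projection matrices per layer with rank r=8
layers, d, r, targets = 100, 5000, 8, 2

full_params = layers * targets * d * d            # full fine-tuning of those matrices
lora_params = layers * targets * (d * r + r * d)  # A (d x r) plus B (r x d) per matrix

print(f"full:  {full_params:,}")   # 5,000,000,000
print(f"lora:  {lora_params:,}")   # 16,000,000
print(f"ratio: {full_params // lora_params}x fewer trainable parameters")
```

&lt;p&gt;With rank 8, the trainable-parameter count drops by roughly 300x in this toy setup, which is why LoRA fits on consumer GPUs.&lt;/p&gt;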

&lt;h3&gt;Where LoRA is Used&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Chatbot customization&lt;/li&gt;
&lt;li&gt;Domain-specific summarization&lt;/li&gt;
&lt;li&gt;Speech and vision-language models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh82fghbwavplygtvwljy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh82fghbwavplygtvwljy.jpg" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How Self-Attention Actually Works (Simple Explanation)</title>
      <dc:creator>Ajith Kumar</dc:creator>
      <pubDate>Wed, 05 Nov 2025 10:48:50 +0000</pubDate>
      <link>https://forem.com/ajith_kumar_593bb762c09ce/how-self-attention-actually-works-simple-explanation-4086</link>
      <guid>https://forem.com/ajith_kumar_593bb762c09ce/how-self-attention-actually-works-simple-explanation-4086</guid>
      <description>&lt;p&gt;Self-attention is one of the core ideas behind modern Transformer models such as BERT, GPT, and T5. &lt;br&gt;
It allows a model to understand relationships between words in a sequence, regardless of where they appear.&lt;/p&gt;

&lt;h3&gt;Why Self-Attention?&lt;/h3&gt;

&lt;p&gt;Earlier models like RNNs and LSTMs processed words in order, making it difficult to learn long-range dependencies. &lt;br&gt;
Self-attention solves this by allowing every word to look at every other word in the sentence at the same time.&lt;/p&gt;

&lt;h3&gt;Key Idea&lt;/h3&gt;

&lt;p&gt;Each word in a sentence is transformed into three vectors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query (Q)&lt;/strong&gt; – What the word is looking for&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key (K)&lt;/strong&gt; – What information the word exposes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value (V)&lt;/strong&gt; – The actual information carried by the word&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model computes similarity scores between words using dot products of queries and keys, scaled by the square root of the key dimension to keep the scores numerically stable. &lt;br&gt;
These scores are then normalized with softmax to determine how much attention one word should pay to another.&lt;/p&gt;
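
&lt;p&gt;Put together, scaled dot-product attention fits in a few lines (a minimal NumPy sketch with made-up dimensions; real models add masking, batching, and learned projections trained by backpropagation):&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv     # project each word into Q, K, V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every word with every other
    weights = softmax(scores, axis=-1)   # each row sums to 1: attention distribution
    return weights @ V                   # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8         # e.g. 5 words, 16-dim embeddings
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

&lt;p&gt;Each output row is a blend of all value vectors, weighted by how relevant every other word is to that position.&lt;/p&gt;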

&lt;h3&gt;Example&lt;/h3&gt;

&lt;p&gt;In the sentence &lt;em&gt;"The cat chased the mouse"&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When focusing on the word “chased,” it may attend more to “cat” (the subject) and “mouse” (the object)&lt;/li&gt;
&lt;li&gt;Attention weights tell the model which words are relevant for understanding a given word&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Multi-Head Attention&lt;/h3&gt;

&lt;p&gt;Instead of one set of Q, K, and V, the model uses multiple heads. &lt;br&gt;
Each head focuses on different relationships (syntax, meaning, etc.).&lt;/p&gt;
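
&lt;p&gt;Running several heads and concatenating their outputs can be sketched as follows (toy NumPy version with arbitrary dimensions; real implementations batch all heads into one tensor and add a final output projection):&lt;/p&gt;

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads):
    # Each head has its own (Wq, Wk, Wv); head outputs are concatenated
    outputs = []
    for Wq, Wk, Wv in heads:
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        outputs.append(weights @ V)
    return np.concatenate(outputs, axis=-1)  # (seq_len, n_heads * d_head)

rng = np.random.default_rng(1)
seq_len, d_model, n_heads, d_head = 5, 16, 4, 4
X = rng.standard_normal((seq_len, d_model))
heads = [tuple(rng.standard_normal((d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]

out = multi_head_attention(X, heads)
print(out.shape)  # (5, 16) -- 4 heads of 4 dims each
```

&lt;p&gt;Because each head has independent projections, one head can specialize in, say, subject–verb links while another tracks nearby-word context.&lt;/p&gt;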

&lt;h3&gt;Benefits of Self-Attention&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Learns long-range relationships easily&lt;/li&gt;
&lt;li&gt;Can process words in parallel (faster than RNNs)&lt;/li&gt;
&lt;li&gt;Works well for multilingual and domain-specific language tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaojhr304z0e0f9kmtau.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaojhr304z0e0f9kmtau.jpg" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
