<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vivek Patel</title>
    <description>The latest articles on Forem by Vivek Patel (@vivek_patel_022db0e176cf2).</description>
    <link>https://forem.com/vivek_patel_022db0e176cf2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3681378%2Fb6a198fe-2b8e-4178-b8d6-75d1516f207a.png</url>
      <title>Forem: Vivek Patel</title>
      <link>https://forem.com/vivek_patel_022db0e176cf2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vivek_patel_022db0e176cf2"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Vivek Patel</dc:creator>
      <pubDate>Mon, 12 Jan 2026 03:55:11 +0000</pubDate>
      <link>https://forem.com/vivek_patel_022db0e176cf2/-2l3g</link>
      <guid>https://forem.com/vivek_patel_022db0e176cf2/-2l3g</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/vivek_patel_022db0e176cf2" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3681378%2Fb6a198fe-2b8e-4178-b8d6-75d1516f207a.png" alt="vivek_patel_022db0e176cf2"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/vivek_patel_022db0e176cf2/under-the-hood-vaidhllama-architecture-training-pipeline-1ho1" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Under the Hood: VaidhLlama Architecture &amp;amp; Training Pipeline&lt;/h2&gt;
      &lt;h3&gt;Vivek Patel ・ Jan 12&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#machinelearning&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#data&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>data</category>
    </item>
    <item>
      <title>Under the Hood: VaidhLlama Architecture &amp; Training Pipeline</title>
      <dc:creator>Vivek Patel</dc:creator>
      <pubDate>Mon, 12 Jan 2026 03:52:07 +0000</pubDate>
      <link>https://forem.com/vivek_patel_022db0e176cf2/under-the-hood-vaidhllama-architecture-training-pipeline-1ho1</link>
      <guid>https://forem.com/vivek_patel_022db0e176cf2/under-the-hood-vaidhllama-architecture-training-pipeline-1ho1</guid>
      <description>&lt;p&gt;Standard LLMs struggle with the Sanskrit-heavy logic of Ayurveda. They often reduce 'doshas' to simple biological humors, missing their deeper role as systemic bio-energetic forces. We built VaidhLlama to fix this.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This technical deep dive explores how we achieved &lt;strong&gt;41.91% accuracy&lt;/strong&gt; on the BhashaBench-Ayur benchmark using a 3B parameter model, successfully &lt;strong&gt;outperforming comparable 2B/3B baselines&lt;/strong&gt; by focusing on "Data Density" over raw compute.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. The Results: Punching Above Its Weight Class
&lt;/h2&gt;

&lt;p&gt;Before discussing &lt;em&gt;how&lt;/em&gt; we built it, let's look at &lt;em&gt;what&lt;/em&gt; it achieved. Evaluation uses &lt;strong&gt;BhashaBench-Ayur&lt;/strong&gt;, a rigorous benchmark aimed at preserving Indian Knowledge Systems (IKS), containing expert-level questions derived from BAMS/MD curricula.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Performance
&lt;/h3&gt;

&lt;p&gt;While larger models like Gemma-2-27B still hold an advantage due to sheer scale, VaidhLlama successfully outperforms its direct base (Llama-3.2-3B) and remains competitive with other state-of-the-art small language models (SLMs).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Results were benchmarked using the EM metric. Full code and data are available at: &lt;a href="https://github.com/viveks-codes/BhashaBench-Ayur-Results" rel="noopener noreferrer"&gt;https://github.com/viveks-codes/BhashaBench-Ayur-Results&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbslfyc9c035wf1qhehk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbslfyc9c035wf1qhehk.png" alt="Reported Accuracy Comparison(EM)" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Run 1 (%)&lt;/th&gt;
&lt;th&gt;Run 2 (%)&lt;/th&gt;
&lt;th&gt;Mean Accuracy&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VaidhLlama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41.91%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41.91%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41.91%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fine-tuned (Ours)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama-3.2-Instruct&lt;/td&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;40.74%&lt;/td&gt;
&lt;td&gt;40.74%&lt;/td&gt;
&lt;td&gt;40.74%&lt;/td&gt;
&lt;td&gt;Base Model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma-2-Instruct&lt;/td&gt;
&lt;td&gt;2B&lt;/td&gt;
&lt;td&gt;41.00%&lt;/td&gt;
&lt;td&gt;41.00%&lt;/td&gt;
&lt;td&gt;41.00%&lt;/td&gt;
&lt;td&gt;Comparable Size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5&lt;/td&gt;
&lt;td&gt;3B&lt;/td&gt;
&lt;td&gt;46.76%&lt;/td&gt;
&lt;td&gt;46.76%&lt;/td&gt;
&lt;td&gt;46.76%&lt;/td&gt;
&lt;td&gt;Strong Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma-2-Instruct&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;27B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52.17%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52.17%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52.17%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large Model Reference&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on Data Coverage &amp;amp; Future Potential:&lt;/strong&gt; Quoted accuracy is consistent across multiple runs, confirming deterministic evaluation. It is important to note that the 41.91% accuracy was achieved using only a partial subset (~10%) of our total curated logical frameworks. This suggests that the "ceiling" for this architecture is significantly higher once the full-scale dataset is processed and ingested.&lt;/p&gt;
&lt;/blockquote&gt;
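&lt;p&gt;For readers unfamiliar with the EM metric cited above, here is a minimal sketch of how exact-match accuracy is typically computed for multiple-choice answers. The normalization choices (case and whitespace) are common conventions and my own assumption, not necessarily the benchmark's exact scoring harness.&lt;/p&gt;

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer.

    Case and surrounding whitespace are normalized before comparison,
    a common convention for multiple-choice EM scoring (an assumption
    here, not necessarily the benchmark's exact harness).
    """
    if len(predictions) != len(references):
        raise ValueError("prediction/reference length mismatch")
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

# Toy option letters, illustrative only.
print(exact_match_accuracy(["B", "c", "A", "D"], ["B", "C", "B", "D"]))  # 0.75
```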

&lt;h3&gt;
  
  
  The Specialist's Trade-off (Gain vs. Regression)
&lt;/h3&gt;

&lt;p&gt;Intellectual honesty is key to scientific progress. Our analysis reveals that specialization comes at a cost, often described as the "Specialist's Curse." We observed a clear "Domain Shift" where the model sacrificed general breadth for clinical depth.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fh4uoocpp0pj6ml06hn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fh4uoocpp0pj6ml06hn.png" alt="Heatmap" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data tells a compelling story of re-alignment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Massive Clinical Gains:&lt;/strong&gt; The model demonstrated a +100% improvement in purely clinical domains like Ayurvedic Diagnosis (Nidana) and Vajikarana (Sexology), alongside a strong +26.7% boost in Research Methodology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Terminology Shift" (The Toxicology Paradox):&lt;/strong&gt; Perhaps the most critical finding is in Toxicology. While the model's accuracy on generic "Toxicology" dropped by 33%, its performance on specialized "Ayurvedic Toxicology (Agada Tantra)" actually &lt;em&gt;improved&lt;/em&gt; by 25%.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This confirms our hypothesis: VaidhLlama isn't just "forgetting"; it is specializing. It now prefers the specific logic of Agada Tantra over generic textbook definitions. We accept this trade-off: we are building a Vaidh (Specialist Doctor), not a generalist librarian.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Under the Hood: Architecture
&lt;/h2&gt;

&lt;p&gt;VaidhLlama inherits the core transformer architecture from &lt;strong&gt;Llama-3.2-3B-Instruct&lt;/strong&gt;, with specialized adaptations for traditional medicine applications. The model employs a dense decoder-only configuration optimized for edge-compatible inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft7hl1lltyo8e37c4bzow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft7hl1lltyo8e37c4bzow.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Specifications:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base Model:&lt;/strong&gt; Llama-3.2-3B (3.21B Parameters)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture:&lt;/strong&gt; Optimized Transformer (Decoder-only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context length:&lt;/strong&gt; 128k supported (optimized for 4k clinical context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision:&lt;/strong&gt; &lt;code&gt;bfloat16&lt;/code&gt; for training, compatible with Int4 quantization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vocabulary:&lt;/strong&gt; Standard tokenizer aligned for Sanskrit/Ayurvedic terms&lt;/li&gt;
&lt;/ul&gt;
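&lt;p&gt;As a rough sketch of how these precision settings translate into code, the checkpoint can be loaded in &lt;code&gt;bfloat16&lt;/code&gt; or with Int4 quantization via the Transformers library. The repo id is taken from the Resources section below; the 4-bit path additionally assumes &lt;code&gt;bitsandbytes&lt;/code&gt; is installed.&lt;/p&gt;

```python
# Sketch only: loading the released checkpoint with the precision settings
# listed above. Assumes the Hugging Face repo id from the Resources section.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Vivekdas/VaidhLLaMA-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# bfloat16, matching the training precision.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Or: Int4 quantization for edge-style inference (requires bitsandbytes).
quant = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=quant, device_map="auto"
)
```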




&lt;h2&gt;
  
  
  3. Data Preparation: High-Density Curation
&lt;/h2&gt;

&lt;p&gt;Standard datasets rely on volume. VaidhLlama’s dataset relies on &lt;strong&gt;density&lt;/strong&gt;. We processed raw Ayurvedic texts through a rigorous "Inverse Law" curation pipeline, reducing generic noise to amplify clinical signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ethical Data Sourcing
&lt;/h3&gt;

&lt;p&gt;Our data pipeline is built on a foundation of legally compliant sources. We curated a proprietary corpus from explicit-permission websites and digitized manuscripts obtained through formal partnerships with Ayurvedic universities. This ensures that VaidhLlama is trained on legitimate, high-quality academic knowledge rather than unverified web scrapes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipeline Architecture:
&lt;/h3&gt;

&lt;p&gt;We employed a multi-stage filtering process using &lt;strong&gt;NVIDIA NeMo Curator&lt;/strong&gt; followed by synthetic scale-up via &lt;strong&gt;vLLM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq08oqiply9ekfagk929.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq08oqiply9ekfagk929.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Logical Density Curation (NVIDIA NeMo)
&lt;/h3&gt;

&lt;p&gt;Most medical datasets prioritize keyword frequency. We prioritized &lt;em&gt;reasoning density&lt;/em&gt;. Using NVIDIA NeMo Curator, we built a custom &lt;code&gt;AyurvedaQualityFilter&lt;/code&gt; that defined a rigorous taxonomy of 13+ clinical categories, ensuring that only texts containing deep reasoning (Nyayas) and expert physiology (Sharir Kriya) passed the gate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AyurvedaQualityFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DocumentFilter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    ULTIMATE FILTER:
    Includes: Core Topics + Ashtanga + Expert Physiology (Sharir Kriya)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;min_score&lt;/span&gt; 

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="c1"&gt;# --- 1. THE TRIDOSHAS (Bio-energies) ---
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pitta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kapha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tridosha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prakriti&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

            &lt;span class="c1"&gt;# --- 13. EXPERT PHYSIOLOGY &amp;amp; SHARIR KRIYA (The Reasoning Core) ---
&lt;/span&gt;            &lt;span class="c1"&gt;# A. Philosophy &amp;amp; Logic
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;purvapaksha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uttarapaksha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sharir&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kriya&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;padartha&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

            &lt;span class="c1"&gt;# B. The NYAYAS (Laws of Nourishment - CRITICAL)
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nyaya&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kshir-dadhi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kedari-kulya&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;khale-kapota&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ... (Scoring logic prioritizing co-occurrence of these terms) ...
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
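&lt;p&gt;The &lt;code&gt;score_document&lt;/code&gt; body is elided above. As a rough stand-in (my own simplification, not the actual NeMo Curator filter logic), a distinct-keyword-hit count against the taxonomy loosely approximates the co-occurrence scoring described, and can gate documents at &lt;code&gt;min_score&lt;/code&gt;:&lt;/p&gt;

```python
def keyword_density_score(text, keywords):
    """Count distinct domain keywords present in the text.

    A stand-in for the elided score_document logic: the real filter
    rewards co-occurrence of reasoning terms, which a distinct-hit
    count loosely approximates.
    """
    lowered = text.lower()
    return float(sum(1 for kw in keywords if kw in lowered))

# Tiny excerpt of the taxonomy shown above, for illustration.
KEYWORDS = {"vata", "pitta", "kapha", "nyaya", "sharir", "kriya"}

doc = "The kshir-dadhi nyaya explains how vata and pitta transform dhatus."
score = keyword_density_score(doc, KEYWORDS)
print(score, score >= 3)  # a min_score of 3 would admit this document
```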



&lt;h3&gt;
  
  
  Synthetic Distillation (vLLM)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Teacher:&lt;/strong&gt; Llama-3.3-70B-Instruct&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture:&lt;/strong&gt; Threaded Producer-Consumer loop using &lt;code&gt;vLLM&lt;/code&gt; on 8x GPUs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; &lt;strong&gt;130,954&lt;/strong&gt; high-quality, complex Q&amp;amp;A pairs (verified count).&lt;/li&gt;
&lt;/ul&gt;
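&lt;p&gt;The threaded producer-consumer loop above can be sketched as follows. Here &lt;code&gt;generate_fn&lt;/code&gt; stands in for the actual teacher-model call (vLLM batched generation in our pipeline); the skeleton shows only the queueing pattern, not the real prompts or GPU dispatch.&lt;/p&gt;

```python
import queue
import threading

def run_distillation(prompts, generate_fn, num_consumers=4):
    """Threaded producer-consumer loop for synthetic QA generation.

    generate_fn stands in for the teacher-model call (vLLM in practice);
    here it is any callable mapping a prompt string to a completion string.
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def consumer():
        while True:
            prompt = work.get()
            if prompt is None:  # poison pill: shut this worker down
                work.task_done()
                return
            completion = generate_fn(prompt)
            with lock:
                results.append((prompt, completion))
            work.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for w in workers:
        w.start()
    for p in prompts:   # producer: enqueue all prompts
        work.put(p)
    for _ in workers:   # one poison pill per worker
        work.put(None)
    work.join()
    for w in workers:
        w.join()
    return results

# Toy stand-in for the teacher model.
pairs = run_distillation(["What is vata?"], lambda p: "ANSWER: " + p)
print(pairs[0][1])  # ANSWER: What is vata?
```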




&lt;h2&gt;
  
  
  4. Training Methodology: The "Unmasked" Strategy
&lt;/h2&gt;

&lt;p&gt;VaidhLlama employs a specific variation of Supervised Fine-Tuning (SFT) often referred to as &lt;strong&gt;Continued Pre-training&lt;/strong&gt; on the instruction set.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unmasked Instruction Tuning (Full-Sequence Loss)
&lt;/h3&gt;

&lt;p&gt;Standard instruction tuning masks the user prompt, calculating loss only on the model's response. For a niche domain like Ayurveda, this is suboptimal. The model must learn the complex syntax of the &lt;em&gt;question&lt;/em&gt; itself (often containing Sanskrit slokas).&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;disabled prompt masking&lt;/strong&gt;, forcing the model to learn the joint probability of the entire sequence. This effectively acts as domain-adaptive pre-training mixed with instruction following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scripts/finetune.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;formatting_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ... (Prompt construction) ...
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tokenized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# UNMASKED TRAINING / CONTINUED PRE-TRAINING:
&lt;/span&gt;    &lt;span class="c1"&gt;# Unlike standard chat-tuning, we do not mask the prompt. 
&lt;/span&gt;    &lt;span class="c1"&gt;# We calculate loss on the full sequence so the model learns 
&lt;/span&gt;    &lt;span class="c1"&gt;# the syntax of the Ayurvedic questions (Sanskrit slokas) alongside the answers.
&lt;/span&gt;    &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tokenized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftk99m7afyyko1848donx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftk99m7afyyko1848donx.png" alt=" " width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our WandB logs indicate that the model converged rapidly, hitting a performance plateau at approximately Step 2,356. This rapid convergence on a limited data subset confirms that our unmasked instruction tuning is highly effective. In future runs, we may optimize for this by implementing early stopping around the 2,400-step mark, allowing us to redirect compute toward processing the remaining 90% of our dataset.&lt;/p&gt;
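&lt;p&gt;One simple way to operationalize that early-stopping idea is a plateau check on the logged training loss. The sketch below uses my own illustrative &lt;code&gt;window&lt;/code&gt; and &lt;code&gt;tol&lt;/code&gt; thresholds; when an eval metric is logged, Hugging Face's &lt;code&gt;EarlyStoppingCallback&lt;/code&gt; is the standard route instead.&lt;/p&gt;

```python
def plateau_step(losses, window=3, tol=1e-3):
    """Return the first index where loss improvement over the trailing
    window falls below tol, or None if no plateau is reached.

    window and tol are illustrative assumptions; tune them to the
    logging cadence of the actual run.
    """
    for i in range(window, len(losses)):
        improvement = losses[i - window] - losses[i]
        if tol > improvement:
            return i
    return None

# Toy loss curve: steady drop, then flat.
history = [1.00, 0.80, 0.60, 0.50, 0.50, 0.50, 0.50]
print(plateau_step(history))  # 6: stop here and redirect compute
```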




&lt;h2&gt;
  
  
  5. Future Roadmap: Scaling
&lt;/h2&gt;

&lt;p&gt;The success of VaidhLlama-3B at IIM Indore is just the beginning: a proof of concept for what is possible with disciplined data engineering. However, to truly rival 70B+ models, we must scale our infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Next Phase: Integration &amp;amp; Scale
&lt;/h3&gt;

&lt;p&gt;To move from "Prototype" to "Production," the roadmap requires closer integration with larger compute clusters and the broader BharatGen engineering ecosystem.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Deeper Integration with BharatGen Core:&lt;/strong&gt; Transitioning our pipeline from isolated experimental setups to the central BharatGen infrastructure will allow us to train and fine-tune larger models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Institutional Synergy:&lt;/strong&gt; The OCR pipeline at IIM Indore has created the data; the next step is &lt;strong&gt;engineering the large-scale pre-training&lt;/strong&gt;, a task best suited for the robust compute environments available at our partner nodes (e.g., IIT Bombay).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Our Vision:&lt;/strong&gt; Effectively bridging the unique data insights from subject matter experts with the high-performance engineering culture of our core technical teams is critical. We are fully prepared to facilitate this bridge, bringing the domain expertise developed here to the central engineering hub for the next phase of deployment.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Weights:&lt;/strong&gt; &lt;a href="https://huggingface.co/Vivekdas/VaidhLLaMA-3.2-3B-Instruct" rel="noopener noreferrer"&gt;Hugging Face (VaidhLlama-3.2-3B-Instruct)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training Report:&lt;/strong&gt; &lt;a href="https://api.wandb.ai/links/vivekpp-iim/i5dfus3u" rel="noopener noreferrer"&gt;WandB Logs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark Data:&lt;/strong&gt; &lt;a href="https://github.com/viveks-codes/BhashaBench-Ayur-Results" rel="noopener noreferrer"&gt;GitHub Results Repo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dataset Size:&lt;/strong&gt; 130k+ Curated QA Pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Tech:&lt;/strong&gt; NVIDIA NeMo Curator, vLLM, Unmasked SFT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to my intern team, Riddhima, Viren, Pranav, Hiyaa, Adarsh, Dev, and Niyati, for their diligent work on the data scraping pipeline that fueled this project ❤️&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>finetuning</category>
    </item>
  </channel>
</rss>
