<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rushi Chaudhari</title>
    <description>The latest articles on Forem by Rushi Chaudhari (@rushichaudhari).</description>
    <link>https://forem.com/rushichaudhari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F341175%2Fc3de0ed2-7770-40af-ae3f-10bcad87e989.png</url>
      <title>Forem: Rushi Chaudhari</title>
      <link>https://forem.com/rushichaudhari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rushichaudhari"/>
    <language>en</language>
    <item>
      <title>Antennas: The Physics Rabbit Hole Hidden Inside a Piece of Wire</title>
      <dc:creator>Rushi Chaudhari</dc:creator>
      <pubDate>Sat, 09 May 2026 20:18:20 +0000</pubDate>
      <link>https://forem.com/rushichaudhari/antennas-the-physics-rabbit-hole-hidden-inside-a-piece-of-wire-3ble</link>
      <guid>https://forem.com/rushichaudhari/antennas-the-physics-rabbit-hole-hidden-inside-a-piece-of-wire-3ble</guid>
      <description>&lt;p&gt;Recently I met some experts in the radio frequency space and accidentally fell into one of the deepest engineering rabbit holes I’ve hit in years.&lt;/p&gt;

&lt;p&gt;These guys were sitting there with SDRs — software-defined radios — casually dragging sliders around while entire invisible worlds appeared on screen.&lt;/p&gt;

&lt;p&gt;Airplanes.&lt;br&gt;
Weather satellites.&lt;br&gt;
Garage door openers.&lt;br&gt;
Random telemetry bursts.&lt;br&gt;
Digital chirps from devices I probably shouldn’t know exist.&lt;/p&gt;

&lt;p&gt;Then they started building antennas.&lt;/p&gt;

&lt;p&gt;And this is where my brain completely derailed.&lt;/p&gt;

&lt;p&gt;Because until this point, antennas lived in the same mental category as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;paperclips,&lt;/li&gt;
&lt;li&gt;extension cords,&lt;/li&gt;
&lt;li&gt;and “miscellaneous wire-shaped objects.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An antenna is just a piece of wire, right?&lt;/p&gt;

&lt;p&gt;How do you go from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maxwell’s equations,&lt;/li&gt;
&lt;li&gt;resonance,&lt;/li&gt;
&lt;li&gt;standing waves,&lt;/li&gt;
&lt;li&gt;impedance,&lt;/li&gt;
&lt;li&gt;electromagnetic field propagation,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“yeah just cut this copper wire to 16.4 cm and now you can talk to satellites.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That felt absurd.&lt;/p&gt;

&lt;p&gt;Like discovering gravity becomes stronger if you carve wood into the right shape.&lt;/p&gt;

&lt;p&gt;And the weirdest part?&lt;/p&gt;

&lt;p&gt;Underneath all the scary terminology, antennas are shockingly elegant.&lt;/p&gt;

&lt;p&gt;It’s just oscillation.&lt;/p&gt;

&lt;p&gt;Springs.&lt;br&gt;
Pendulums.&lt;br&gt;
LC circuits.&lt;br&gt;
Standing waves.&lt;br&gt;
Light itself.&lt;/p&gt;

&lt;p&gt;The universe keeps reusing the same oscillator math over and over again.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Sentence That Changed Everything
&lt;/h2&gt;

&lt;p&gt;The sentence that finally made antennas click for me was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“A dipole antenna is basically an LC circuit that leaks energy into space on purpose.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That one sentence connected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;electronics,&lt;/li&gt;
&lt;li&gt;resonance,&lt;/li&gt;
&lt;li&gt;waves,&lt;/li&gt;
&lt;li&gt;and radio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;into one mental model.&lt;/p&gt;

&lt;p&gt;Before that, antennas felt magical.&lt;/p&gt;

&lt;p&gt;After that, they started feeling inevitable.&lt;/p&gt;


&lt;h2&gt;
  
  
  Everything Starts With Charge
&lt;/h2&gt;

&lt;p&gt;Before radio.&lt;br&gt;
Before antennas.&lt;br&gt;
Before Maxwell.&lt;/p&gt;

&lt;p&gt;There’s charge.&lt;/p&gt;

&lt;p&gt;Electrons.&lt;/p&gt;

&lt;p&gt;That’s the whole game.&lt;/p&gt;

&lt;p&gt;Electrons repel each other.&lt;br&gt;
Opposite charges attract.&lt;br&gt;
That interaction creates electric fields.&lt;/p&gt;

&lt;p&gt;Coulomb’s law describes it:&lt;/p&gt;

&lt;p&gt;F = \frac{k q_1 q_2}{r^2}&lt;/p&gt;

&lt;p&gt;Same inverse-square law shape as gravity.&lt;/p&gt;

&lt;p&gt;Apparently the universe found one equation template it liked and just kept shipping expansions.&lt;/p&gt;
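
&lt;p&gt;Coulomb’s law is easy to poke at numerically. A tiny sketch, with charges and distance made up purely for illustration:&lt;/p&gt;

```python
# Coulomb's law: F = k * q1 * q2 / r**2, same 1/r**2 shape as gravity
# (the charges and distance below are made up for illustration)
k = 8.9875517923e9  # Coulomb constant, N m^2 / C^2

q1 = 1e-6  # two 1 microcoulomb charges
q2 = 1e-6
r = 1.0    # one meter apart

F = k * q1 * q2 / r**2
print(f"Force: {F * 1000:.2f} mN")
```

&lt;p&gt;Double the distance and the force drops to a quarter, exactly like gravity.&lt;/p&gt;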

&lt;p&gt;The important realization is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;electric fields are physically real.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not just math.&lt;br&gt;
Not just diagrams in textbooks.&lt;/p&gt;

&lt;p&gt;Fields actually contain energy.&lt;/p&gt;

&lt;p&gt;That becomes extremely important later because antennas are fundamentally field machines.&lt;/p&gt;

&lt;p&gt;The wire is almost incidental.&lt;/p&gt;

&lt;p&gt;The fields are the real story.&lt;/p&gt;


&lt;h2&gt;
  
  
  Capacitance: Storing Energy In Space Like A Madman
&lt;/h2&gt;

&lt;p&gt;A capacitor is just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;two conductors,&lt;/li&gt;
&lt;li&gt;separated by an insulator.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+ plate        - plate
| | | electric field | |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Apply voltage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;charge accumulates,&lt;/li&gt;
&lt;li&gt;electric field forms,&lt;/li&gt;
&lt;li&gt;energy gets stored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The stored energy equation:&lt;/p&gt;

&lt;p&gt;E = \frac{1}{2}CV^2&lt;/p&gt;



&lt;p&gt;Which is still mildly insane to me because it means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;empty space between metal plates is storing usable energy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That energy exists in the field itself.&lt;/p&gt;
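
&lt;p&gt;To make that concrete, a back-of-the-envelope sketch with assumed values (a 100 pF capacitor charged to 12 V):&lt;/p&gt;

```python
# E = 1/2 * C * V**2 with assumed values: 100 pF charged to 12 V
C = 100e-12  # farads
V = 12.0     # volts

E = 0.5 * C * V**2
print(f"Stored energy: {E * 1e9:.3f} nJ")  # energy living in the field
```

&lt;p&gt;A few nanojoules, sitting in the gap between the plates.&lt;/p&gt;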

&lt;p&gt;This becomes the bridge into radio.&lt;/p&gt;




&lt;h2&gt;
  
  
  Inductance: Current With Momentum
&lt;/h2&gt;

&lt;p&gt;Then inductors enter the story.&lt;/p&gt;

&lt;p&gt;An inductor is basically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a wire,&lt;/li&gt;
&lt;li&gt;usually coiled,&lt;/li&gt;
&lt;li&gt;creating magnetic fields when current changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the important intuition is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;inductors resist changes in current.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Like rotational inertia for electricity.&lt;/p&gt;

&lt;p&gt;Equation:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;V=LdIdt
V = L\frac{dI}{dt}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;V&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="mord mathnormal"&gt;I&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;Fast current change?&lt;br&gt;
Big opposing voltage.&lt;/p&gt;

&lt;p&gt;Inductors are electrical flywheels.&lt;/p&gt;
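
&lt;p&gt;The flywheel intuition shows up directly in the numbers. Same one-amp change, wildly different voltages depending on how fast it happens (values here are purely illustrative):&lt;/p&gt;

```python
# V = L * dI/dt: the same 1 A change, at two different speeds
# (component value and timings are illustrative)
L = 10e-6       # 10 uH
dI = 1.0        # one amp of change
dt_slow = 1e-3  # spread over a millisecond
dt_fast = 1e-9  # crammed into a nanosecond

V_slow = L * dI / dt_slow
V_fast = L * dI / dt_fast
print(f"Slow change: {V_slow:.3f} V")
print(f"Fast change: {V_fast:.0f} V")
```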

&lt;p&gt;And suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capacitors store electric field energy,&lt;/li&gt;
&lt;li&gt;inductors store magnetic field energy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which leads to one of the coolest systems in engineering.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Universe Invented Springs Once And Never Stopped Reusing Them
&lt;/h2&gt;

&lt;p&gt;A spring system oscillates between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kinetic energy,&lt;/li&gt;
&lt;li&gt;and potential energy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An LC circuit does the same thing electrically.&lt;/p&gt;

&lt;p&gt;Mechanical oscillator:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;f=12πkm
f = \frac{1}{2\pi}\sqrt{\frac{k}{m}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mord mathnormal"&gt;π&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;m&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span 
class="mord"&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;Electrical oscillator:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;f=12πLC
f = \frac{1}{2\pi\sqrt{LC}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;2&lt;/span&gt;&lt;span class="mord mathnormal"&gt;π&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;L&lt;/span&gt;&lt;span class="mord mathnormal"&gt;C&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;Different nouns.&lt;br&gt;
Same mathematics.&lt;/p&gt;

&lt;p&gt;This was one of those moments where physics stopped feeling like memorization and started feeling like uncovering source code.&lt;/p&gt;


&lt;h2&gt;
  
  
  What An LC Circuit Actually Does
&lt;/h2&gt;

&lt;p&gt;An LC circuit is just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a capacitor,&lt;/li&gt;
&lt;li&gt;connected to an inductor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;But the behavior is beautiful.&lt;/p&gt;

&lt;p&gt;The capacitor starts charged.&lt;/p&gt;

&lt;p&gt;It pushes current into the inductor.&lt;/p&gt;

&lt;p&gt;The inductor builds a magnetic field and resists sudden current changes.&lt;/p&gt;

&lt;p&gt;Then the capacitor empties…&lt;/p&gt;

&lt;p&gt;…but the inductor keeps current flowing because magnetic fields collapse gradually.&lt;/p&gt;

&lt;p&gt;That recharges the capacitor backwards.&lt;/p&gt;

&lt;p&gt;Then everything reverses.&lt;/p&gt;

&lt;p&gt;Over and over.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Electric field -&amp;gt; magnetic field -&amp;gt; electric field -&amp;gt; magnetic field
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is resonance.&lt;/p&gt;

&lt;p&gt;And this is the first deep intuition for antennas:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;oscillation is everything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No oscillation → no radio.&lt;/p&gt;

&lt;p&gt;DC current does not radiate.&lt;/p&gt;

&lt;p&gt;Accelerating charge radiates.&lt;/p&gt;

&lt;p&gt;That distinction is the entire field of RF engineering in one sentence.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tiny Python Resonance Calculator
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;10e-6&lt;/span&gt;   &lt;span class="c1"&gt;## 10 uH
&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;100e-12&lt;/span&gt; &lt;span class="c1"&gt;## 100 pF
&lt;/span&gt;
&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Resonant frequency: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mf"&gt;1e6&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; MHz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Resonant frequency: 5.03 MHz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That LC combination naturally wants to oscillate around 5 MHz.&lt;/p&gt;

&lt;p&gt;Not because we commanded it to.&lt;/p&gt;

&lt;p&gt;Because physics prefers that state.&lt;/p&gt;
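
&lt;p&gt;You can also run the resonance equation backwards: pick a target frequency and solve for the capacitor. A small sketch, assuming the same 10 uH coil and a made-up 7 MHz target:&lt;/p&gt;

```python
import math

# Invert f = 1 / (2*pi*sqrt(L*C)): given L and a target frequency,
# solve for C. The 7 MHz target is a made-up example.
L = 10e-6       # same 10 uH coil as above
f_target = 7e6  # hertz

C = 1 / ((2 * math.pi * f_target)**2 * L)
print(f"Required C: {C * 1e12:.1f} pF")
```

&lt;p&gt;About 51.7 pF. That’s how tuned circuits get designed.&lt;/p&gt;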


&lt;h2&gt;
  
  
  Maxwell Basically Completed Electricity DLC
&lt;/h2&gt;

&lt;p&gt;Before Maxwell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;electricity was one thing,&lt;/li&gt;
&lt;li&gt;magnetism was another weird thing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then Maxwell unified them and accidentally discovered light.&lt;/p&gt;

&lt;p&gt;Which is one of the greatest scientific flexes of all time.&lt;/p&gt;

&lt;p&gt;His insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;changing electric fields create magnetic fields&lt;br&gt;
and changing magnetic fields create electric fields&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That loop creates self-propagating waves.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;changing E -&amp;gt; changing B -&amp;gt; changing E -&amp;gt; changing B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That’s radio.&lt;/p&gt;

&lt;p&gt;That’s WiFi.&lt;br&gt;
Bluetooth.&lt;br&gt;
Microwaves.&lt;br&gt;
Visible light.&lt;/p&gt;

&lt;p&gt;Same phenomenon.&lt;br&gt;
Different frequency.&lt;/p&gt;

&lt;p&gt;The speed comes directly from Maxwell’s equations:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;c=1μ0ϵ0
c = \frac{1}{\sqrt{\mu_0\epsilon_0}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;c&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;μ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;ϵ&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;



&lt;p&gt;Which evaluates to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;299,792,458 m/s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The speed of light.&lt;/p&gt;
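
&lt;p&gt;You can check that yourself by plugging the measured vacuum constants into the formula:&lt;/p&gt;

```python
import math

# c = 1 / sqrt(mu0 * eps0), with the measured vacuum constants
mu0 = 1.25663706212e-6   # vacuum permeability, H/m
eps0 = 8.8541878128e-12  # vacuum permittivity, F/m

c = 1 / math.sqrt(mu0 * eps0)
print(f"c = {c:,.0f} m/s")
```

&lt;p&gt;Two electrical constants you can measure on a bench, and out falls the speed of light.&lt;/p&gt;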

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;light is just electromagnetic oscillation moving through space.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That realization broke my brain a little.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Most Important Antenna Equation
&lt;/h2&gt;

&lt;p&gt;Eventually all antenna design collapses into one equation:&lt;/p&gt;

&lt;p&gt;c = f\lambda&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;c = speed of light&lt;/li&gt;
&lt;li&gt;f = frequency&lt;/li&gt;
&lt;li&gt;λ = wavelength&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This equation controls almost every antenna dimension.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tiny Python Frequency/Wavelength Calculator
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;299792458&lt;/span&gt;
&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;433e6&lt;/span&gt;

&lt;span class="n"&gt;wavelength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wavelength: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;wavelength&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; meters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wavelength: 0.692 meters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;And suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;quarter-wave antenna = 17.3 cm&lt;/li&gt;
&lt;li&gt;half-wave dipole = 34.6 cm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Antennas stop feeling magical.&lt;/p&gt;

&lt;p&gt;Because you realize:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;antennas are geometry matched to oscillation.&lt;/p&gt;
&lt;/blockquote&gt;
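
&lt;p&gt;Turning that geometry into actual cut lengths is one division away. The 0.95 shortening factor below is a common rule of thumb for thin wire in air, my assumption rather than something the ideal math gives you:&lt;/p&gt;

```python
# Cut lengths for 433 MHz. The 0.95 end-effect shortening factor
# is a common rule of thumb for thin wire in air (an assumption here,
# not something the ideal math gives you).
c = 299792458
f = 433e6
k = 0.95

half_wave = (c / f) / 2
practical = half_wave * k
print(f"Ideal half-wave dipole: {half_wave * 100:.1f} cm")
print(f"Practical cut length: {practical * 100:.1f} cm")
```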



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsei5ups0ynrcn900raxs.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsei5ups0ynrcn900raxs.jpeg" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  So What Actually Is A Dipole?
&lt;/h2&gt;

&lt;p&gt;The dipole is the “hello world” of antennas.&lt;/p&gt;

&lt;p&gt;Two wires.&lt;br&gt;
Fed in the middle.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;------|------
      ^
   feed point
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;Which is deeply offensive considering how much physics is hiding inside it.&lt;/p&gt;

&lt;p&gt;At resonance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;charge accumulates at tips,&lt;/li&gt;
&lt;li&gt;current peaks at center,&lt;/li&gt;
&lt;li&gt;standing waves form,&lt;/li&gt;
&lt;li&gt;electromagnetic fields launch outward.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Current distribution:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current:
   /\
--/--\--

Voltage:
--\--/--
   \/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Current maximum at center.&lt;br&gt;
Voltage maximum at tips.&lt;/p&gt;

&lt;p&gt;Exactly LC oscillator behavior spread spatially across the wire.&lt;/p&gt;

&lt;p&gt;Which leads to the insight that haunted me for days:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;a dipole is basically an LC circuit stretched into space.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  A Wire Antenna Is Literally A Distributed LC Circuit
&lt;/h2&gt;

&lt;p&gt;This part is beautiful.&lt;/p&gt;

&lt;p&gt;In a normal LC circuit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capacitor stores E-field energy,&lt;/li&gt;
&lt;li&gt;inductor stores B-field energy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In an antenna:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the wire itself has inductance,&lt;/li&gt;
&lt;li&gt;the antenna ends create capacitance,&lt;/li&gt;
&lt;li&gt;the entire geometry becomes a resonator.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not metaphorically.&lt;/p&gt;

&lt;p&gt;Literally.&lt;/p&gt;

&lt;p&gt;The standing wave on the antenna behaves exactly like oscillation inside an LC tank.&lt;/p&gt;

&lt;p&gt;Except now:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the energy leaks into space on purpose.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That leakage is radiation.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Doesn’t The Antenna Just Spark Like A Tesla Coil?
&lt;/h2&gt;

&lt;p&gt;This question bothered me for an entire afternoon.&lt;/p&gt;

&lt;p&gt;Because if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;huge oscillating voltages exist,&lt;/li&gt;
&lt;li&gt;charge accumulates at the tips,&lt;/li&gt;
&lt;li&gt;electric fields are huge,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;why isn’t every antenna basically a lightning machine?&lt;/p&gt;

&lt;p&gt;The answer is incredibly important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;a spark is what happens when energy cannot escape.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tesla coils trap energy locally.&lt;br&gt;
Antennas intentionally radiate it away.&lt;/p&gt;

&lt;p&gt;Tesla coil:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extremely high Q,&lt;/li&gt;
&lt;li&gt;energy trapped locally,&lt;/li&gt;
&lt;li&gt;voltage builds,&lt;/li&gt;
&lt;li&gt;air ionizes,&lt;/li&gt;
&lt;li&gt;spark.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Antenna:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;geometry matched to wavelength,&lt;/li&gt;
&lt;li&gt;fields detach,&lt;/li&gt;
&lt;li&gt;energy propagates outward,&lt;/li&gt;
&lt;li&gt;no giant voltage buildup.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The field lines leave.&lt;/p&gt;

&lt;p&gt;That’s radiation.&lt;/p&gt;


&lt;h2&gt;
  
  
  Near Field vs Far Field
&lt;/h2&gt;

&lt;p&gt;This distinction finally made antennas click.&lt;/p&gt;

&lt;p&gt;Near field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;energy still attached to antenna,&lt;/li&gt;
&lt;li&gt;fields slosh locally,&lt;/li&gt;
&lt;li&gt;reactive energy dominates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Far field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E and B fields detach,&lt;/li&gt;
&lt;li&gt;wave propagates independently,&lt;/li&gt;
&lt;li&gt;energy permanently leaves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rule of thumb:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;far field starts around:
r &amp;gt; λ / 2π
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Inside near field:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;antenna behaves like a weird resonant circuit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Outside:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it behaves like a radio transmitter.&lt;/li&gt;
&lt;/ul&gt;
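
&lt;p&gt;That λ/2π rule of thumb is easy to tabulate for a few familiar frequencies:&lt;/p&gt;

```python
import math

# Reactive near field ends roughly at r = wavelength / (2*pi)
c = 299792458

for f, name in [(100e6, "FM broadcast"), (433e6, "433 MHz ISM"), (2.45e9, "WiFi")]:
    wavelength = c / f
    r = wavelength / (2 * math.pi)
    print(f"{name}: boundary around {r * 100:.1f} cm")
```

&lt;p&gt;For WiFi the reactive zone is only a couple of centimeters; for FM broadcast it’s nearly half a meter.&lt;/p&gt;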


&lt;h2&gt;
  
  
  Impedance: The Thing RF Engineers Never Stop Talking About
&lt;/h2&gt;

&lt;p&gt;Impedance sounded fake to me initially.&lt;/p&gt;

&lt;p&gt;Like engineering jargon invented because “resistance” wasn’t intimidating enough.&lt;/p&gt;

&lt;p&gt;But impedance is just:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;resistance plus time behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In AC systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capacitors delay voltage,&lt;/li&gt;
&lt;li&gt;inductors delay current,&lt;/li&gt;
&lt;li&gt;phase matters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So impedance becomes:&lt;/p&gt;

&lt;p&gt;Z = R + jX&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;R = resistance&lt;/li&gt;
&lt;li&gt;X = reactance&lt;/li&gt;
&lt;li&gt;j = the imaginary unit (electrical engineers write j instead of i to avoid clashing with current)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meaning:&lt;br&gt;
the circuit resists current in both magnitude and timing.&lt;/p&gt;

&lt;p&gt;This matters enormously in antennas because impedance mismatches reflect power backward.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tiny Python Reactance Calculator
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;14e6&lt;/span&gt;
&lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2e-6&lt;/span&gt;
&lt;span class="n"&gt;C&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;100e-12&lt;/span&gt;

&lt;span class="n"&gt;XL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;
&lt;span class="n"&gt;XC&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XL = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;XL&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ohms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XC = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;XC&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ohms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;XL = 175.93 ohms
XC = 113.68 ohms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;At resonance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XL = XC&lt;/li&gt;
&lt;li&gt;reactances cancel&lt;/li&gt;
&lt;li&gt;impedance becomes purely resistive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which is where antennas become happiest.&lt;/p&gt;
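&lt;p&gt;That cancellation point is easy to compute. Using the same made-up L and C values from the reactance snippet above, the frequency where XL = XC is f = 1 / (2π√(LC)):&lt;/p&gt;

```python
import math

L = 2e-6     # same made-up inductance as the reactance example
C = 100e-12  # same made-up capacitance

# Resonance: XL = XC  =>  f = 1 / (2*pi*sqrt(L*C))
f_res = 1 / (2 * math.pi * math.sqrt(L * C))

print(f"Resonant frequency: {f_res / 1e6:.2f} MHz")
```

&lt;p&gt;About 11.25 MHz — below that frequency this circuit looks capacitive, above it inductive, and exactly there the reactances cancel.&lt;/p&gt;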


&lt;h2&gt;
  
  
  Standing Waves: The RF Version Of Yelling Into A Wall
&lt;/h2&gt;

&lt;p&gt;If impedance mismatches occur:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;radio -&amp;gt; coax -&amp;gt; mismatch -&amp;gt; reflection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;part of the signal reflects backward.&lt;/p&gt;

&lt;p&gt;Forward and reflected waves interfere.&lt;/p&gt;

&lt;p&gt;That creates standing waves.&lt;/p&gt;

&lt;p&gt;Measured as:&lt;br&gt;
SWR — Standing Wave Ratio.&lt;/p&gt;

&lt;p&gt;Perfect:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1:1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5:1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Very bad:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your transmitter becomes a tiny expensive heater
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tiny Python SWR Calculator
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Z0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;ZL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;

&lt;span class="n"&gt;gamma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;ZL&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Z0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ZL&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;Z0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;swr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;gamma&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;gamma&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SWR = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;swr&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SWR = 1.50:1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Which is actually pretty decent.&lt;/p&gt;
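&lt;p&gt;It helps to see how little power a 1.5:1 mismatch actually wastes. Working backwards from SWR to the reflection coefficient:&lt;/p&gt;

```python
swr = 1.5

gamma = (swr - 1) / (swr + 1)   # reflection coefficient magnitude
reflected = gamma ** 2          # fraction of forward power reflected

print(f"Gamma = {gamma:.2f}")
print(f"Reflected power: {reflected * 100:.1f}%")
```

&lt;p&gt;A 1.5:1 SWR reflects only 4% of the forward power, which is why nobody loses sleep over it.&lt;/p&gt;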


&lt;h2&gt;
  
  
  What Is A Balun?
&lt;/h2&gt;

&lt;p&gt;Balun = &lt;strong&gt;BALanced to UNbalanced&lt;/strong&gt; transformer.&lt;/p&gt;

&lt;p&gt;This confused me for way too long.&lt;/p&gt;

&lt;p&gt;A dipole antenna is balanced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;equal and opposite currents on its two halves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Coax cable is unbalanced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shield on one side,&lt;/li&gt;
&lt;li&gt;center conductor on the other.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Directly connecting them can cause RF current to flow down the outside of the coax shield.&lt;/p&gt;

&lt;p&gt;Which creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distorted radiation patterns,&lt;/li&gt;
&lt;li&gt;weird interference,&lt;/li&gt;
&lt;li&gt;mysterious RF gremlins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A balun fixes this.&lt;/p&gt;

&lt;p&gt;Common types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1:1 choke balun&lt;/li&gt;
&lt;li&gt;4:1 transformer balun&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The simplest balun is hilariously primitive:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wrap coax into several loops
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;Congratulations.&lt;br&gt;
You built RF wizardry.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Is A Matching Network?
&lt;/h2&gt;

&lt;p&gt;A matching network transforms impedance so maximum power transfers.&lt;/p&gt;

&lt;p&gt;Because RF systems are extremely dramatic about impedance mismatches.&lt;/p&gt;

&lt;p&gt;Usually built using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capacitors,&lt;/li&gt;
&lt;li&gt;inductors,&lt;/li&gt;
&lt;li&gt;transmission lines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Goal:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;antenna impedance -&amp;gt; 50 ohms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;because most radios and coax systems use 50Ω.&lt;/p&gt;

&lt;p&gt;Matching networks are basically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;translators,&lt;/li&gt;
&lt;li&gt;for electrical stubbornness.&lt;/li&gt;
&lt;/ul&gt;
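&lt;p&gt;As a sketch of what one of these translators looks like: the classic two-element L-network has closed-form values. All the numbers below (14 MHz, a purely resistive 200 Ω load) are made up for illustration:&lt;/p&gt;

```python
import math

f = 14e6        # hypothetical operating frequency
r_low = 50      # radio / coax side, ohms
r_high = 200    # antenna side, ohms (assumed purely resistive)

# Classic L-network: Q is fixed entirely by the resistance ratio
q = math.sqrt(r_high / r_low - 1)

x_series = q * r_low       # series reactance on the low-resistance side
x_parallel = r_high / q    # shunt reactance across the high-resistance side

L_match = x_series / (2 * math.pi * f)          # realize series X as an inductor
C_match = 1 / (2 * math.pi * f * x_parallel)    # realize shunt X as a capacitor

print(f"Series L: {L_match * 1e6:.2f} uH")
print(f"Shunt C:  {C_match * 1e12:.1f} pF")
```

&lt;p&gt;Roughly 1 µH and 100 pF of carefully chosen stubbornness-translation.&lt;/p&gt;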


&lt;h2&gt;
  
  
  Quarter-Wave Transformer: The Most Elegant RF Hack Ever
&lt;/h2&gt;

&lt;p&gt;This one genuinely delighted me.&lt;/p&gt;

&lt;p&gt;A transmission line cut to exactly λ/4 transforms impedance according to:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;Zt=Z1Z2
Z_t = \sqrt{Z_1 Z_2}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;Z&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;t&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;Z&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;Z&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 
mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;



&lt;p&gt;Meaning:&lt;br&gt;
a carefully chosen quarter-wave cable section can match mismatched impedances.&lt;/p&gt;

&lt;p&gt;No active electronics.&lt;br&gt;
No DSP.&lt;br&gt;
No magic.&lt;/p&gt;

&lt;p&gt;Just geometry and wave physics.&lt;/p&gt;

&lt;p&gt;RF engineering contains an alarming amount of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“this exact length of wire solves the problem somehow.”&lt;/p&gt;
&lt;/blockquote&gt;
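&lt;p&gt;A worked example, with the load impedance deliberately chosen so the answer lands on a real part: matching a 50 Ω system to a hypothetical 112.5 Ω antenna. (The velocity factor below is a typical value, not something from this article.)&lt;/p&gt;

```python
import math

c = 299792458
f = 433e6       # hypothetical design frequency
z1 = 50         # system impedance
z2 = 112.5      # antenna impedance (picked so the answer is a standard coax)

z_t = math.sqrt(z1 * z2)

# Physical length of the quarter-wave section. Coax slows the wave down,
# so the cut length includes the cable's velocity factor.
velocity_factor = 0.66   # typical for solid-dielectric coax like RG-59
length = (c / f) * velocity_factor / 4

print(f"Transformer impedance: {z_t:.1f} ohms")
print(f"Quarter-wave section:  {length * 100:.1f} cm")
```

&lt;p&gt;√(50 × 112.5) = 75 Ω — ordinary TV coax, cut to about 11 cm, performs the match.&lt;/p&gt;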


&lt;h2&gt;
  
  
  What Is An Antenna Tuner?
&lt;/h2&gt;

&lt;p&gt;An antenna tuner (ATU) dynamically adjusts matching networks.&lt;/p&gt;

&lt;p&gt;Important subtle point:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;a tuner does NOT magically fix the antenna.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It mostly fixes what the radio sees.&lt;/p&gt;

&lt;p&gt;Which still matters enormously.&lt;/p&gt;

&lt;p&gt;Tuners usually contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;variable inductors,&lt;/li&gt;
&lt;li&gt;variable capacitors,&lt;/li&gt;
&lt;li&gt;switching networks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You tweak knobs until SWR drops.&lt;/p&gt;

&lt;p&gt;Which feels halfway between engineering and safecracking.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tiny Dipole Calculator
&lt;/h2&gt;

&lt;p&gt;Suppose we want a dipole for 433 MHz.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;299792458&lt;/span&gt;
&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;433e6&lt;/span&gt;

&lt;span class="n"&gt;wavelength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;
&lt;span class="n"&gt;dipole_total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wavelength&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wavelength: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;wavelength&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dipole length: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dipole_total&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wavelength: 0.692 m
Dipole length: 0.346 m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each side:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17.3 cm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You literally cut two wires. (In practice you’d cut each side a few percent long and trim to resonance — end effects make a real wire resonate slightly below the free-space prediction.)&lt;/p&gt;

&lt;p&gt;And somehow that lets you interact with invisible oscillating spacetime fields.&lt;/p&gt;

&lt;p&gt;Still feels slightly illegal.&lt;/p&gt;


&lt;h2&gt;
  
  
  Gain: The Thing Marketing Departments Abuse Constantly
&lt;/h2&gt;

&lt;p&gt;Gain is NOT amplification.&lt;/p&gt;

&lt;p&gt;Passive antennas do not create energy.&lt;/p&gt;

&lt;p&gt;Gain means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;focusing energy directionally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flashlight vs bare bulb.&lt;/p&gt;

&lt;p&gt;Same power.&lt;br&gt;
Different distribution.&lt;/p&gt;

&lt;p&gt;Dipole:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~2.15 dBi&lt;/li&gt;
&lt;li&gt;donut-shaped radiation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yagi:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;directional beam&lt;/li&gt;
&lt;li&gt;higher gain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dish:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;microwave death laser plate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every extra dB narrows beamwidth.&lt;/p&gt;

&lt;p&gt;Physics always charges rent somewhere.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tiny Python Gain/EIRP Calculator
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tx_power_dbm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;antenna_gain_dbi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="n"&gt;cable_loss_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="n"&gt;eirp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx_power_dbm&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;antenna_gain_dbi&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cable_loss_db&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EIRP = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;eirp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; dBm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EIRP = 26 dBm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
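&lt;p&gt;Converting that back to linear units makes the “focusing, not amplifying” point concrete:&lt;/p&gt;

```python
tx_power_dbm = 20
eirp_dbm = 26

# dBm back to milliwatts: mW = 10 ** (dBm / 10)
tx_mw = 10 ** (tx_power_dbm / 10)
eirp_mw = 10 ** (eirp_dbm / 10)

print(f"Transmitter output: {tx_mw:.0f} mW")
print(f"Effective radiated: {eirp_mw:.0f} mW (in the favored direction)")
```

&lt;p&gt;100 mW in, the equivalent of roughly 400 mW out — not because energy appeared, but because the antenna stopped wasting it in directions nobody is listening.&lt;/p&gt;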



&lt;h2&gt;
  
  
  Practical Antenna Building Workflow
&lt;/h2&gt;

&lt;p&gt;The actual engineering workflow finally became clear to me:&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1 — Choose Frequency
&lt;/h3&gt;

&lt;p&gt;Everything begins with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;433 MHz&lt;/li&gt;
&lt;li&gt;915 MHz&lt;/li&gt;
&lt;li&gt;2.4 GHz&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 2 — Calculate Wavelength
&lt;/h3&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;λ=cf
\lambda = \frac{c}{f}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3 — Pick Geometry
&lt;/h3&gt;

&lt;p&gt;Need omnidirectional?&lt;br&gt;
→ dipole&lt;/p&gt;

&lt;p&gt;Need directional?&lt;br&gt;
→ Yagi&lt;/p&gt;

&lt;p&gt;Need compact?&lt;br&gt;
→ patch or loop&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4 — Match Impedance
&lt;/h3&gt;

&lt;p&gt;Usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;antenna ≈ 50–75Ω&lt;/li&gt;
&lt;li&gt;coax = 50Ω&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;balun,&lt;/li&gt;
&lt;li&gt;tuner,&lt;/li&gt;
&lt;li&gt;matching network,&lt;/li&gt;
&lt;li&gt;quarter-wave transformer.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 5 — Tune
&lt;/h3&gt;

&lt;p&gt;Trim gradually while measuring SWR.&lt;/p&gt;

&lt;p&gt;Every RF person repeats this religiously:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;cut long first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because you can remove wire.&lt;br&gt;
You cannot emotionally recover from cutting it too short.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Tool That Made RF Feel Real
&lt;/h2&gt;

&lt;p&gt;The first time I connected a NanoVNA to an antenna and watched resonance appear exactly where the equations predicted…&lt;/p&gt;

&lt;p&gt;…it was over.&lt;/p&gt;

&lt;p&gt;I was hooked.&lt;/p&gt;

&lt;p&gt;You sweep frequency.&lt;/p&gt;

&lt;p&gt;SWR dips appear.&lt;/p&gt;

&lt;p&gt;Resonance moves when you trim wire.&lt;/p&gt;

&lt;p&gt;And suddenly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Maxwell’s equations stop feeling theoretical.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You are watching physics happen live.&lt;/p&gt;

&lt;p&gt;The NanoVNA ecosystem is honestly incredible for hobby RF work. The official NanoVNA project and software ecosystem are here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nanovna.com/" rel="noopener noreferrer"&gt;NanoVNA Official Site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nanorfe.com/nanovna-v2.html" rel="noopener noreferrer"&gt;NanoRFE NanoVNA V2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nanovna.com/?page_id=90" rel="noopener noreferrer"&gt;NanoVNA Saver Software&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Antenna Simulation Feels Like A Superpower
&lt;/h2&gt;

&lt;p&gt;Another thing that completely changed the game for me was antenna modeling software.&lt;/p&gt;

&lt;p&gt;Tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.qsl.net/4nec2/index.html" rel="noopener noreferrer"&gt;4NEC2 Antenna Modeler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.eznec.com/" rel="noopener noreferrer"&gt;EZNEC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nec2.org/" rel="noopener noreferrer"&gt;NEC2 Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;let you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simulate radiation patterns,&lt;/li&gt;
&lt;li&gt;impedance,&lt;/li&gt;
&lt;li&gt;gain,&lt;/li&gt;
&lt;li&gt;SWR,&lt;/li&gt;
&lt;li&gt;current distributions,&lt;/li&gt;
&lt;li&gt;near/far fields,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;before cutting any metal.&lt;/p&gt;

&lt;p&gt;Which means:&lt;br&gt;
you can literally watch a solver work through Maxwell’s equations for your antenna, numerically.&lt;/p&gt;

&lt;p&gt;That still feels absurdly futuristic.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Weirdly Beautiful Part Of All This
&lt;/h2&gt;

&lt;p&gt;The deeper I got into antennas, the more everything started collapsing into one giant unified oscillator story.&lt;/p&gt;

&lt;p&gt;Springs.&lt;br&gt;
Pendulums.&lt;br&gt;
LC circuits.&lt;br&gt;
Standing waves.&lt;br&gt;
Light.&lt;/p&gt;

&lt;p&gt;The universe keeps reusing the same mathematics because oscillation is deeply fundamental.&lt;/p&gt;

&lt;p&gt;And antennas are one of the purest examples of that.&lt;/p&gt;

&lt;p&gt;A carefully sized piece of metal starts coupling energy into spacetime itself.&lt;/p&gt;

&lt;p&gt;That sentence sounds fake.&lt;/p&gt;

&lt;p&gt;But it’s literally what’s happening.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Mental Model That Finally Made It Click
&lt;/h2&gt;

&lt;p&gt;Here’s the final simplified picture that made antennas intuitive for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Battery:
pushes charge steadily

LC circuit:
sloshes energy back and forth

Antenna:
sloshes energy back and forth
AND leaks some of it into space
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s radio.&lt;/p&gt;

&lt;p&gt;That’s basically the whole thing.&lt;/p&gt;

&lt;p&gt;Everything else:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;impedance matching,&lt;/li&gt;
&lt;li&gt;SWR,&lt;/li&gt;
&lt;li&gt;baluns,&lt;/li&gt;
&lt;li&gt;gain,&lt;/li&gt;
&lt;li&gt;feedlines,&lt;/li&gt;
&lt;li&gt;radiation patterns,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;is engineering optimization around that core phenomenon.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Most Memorable Insight I Took Away
&lt;/h2&gt;

&lt;p&gt;This line stayed with me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“A spark is what happens when energy can’t escape. A radio wave is what happens when it can.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s basically the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a Tesla coil,&lt;/li&gt;
&lt;li&gt;and a transmitter tower.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One traps energy.&lt;/p&gt;

&lt;p&gt;One launches it.&lt;/p&gt;

&lt;p&gt;And somehow all of that emerges from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;moving electrons,&lt;/li&gt;
&lt;li&gt;oscillating fields,&lt;/li&gt;
&lt;li&gt;and a carefully sized piece of metal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which honestly still feels slightly magical.&lt;/p&gt;

</description>
      <category>sdr</category>
      <category>hackrf</category>
      <category>maxwell</category>
      <category>radio</category>
    </item>
    <item>
      <title>Training LLMs on Mixed GPUs: My Experiments and What I Learnt</title>
      <dc:creator>Rushi Chaudhari</dc:creator>
      <pubDate>Fri, 28 Nov 2025 16:59:38 +0000</pubDate>
      <link>https://forem.com/rushichaudhari/training-llms-on-mixed-gpus-my-experiments-and-what-i-learnt-1k7n</link>
      <guid>https://forem.com/rushichaudhari/training-llms-on-mixed-gpus-my-experiments-and-what-i-learnt-1k7n</guid>
      <description>&lt;p&gt;In the last few months, I have been very interested in large language models. At the same time, the GPU world is also changing. Nvidia is still the market leader, but AMD, Intel, and even Chinese companies are making cheaper GPUs. The main challenge is that CUDA is still the dominant software stack, and Nvidia drivers are not open source. Because of this, using non‑Nvidia GPUs is still not smooth.&lt;/p&gt;

&lt;p&gt;As someone who runs a homelab, I wanted a setup where I can use different GPUs together. But even mixing two Nvidia GPUs of different generations is hard. If you upgrade from RTX 3090 to RTX 5090, you may need a different CUDA version, a different Python version, and a different PyTorch version. New architectures like Blackwell also take time to enter mainstream frameworks.&lt;/p&gt;

&lt;p&gt;So many people end up buying the same model GPU again just to do multi‑GPU training.&lt;/p&gt;

&lt;p&gt;I wanted to avoid that and see if mixed‑GPU training is possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Architecture Diagram
&lt;/h2&gt;

&lt;p&gt;The system auto-generates a topology diagram after you configure and run the coordinator once; the file is saved as &lt;code&gt;architecture.png&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq42p1m77gei2rfwkjmoa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq42p1m77gei2rfwkjmoa.png" alt="architecture" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Current ML Systems Support
&lt;/h2&gt;

&lt;p&gt;I looked into many systems like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DeepSpeed&lt;/li&gt;
&lt;li&gt;Megatron‑LM&lt;/li&gt;
&lt;li&gt;PyTorch Distributed + TorchGpipe&lt;/li&gt;
&lt;li&gt;vLLM&lt;/li&gt;
&lt;li&gt;Colossal‑AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are powerful, but none properly support mixing CUDA and ROCm GPUs in one training job.&lt;/p&gt;

&lt;p&gt;There is something called UCC (Unified Collective Communication) that tries to help. But the PyTorch integration here (torch‑ucc) is still experimental and archived:&lt;br&gt;
&lt;a href="https://github.com/openucx/torch-ucc" rel="noopener noreferrer"&gt;https://github.com/openucx/torch-ucc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;UCX developers also said here that CUDA and ROCm support is “in theory”, but mixed setups were never fully tested:&lt;br&gt;
&lt;a href="https://github.com/openucx/ucx/discussions/9985" rel="noopener noreferrer"&gt;https://github.com/openucx/ucx/discussions/9985&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So true heterogeneous GPU training is still not ready in major frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Papers Trying to Solve This
&lt;/h2&gt;

&lt;p&gt;I found some research papers that aim to solve heterogeneous GPU training:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HetHub
&lt;a href="https://arxiv.org/pdf/2405.16256" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2405.16256&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HyperPipe
&lt;a href="https://ieeexplore.ieee.org/document/11033309" rel="noopener noreferrer"&gt;https://ieeexplore.ieee.org/document/11033309&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cephalo
&lt;a href="https://dl.acm.org/doi/10.1145/3721145.3730418" rel="noopener noreferrer"&gt;https://dl.acm.org/doi/10.1145/3721145.3730418&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HeterMoE
&lt;a href="https://arxiv.org/pdf/2504.03871" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2504.03871&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Zorse
&lt;a href="https://arxiv.org/abs/2507.10392" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2507.10392&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These papers show that the idea is possible, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;None of these are open source&lt;/li&gt;
&lt;li&gt;Real‑world implementations are still missing&lt;/li&gt;
&lt;li&gt;Homelab users cannot use these systems directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of all these limitations, I decided to build my own simple framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  How my HeteroGPU framework enables mixed‑GPU pipeline training in homelabs
&lt;/h2&gt;

&lt;p&gt;My goal was very simple:&lt;/p&gt;

&lt;p&gt;I wanted to run LLM training across different GPUs in my homelab, even if they belong to different generations or vendors, without depending on complicated distributed frameworks.&lt;/p&gt;

&lt;p&gt;My HeteroGPU framework helps to do this by providing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Layer‑based pipeline parallelism&lt;br&gt;
The model is split by layers so it can run across GPUs with different VRAM sizes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple coordinator–worker design&lt;br&gt;
The main machine holds the first part of the model; remote machines run the later layers. They communicate over a lightweight socket interface on 10&amp;nbsp;Gb Ethernet or Thunderbolt (Thunderbolt support is not implemented yet).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for mixed GPU speeds&lt;br&gt;
A faster GPU can take more layers; a slower GPU takes fewer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Small and hackable codebase&lt;br&gt;
Ideal for homelab experimentation, unlike large frameworks like DeepSpeed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Profiler inspired by Cephalo&lt;br&gt;
Helps decide how to split layers between GPUs based on compute speed, memory capacity, and communication delay.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Works even when GPUs require different drivers or CUDA/ROCm versions&lt;br&gt;
Because each machine loads only its own shard locally and exchanges raw tensors over the network, the machines do not need matching CUDA or ROCm stacks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This makes heterogeneous pipeline training practical for home users who may have a strong Nvidia GPU as the main device, an older GPU on another machine, or even an integrated GPU like AMD's Strix Halo. With this design, training becomes possible even when no single GPU can fit the model.&lt;/p&gt;
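&lt;p&gt;The coordinator/worker handoff can be sketched with a tiny length-prefixed socket protocol. This is a minimal illustration, not the framework's actual API: the function names are made up, and pickled Python lists stand in for activation tensors.&lt;/p&gt;

```python
import pickle
import socket
import struct
import threading

# Framing: an 8-byte big-endian size header, then a pickled payload.
# The real framework moves activation/gradient tensors over the wire;
# here plain lists keep the sketch dependency-free.

def send_msg(sock, obj):
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!Q", len(payload)) + payload)

def recv_exact(sock, n):
    buf = b""
    while len(buf) != n:              # loop until exactly n bytes arrived
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_msg(sock):
    (size,) = struct.unpack("!Q", recv_exact(sock, 8))
    return pickle.loads(recv_exact(sock, size))

def worker(sock):
    # Hypothetical remote shard: receive activations, run the "later
    # layers" (here just doubling each value), send the result back.
    send_msg(sock, [2 * a for a in recv_msg(sock)])

coord_end, worker_end = socket.socketpair()
t = threading.Thread(target=worker, args=(worker_end,))
t.start()
send_msg(coord_end, [1.0, 2.0, 3.0])  # forward-pass activations
result = recv_msg(coord_end)
t.join()
print(result)  # [2.0, 4.0, 6.0]
```

&lt;p&gt;In the real system this exchange happens once per micro-batch in each direction: activations forward, gradients back.&lt;/p&gt;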

&lt;h2&gt;
  
  
  Quick Explanation of Parallelism
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Data Parallelism: Copy the whole model to each GPU and split the batch.&lt;/li&gt;
&lt;li&gt;Tensor / Model Parallelism: Split each layer across GPUs. Very communication heavy.&lt;/li&gt;
&lt;li&gt;Pipeline Parallelism: Split the model layer‑wise. GPU 1 runs early layers, GPU 2 runs later layers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pipeline parallelism is the easiest to apply across mixed GPUs. Its main drawback is the pipeline bubble: because transformer layers run sequentially, one GPU often sits idle while the other works. But it still enables training when the model cannot fit into a single GPU.&lt;/p&gt;
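&lt;p&gt;The waiting can be seen with a toy timing model. The stage times below are invented, not measurements from these runs; the pipelined estimate is the standard (micro-batches + stages - 1) ticks of the slowest stage.&lt;/p&gt;

```python
# Toy two-stage pipeline timing: with one big batch the GPUs strictly
# alternate, while micro-batching lets their work overlap.

def no_overlap_time(stage_times, num_microbatches):
    # Each micro-batch traverses every stage before the next one starts.
    return num_microbatches * sum(stage_times)

def pipeline_time(stage_times, num_microbatches):
    # Classic estimate: (microbatches + stages - 1) ticks of the slowest stage.
    slowest = max(stage_times)
    return (num_microbatches + len(stage_times) - 1) * slowest

stages = [1.0, 1.0]   # seconds per micro-batch on GPU 1 and GPU 2
mb = 8                # micro-batches per optimizer step

print(no_overlap_time(stages, mb))  # 16.0
print(pipeline_time(stages, mb))    # 9.0
```

&lt;p&gt;The gap between 16 and 9 seconds is the overlap recovered by micro-batching; the remaining bubble is the "+ stages - 1" term.&lt;/p&gt;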

&lt;h2&gt;
  
  
  My Experiments With LLaMA Finetuning
&lt;/h2&gt;

&lt;p&gt;I tested the same training script on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;RTX 5090 single GPU&lt;/li&gt;
&lt;li&gt;AMD Strix Halo single GPU&lt;/li&gt;
&lt;li&gt;Two‑machine pipeline setup&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The results showed how mixed GPU training behaves.&lt;/p&gt;

&lt;h2&gt;
  
  
  RTX 5090 (Single GPU)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;» python examples/alpaca_example_singlemachine.py
Using device: cuda
`torch_dtype` is deprecated! Use `dtype` instead!
trainable params: 13,631,488 || all params: 8,043,892,736 || trainable%: 0.1695
Epoch 0 | Step 10 | Loss 2.4383 | LR 0.000020
Epoch 0 | Step 20 | Loss 1.8139 | LR 0.000040
Epoch 0 | Step 30 | Loss 1.4709 | LR 0.000060
Epoch 0 | Step 40 | Loss 1.2903 | LR 0.000080
Epoch 0 | Step 50 | Loss 1.2693 | LR 0.000100
Epoch 0 | Step 60 | Loss 1.2671 | LR 0.000120
Saved LoRA adapters to: ./lora_unsloth_sft/lora
Training complete.

Sample generation:
 &amp;lt;s&amp;gt;You are a helpful assistant.
&amp;lt;|user|&amp;gt;
Write a haiku about GPUs.
&amp;lt;|assistant|&amp;gt;
In the lab, the GPU
Is the heart of the machine,
Running calculations.
&amp;lt;/s&amp;gt;

Total training time: 289.11 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Training time: 289 seconds&lt;br&gt;
Loss dropped smoothly from 2.43 to 1.26. &lt;br&gt;
Fast and stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Strix Halo (Single GPU)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ python examples/alpaca_example_singlemachine.py 
Using device: cuda
`torch_dtype` is deprecated! Use `dtype` instead!
g++ (GCC) 15.2.1 20250813
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

trainable params: 13,631,488 || all params: 8,043,892,736 || trainable%: 0.1695

Epoch 0 | Step 10 | Loss 2.4027 | LR 0.000020
Epoch 0 | Step 20 | Loss 1.8115 | LR 0.000040
Epoch 0 | Step 30 | Loss 1.2460 | LR 0.000060
Epoch 0 | Step 40 | Loss 1.4227 | LR 0.000080
Epoch 0 | Step 50 | Loss 1.2628 | LR 0.000100
Epoch 0 | Step 60 | Loss 1.2507 | LR 0.000120
Saved LoRA adapters to: ./lora_unsloth_sft/lora
Training complete.

Sample generation:
 &amp;lt;s&amp;gt;You are a helpful assistant.
&amp;lt;|user|&amp;gt;
Write a haiku about GPUs.
&amp;lt;|assistant|&amp;gt;
A GPU, a powerful tool
For processing data and computing
A helpful aid for many a task.
&amp;lt;/s&amp;gt;

Total training time: 3242.91 seconds
(.venv) [alpha@toolbx HeteroShard]$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Training time: 3243 seconds&lt;br&gt;
Loss also went down correctly, but training was extremely slow: around 11 times slower than the 5090. This shows the large performance gap between GPU types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Distributed Pipeline Training (Two GPUs)
&lt;/h2&gt;

&lt;p&gt;Full logs from both the coordinator and the worker:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;» python examples/demo_llama8b4bit_distributed.py --config hetero_config.json
📍 This machine: doraemon-arch (192.168.1.153)
✓ Role: COORDINATOR

======================================================================
COORDINATOR MODE - LLAMA 8B 4-BIT TRAINING
======================================================================

Device: cuda
Worker: worker1 (192.168.1.166:9999)
Split: Layers 0-15 (local) | 16-31 (remote)

Connecting to worker...
✓ Connected

Loading tokenizer...
Loading model...
`torch_dtype` is deprecated! Use `dtype` instead!
trainable params: 13,631,488 || all params: 8,043,892,736 || trainable%: 0.1695

Creating local shard...
✓ Local shard ready (Embedding + Layers 0-15)

Loading dataset...
✓ Dataset: 100 examples

======================================================================
TRAINING
======================================================================
Steps: 25 | Batch: 1 | Accum: 4

/mnt/sdc3/Documents/hetrogpu/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:1044: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. Starting in PyTorch 2.9, calling checkpoint without use_reentrant will raise an exception. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
Epoch 0 | Step 1/25 | Loss 2.3243 | LR 0.000020
Epoch 0 | Step 2/25 | Loss 2.4754 | LR 0.000040
Epoch 0 | Step 3/25 | Loss 2.4923 | LR 0.000060
Epoch 0 | Step 4/25 | Loss 2.7389 | LR 0.000080
Epoch 0 | Step 5/25 | Loss 2.1877 | LR 0.000100
Epoch 0 | Step 6/25 | Loss 2.0371 | LR 0.000120
Epoch 0 | Step 7/25 | Loss 2.3928 | LR 0.000140
Epoch 0 | Step 8/25 | Loss 1.5122 | LR 0.000160
Epoch 0 | Step 9/25 | Loss 1.9724 | LR 0.000180
Epoch 0 | Step 10/25 | Loss 2.2792 | LR 0.000200
Epoch 0 | Step 11/25 | Loss 1.9573 | LR 0.000198
Epoch 0 | Step 12/25 | Loss 1.4388 | LR 0.000192
Epoch 0 | Step 13/25 | Loss 1.8510 | LR 0.000183
Epoch 0 | Step 14/25 | Loss 1.6279 | LR 0.000170
Epoch 0 | Step 15/25 | Loss 1.4549 | LR 0.000155
Epoch 0 | Step 16/25 | Loss 1.2129 | LR 0.000138
Epoch 0 | Step 17/25 | Loss 1.3626 | LR 0.000119
Epoch 0 | Step 18/25 | Loss 1.2285 | LR 0.000101
Epoch 0 | Step 19/25 | Loss 1.4700 | LR 0.000082
Epoch 0 | Step 20/25 | Loss 1.3244 | LR 0.000065
Epoch 0 | Step 21/25 | Loss 1.4875 | LR 0.000050
Epoch 0 | Step 22/25 | Loss 1.4656 | LR 0.000037
Epoch 0 | Step 23/25 | Loss 1.0804 | LR 0.000028
Epoch 0 | Step 24/25 | Loss 1.5531 | LR 0.000022
Epoch 0 | Step 25/25 | Loss 1.0947 | LR 0.000020

✓ Training complete!
Total training time: 184.59 seconds
Saved LoRA adapters to: ./lora_unsloth_sft_distributed/lora

Sample generation:
 You are a helpful assistant.
&amp;amp;lt;|user|&amp;amp;gt;
Write a short haiku about distributed training.
&amp;amp;lt;|assistant|&amp;amp;gt;
Distributed training,
Like a symphony,
All the parts work together.






--- 



$ python examples/demo_llama8b4bit_distributed.py --config hetero_config.json
📍 This machine: toolbx (192.168.1.166)
✓ Role: WORKER 1

======================================================================
WORKER MODE - LLAMA 8B 4-BIT (LAYERS 16-31)
======================================================================

Device: cuda
Port: 9999

Loading model...
`torch_dtype` is deprecated! Use `dtype` instead!
g++ (GCC) 15.2.1 20250813
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Creating remote shard...
✓ Remote shard ready (Layers 16-31)

Listening on 0.0.0.0:9999...
✓ Connected to coordinator at ('192.168.1.153', 46384)

[Step 0] Waiting for data...
/torch-therock/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:1035: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. Starting in PyTorch 2.9, calling checkpoint without use_reentrant will raise an exception. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
[Step 0] Loss: 1.6613
[Step 0] ✓ Complete

[Step 1] Waiting for data...
[Step 1] Loss: 2.5880
[Step 1] ✓ Complete

[Step 2] Waiting for data...
[Step 2] Loss: 3.1850
[Step 2] ✓ Complete

[Step 3] Waiting for data...
[Step 3] Loss: 1.8631
[Step 3] ✓ Complete

[Step 4] Waiting for data...
[Step 4] Loss: 2.3016
[Step 4] ✓ Complete

[Step 5] Waiting for data...
[Step 5] Loss: 2.4796
[Step 5] ✓ Complete

[Step 6] Waiting for data...
[Step 6] Loss: 2.7196
[Step 6] ✓ Complete

[Step 7] Waiting for data...
[Step 7] Loss: 2.4008
[Step 7] ✓ Complete

[Step 8] Waiting for data...
[Step 8] Loss: 1.9301
[Step 8] ✓ Complete

[Step 9] Waiting for data...
[Step 9] Loss: 1.9098
[Step 9] ✓ Complete

[Step 10] Waiting for data...
[Step 10] Loss: 3.0177
[Step 10] ✓ Complete

[Step 11] Waiting for data...
[Step 11] Loss: 3.1114
[Step 11] ✓ Complete

[Step 12] Waiting for data...
[Step 12] Loss: 1.7507
[Step 12] ✓ Complete

[Step 13] Waiting for data...
[Step 13] Loss: 3.0108
[Step 13] ✓ Complete

[Step 14] Waiting for data...
[Step 14] Loss: 2.5046
[Step 14] ✓ Complete

[Step 15] Waiting for data...
[Step 15] Loss: 3.6894
[Step 15] ✓ Complete

[Step 16] Waiting for data...
[Step 16] Loss: 1.8336
[Step 16] ✓ Complete

[Step 17] Waiting for data...
[Step 17] Loss: 1.5026
[Step 17] ✓ Complete

[Step 18] Waiting for data...
[Step 18] Loss: 3.4676
[Step 18] ✓ Complete

[Step 19] Waiting for data...
[Step 19] Loss: 1.9469
[Step 19] ✓ Complete

[Step 20] Waiting for data...
[Step 20] Loss: 2.0781
[Step 20] ✓ Complete

[Step 21] Waiting for data...
[Step 21] Loss: 1.7651
[Step 21] ✓ Complete

[Step 22] Waiting for data...
[Step 22] Loss: 2.0139
[Step 22] ✓ Complete

[Step 23] Waiting for data...
[Step 23] Loss: 2.2912
[Step 23] ✓ Complete

[Step 24] Waiting for data...
[Step 24] Loss: 2.6897
[Step 24] ✓ Complete

[Step 25] Waiting for data...
[Step 25] Loss: 2.8378
[Step 25] ✓ Complete

[Step 26] Waiting for data...
[Step 26] Loss: 1.9898
[Step 26] ✓ Complete

[Step 27] Waiting for data...
[Step 27] Loss: 2.0538
[Step 27] ✓ Complete

[Step 28] Waiting for data...
[Step 28] Loss: 1.6081
[Step 28] ✓ Complete

[Step 29] Waiting for data...
[Step 29] Loss: 1.4623
[Step 29] ✓ Complete

[Step 30] Waiting for data...
[Step 30] Loss: 1.2606
[Step 30] ✓ Complete

[Step 31] Waiting for data...
[Step 31] Loss: 1.7178
[Step 31] ✓ Complete

[Step 32] Waiting for data...
[Step 32] Loss: 1.9203
[Step 32] ✓ Complete

[Step 33] Waiting for data...
[Step 33] Loss: 1.6814
[Step 33] ✓ Complete

[Step 34] Waiting for data...
[Step 34] Loss: 2.5819
[Step 34] ✓ Complete

[Step 35] Waiting for data...
[Step 35] Loss: 1.7061
[Step 35] ✓ Complete

[Step 36] Waiting for data...
[Step 36] Loss: 2.3311
[Step 36] ✓ Complete

[Step 37] Waiting for data...
[Step 37] Loss: 2.2990
[Step 37] ✓ Complete

[Step 38] Waiting for data...
[Step 38] Loss: 1.8855
[Step 38] ✓ Complete

[Step 39] Waiting for data...
[Step 39] Loss: 2.6010
[Step 39] ✓ Complete

[Step 40] Waiting for data...
[Step 40] Loss: 2.3807
[Step 40] ✓ Complete

[Step 41] Waiting for data...
[Step 41] Loss: 2.0204
[Step 41] ✓ Complete

[Step 42] Waiting for data...
[Step 42] Loss: 1.7209
[Step 42] ✓ Complete

[Step 43] Waiting for data...
[Step 43] Loss: 1.7073
[Step 43] ✓ Complete

[Step 44] Waiting for data...
[Step 44] Loss: 1.1900
[Step 44] ✓ Complete

[Step 45] Waiting for data...
[Step 45] Loss: 1.8439
[Step 45] ✓ Complete

[Step 46] Waiting for data...
[Step 46] Loss: 1.1291
[Step 46] ✓ Complete

[Step 47] Waiting for data...
[Step 47] Loss: 1.5923
[Step 47] ✓ Complete

[Step 48] Waiting for data...
[Step 48] Loss: 1.9110
[Step 48] ✓ Complete

[Step 49] Waiting for data...
[Step 49] Loss: 1.1971
[Step 49] ✓ Complete

[Step 50] Waiting for data...
[Step 50] Loss: 3.0576
[Step 50] ✓ Complete

[Step 51] Waiting for data...
[Step 51] Loss: 1.2383
[Step 51] ✓ Complete

[Step 52] Waiting for data...
[Step 52] Loss: 1.6820
[Step 52] ✓ Complete

[Step 53] Waiting for data...
[Step 53] Loss: 1.7755
[Step 53] ✓ Complete

[Step 54] Waiting for data...
[Step 54] Loss: 1.2515
[Step 54] ✓ Complete

[Step 55] Waiting for data...
[Step 55] Loss: 1.8027
[Step 55] ✓ Complete

[Step 56] Waiting for data...
[Step 56] Loss: 1.2692
[Step 56] ✓ Complete

[Step 57] Waiting for data...
[Step 57] Loss: 1.6293
[Step 57] ✓ Complete

[Step 58] Waiting for data...
[Step 58] Loss: 1.1256
[Step 58] ✓ Complete

[Step 59] Waiting for data...
[Step 59] Loss: 1.7956
[Step 59] ✓ Complete

[Step 60] Waiting for data...
[Step 60] Loss: 1.3114
[Step 60] ✓ Complete

[Step 61] Waiting for data...
[Step 61] Loss: 1.4944
[Step 61] ✓ Complete

[Step 62] Waiting for data...
[Step 62] Loss: 0.9233
[Step 62] ✓ Complete

[Step 63] Waiting for data...
[Step 63] Loss: 1.1224
[Step 63] ✓ Complete

[Step 64] Waiting for data...
[Step 64] Loss: 1.4849
[Step 64] ✓ Complete

[Step 65] Waiting for data...
[Step 65] Loss: 1.0226
[Step 65] ✓ Complete

[Step 66] Waiting for data...
[Step 66] Loss: 1.3064
[Step 66] ✓ Complete

[Step 67] Waiting for data...
[Step 67] Loss: 1.6367
[Step 67] ✓ Complete

[Step 68] Waiting for data...
[Step 68] Loss: 1.6595
[Step 68] ✓ Complete

[Step 69] Waiting for data...
[Step 69] Loss: 1.3235
[Step 69] ✓ Complete

[Step 70] Waiting for data...
[Step 70] Loss: 0.8673
[Step 70] ✓ Complete

[Step 71] Waiting for data...
[Step 71] Loss: 1.0639
[Step 71] ✓ Complete

[Step 72] Waiting for data...
[Step 72] Loss: 1.6803
[Step 72] ✓ Complete

[Step 73] Waiting for data...
[Step 73] Loss: 1.5877
[Step 73] ✓ Complete

[Step 74] Waiting for data...
[Step 74] Loss: 1.3728
[Step 74] ✓ Complete

[Step 75] Waiting for data...
[Step 75] Loss: 1.2393
[Step 75] ✓ Complete

[Step 76] Waiting for data...
[Step 76] Loss: 1.4007
[Step 76] ✓ Complete

[Step 77] Waiting for data...
[Step 77] Loss: 0.9818
[Step 77] ✓ Complete

[Step 78] Waiting for data...
[Step 78] Loss: 1.3658
[Step 78] ✓ Complete

[Step 79] Waiting for data...
[Step 79] Loss: 1.5493
[Step 79] ✓ Complete

[Step 80] Waiting for data...
[Step 80] Loss: 1.3884
[Step 80] ✓ Complete

[Step 81] Waiting for data...
[Step 81] Loss: 1.3920
[Step 81] ✓ Complete

[Step 82] Waiting for data...
[Step 82] Loss: 1.9356
[Step 82] ✓ Complete

[Step 83] Waiting for data...
[Step 83] Loss: 1.2340
[Step 83] ✓ Complete

[Step 84] Waiting for data...
[Step 84] Loss: 1.2280
[Step 84] ✓ Complete

[Step 85] Waiting for data...
[Step 85] Loss: 1.7844
[Step 85] ✓ Complete

[Step 86] Waiting for data...
[Step 86] Loss: 1.2704
[Step 86] ✓ Complete

[Step 87] Waiting for data...
[Step 87] Loss: 1.5795
[Step 87] ✓ Complete

[Step 88] Waiting for data...
[Step 88] Loss: 0.9333
[Step 88] ✓ Complete

[Step 89] Waiting for data...
[Step 89] Loss: 0.9236
[Step 89] ✓ Complete

[Step 90] Waiting for data...
[Step 90] Loss: 1.0831
[Step 90] ✓ Complete

[Step 91] Waiting for data...
[Step 91] Loss: 1.3817
[Step 91] ✓ Complete

[Step 92] Waiting for data...
[Step 92] Loss: 1.3752
[Step 92] ✓ Complete

[Step 93] Waiting for data...
[Step 93] Loss: 1.9094
[Step 93] ✓ Complete

[Step 94] Waiting for data...
[Step 94] Loss: 1.6458
[Step 94] ✓ Complete

[Step 95] Waiting for data...
[Step 95] Loss: 1.2820
[Step 95] ✓ Complete

[Step 96] Waiting for data...
[Step 96] Loss: 1.5715
[Step 96] ✓ Complete

[Step 97] Waiting for data...
[Step 97] Loss: 0.8391
[Step 97] ✓ Complete

[Step 98] Waiting for data...
[Step 98] Loss: 0.9126
[Step 98] ✓ Complete

[Step 99] Waiting for data...
[Step 99] Loss: 1.0555
[Step 99] ✓ Complete

[Step 100] Waiting for data...
Connection closed.
(.venv) [alpha@toolbx HeteroShard]$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Training time: 184 seconds&lt;/p&gt;

&lt;p&gt;Model was split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Layers 0–15 on the main machine&lt;/li&gt;
&lt;li&gt;Layers 16–31 on the worker machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both GPUs handled their shards. The worker log cycles through "Waiting for data", "Loss", and "Complete", which makes the pipeline stalls visible; that is expected. Even so, the total time of 184 seconds beat the single 5090's 289 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learnt From These Runs
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Mixed‑GPU pipeline training works in real life, not just in papers.&lt;/li&gt;
&lt;li&gt;Speed depends on the slowest GPU, so good splitting is important.&lt;/li&gt;
&lt;li&gt;Distributed training has waiting time and communication cost, but still can beat a single strong GPU.&lt;/li&gt;
&lt;li&gt;Consumer GPUs vary hugely in speed, which is why homelab users need flexible systems.&lt;/li&gt;
&lt;li&gt;A simple framework like HeteroGPU can achieve things that big frameworks do not support yet.&lt;/li&gt;
&lt;/ol&gt;
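&lt;p&gt;Point 2 above, splitting work so the slowest GPU does not dominate, can be sketched as a proportional layer allocation. The throughput numbers below are illustrative, loosely echoing the roughly 11x gap measured earlier, and the function is a sketch rather than the framework's actual planner.&lt;/p&gt;

```python
# Assign layers in proportion to each GPU's measured throughput
# (layers per second), keeping the total layer count fixed.

def split_layers(total_layers, throughputs):
    total_tp = sum(throughputs)
    shares = [total_layers * tp / total_tp for tp in throughputs]
    counts = [int(s) for s in shares]
    # Hand leftover layers to the GPUs with the largest fractional parts.
    leftover = total_layers - sum(counts)
    order = sorted(range(len(shares)), key=lambda i: counts[i] - shares[i])
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# 32 transformer layers, one fast GPU and one roughly 11x slower GPU:
print(split_layers(32, [11.0, 1.0]))  # [29, 3]
```

&lt;p&gt;With a split like this, both stages take roughly the same time per micro-batch, so neither GPU spends most of the step waiting.&lt;/p&gt;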

&lt;h2&gt;
  
  
  My Profiler System
&lt;/h2&gt;

&lt;p&gt;The profiler I added does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs tiny batches on each GPU&lt;/li&gt;
&lt;li&gt;Measures latency and memory usage&lt;/li&gt;
&lt;li&gt;Builds simple linear models to predict performance&lt;/li&gt;
&lt;li&gt;Measures communication cost&lt;/li&gt;
&lt;li&gt;Chooses the best pipeline split&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matches the idea in the Cephalo paper:&lt;br&gt;
&lt;a href="https://dl.acm.org/doi/10.1145/3721145.3730418" rel="noopener noreferrer"&gt;https://dl.acm.org/doi/10.1145/3721145.3730418&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows the system to work even when one GPU is fast but low VRAM, and another GPU is slow but high VRAM.&lt;/p&gt;
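&lt;p&gt;The profiler idea can be sketched in a few lines: fit a linear latency model per GPU from a couple of tiny probe runs, then pick the split that minimizes the slowest stage plus a fixed communication cost. All timings below are invented for illustration, and the functions are hypothetical, not the real profiler.&lt;/p&gt;

```python
# Least-squares fit of t = a*n + b from (num_layers, seconds) probe samples,
# then a brute-force search over layer splits for a two-stage pipeline.

def fit_linear(samples):
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def best_split(total_layers, fast, slow, comm_cost):
    # Step time is bounded by the slower stage, plus the transfer cost.
    def step_time(k):
        a1, b1 = fast
        a2, b2 = slow
        return max(a1 * k + b1, a2 * (total_layers - k) + b2) + comm_cost
    return min(range(1, total_layers), key=step_time)

fast_model = fit_linear([(4, 0.05), (8, 0.09)])  # probe runs on the fast GPU
slow_model = fit_linear([(4, 0.45), (8, 0.85)])  # probe runs on the slow GPU
print(best_split(32, fast_model, slow_model, comm_cost=0.02))  # 30
```

&lt;p&gt;Two probe points per GPU are enough for a linear model; the real profiler also has to respect each GPU's VRAM limit when ruling out splits.&lt;/p&gt;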

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Next, I plan to experiment with one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HeterMoE: &lt;a href="https://arxiv.org/pdf/2504.03871" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2504.03871&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Zorse: &lt;a href="https://arxiv.org/abs/2507.10392" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2507.10392&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MoE (Mixture‑of‑Experts) models are naturally suited for heterogeneous hardware, so they may perform better in mixed GPU clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github repo&lt;/strong&gt;: &lt;a href="https://github.com/0xrushi/HeteroShard" rel="noopener noreferrer"&gt;https://github.com/0xrushi/HeteroShard&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>nvidia</category>
      <category>amd</category>
      <category>dgx</category>
    </item>
    <item>
      <title>Is Google Colab Pro Really Worth It?</title>
      <dc:creator>Rushi Chaudhari</dc:creator>
      <pubDate>Wed, 01 May 2024 05:31:01 +0000</pubDate>
      <link>https://forem.com/rushichaudhari/is-google-colab-pro-really-worth-it-5531</link>
      <guid>https://forem.com/rushichaudhari/is-google-colab-pro-really-worth-it-5531</guid>
      <description>&lt;p&gt;In late 2022, Google revamped its widely-used Colab platform, transitioning from a subscription-based system to a pay-as-you-go model under the new Colab Pro and Pro+ schemes. This change introduced "compute units," which serve as the new currency within the platform, where the consumption rate depends on the virtual machine's configuration and the use of specialized accelerators like TPUs or GPUs.&lt;/p&gt;

&lt;p&gt;Here's a breakdown of how the compute units are consumed based on different GPUs, assuming an allocation of 100 units:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;T4&lt;/strong&gt;: Consumes 1.96 units per hour, providing about 51 hours of use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V100&lt;/strong&gt;: Requires 5 units per hour, totaling about 20 hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A100&lt;/strong&gt;: Demands 15 units per hour, which amounts to 6 hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's important to note that the T4 GPU is available for free; however, its availability under the Colab Pro tier is not guaranteed, often necessitating the use of costlier alternatives.&lt;/p&gt;

&lt;p&gt;This shift has introduced a layer of complexity that many users find disappointing, especially when there are more straightforward options available on the market. For comparison, here's a quick overview of pricing and availability from various smaller cloud providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Labs, Jarvislabs.ai, tensordock, genesis cloud, paperspace, Vast.ai, and FluidStack&lt;/strong&gt; offer a range of GPU options like NVIDIA A100 PCIe and V100 at varying price points and hourly rates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Competitive Analysis of Cloud Computing Providers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj43d4xmavsodv1b6t7nr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj43d4xmavsodv1b6t7nr.png" alt=" " width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When considering cost-effectiveness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the A100 GPU, &lt;strong&gt;Paperspace&lt;/strong&gt; offers the lowest price at $1.15 per hour.&lt;/li&gt;
&lt;li&gt;For the V100 GPU, &lt;strong&gt;Vast.ai&lt;/strong&gt; provides the most affordable rate at $0.16 per hour.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, it's essential to highlight that major cloud services like AWS, Azure, and Google Cloud were excluded from this comparison due to their higher prices, despite offering better scalability and integration.&lt;/p&gt;

&lt;p&gt;Potential users should be aware that the availability of instances on smaller clouds can be unpredictable, making them more suitable for personal projects rather than enterprise solutions. Additionally, these platforms may not always have complete libraries installed (e.g., Hugging Face on Paperspace), which could extend setup times.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Before subscribing to Colab Pro, thoroughly explore and compare alternative cloud platforms that may offer better rates or features suited to your needs.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>gpu</category>
      <category>cloud</category>
      <category>google</category>
    </item>
    <item>
      <title>Exploring Low-Rank Adaptation (LoRA) from scratch</title>
      <dc:creator>Rushi Chaudhari</dc:creator>
      <pubDate>Thu, 25 Apr 2024 05:11:48 +0000</pubDate>
      <link>https://forem.com/rushichaudhari/exploring-low-rank-adaptation-lora-from-scratch-2jc1</link>
      <guid>https://forem.com/rushichaudhari/exploring-low-rank-adaptation-lora-from-scratch-2jc1</guid>
      <description>&lt;p&gt;Notebook link: &lt;a href="https://github.com/0xrushi/deep-learning-notebooks/blob/main/GPU/Exploring%20Low-Rank%20Adaptation%20(LoRA)%20from%20scratch.ipynb" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've been exploring LoRA and was seeking a straightforward implementation example. Many resources I've found focus on training large models and often utilize PEFT and the loralib package, as well as some basic implementations using CNNs or ANNs as outlined in sources like [[2]].&lt;/p&gt;

&lt;p&gt;I came across some examples using LoRA with BERT, DistillBert, and others involving a Linear() layer. However, I'm specifically interested in applying it to GPT2, which uses a Conv1D() layer instead of Linear().&lt;/p&gt;

&lt;p&gt;These days, the deep learning models have significantly more layers. One major challenge with fine-tuning large models like GPT is their size; they often don't fit into the limited VRAM available. To address this, researchers at Microsoft developed the Low Rank Adaptation (LoRA) technique. This method leverages the principle of low-rank matrix decomposition. It has shown that common pre-trained models can be effectively fine-tuned or adapted using just a small subset of their original parameters, instead of modifying every parameter. This approach not only reduces the VRAM requirements but can be just as effective for fine-tuning purposes as using the full set of parameters.&lt;/p&gt;

&lt;p&gt;LoRA approximates a layer's weight changes during training, ΔW, in a low-rank format.&lt;/p&gt;

&lt;p&gt;For instance, whereas in regular finetuning, we compute the weight updates of a weight matrix W as ΔW, in LoRA, we approximate ΔW through the matrix multiplication of two smaller matrices AB, as illustrated in the figure below. (If you are familiar with PCA or SVD, consider this as decomposing ΔW into A and B.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv09feiaqhaov34350hu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv09feiaqhaov34350hu.png" alt=" " width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With LoRA, the transformation in a particular layer originally involved just 

&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;W⋅xW \cdot x&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
, where 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WW&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the weight matrix and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;xx&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the input. This operation now includes an additional term, resulting in 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;Wx+(WAWB)⋅xWx + (W_A W_B) \cdot x&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;B&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord 
mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Original Operation&lt;/strong&gt;: The operation 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WxWx&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 involves 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WW&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
, a large matrix typically with dimensions like 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;768×768768 \times 768&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;768&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;768&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 as seen in models like BERT or GPT-2. The computational complexity of this operation is 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;O(d2)O(d^2)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;O&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
, where 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;dd&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the dimension of 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WW&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 (assuming square matrices for simplicity).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LoRA Operation&lt;/strong&gt;: In the LoRA approach, 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WAW_A&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WBW_B&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;B&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 are smaller matrices with dimensions 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;d×rd \times r&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;r×dr \times d&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;r&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 respectively, where 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;rr&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;r&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is much smaller than 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;dd&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 (indicating low rank). The product 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WAWBW_A W_B&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;B&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
, therefore, has the same dimension as 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WW&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 but is composed of two smaller matrices. This factorization significantly reduces the cost of fine-tuning, both in the number of trainable parameters and in the arithmetic needed to apply the update:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, the product 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WAWBW_A W_B&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;B&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is computed, which involves a complexity of 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;O(d2⋅r)O(d^2 \cdot r)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;O&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;r&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 if the product is formed explicitly.&lt;/li&gt;
&lt;li&gt;Then, this product multiplies the input 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;xx&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
, resulting in 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(WAWB)x(W_A W_B)x&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;B&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
, with a computational complexity similar to the original operation 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;WxWx&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;W&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
. In practice, though, the factors are applied in sequence: computing W_B x costs O(d·r), and applying W_A to the result costs another O(d·r), so the full d×d product never has to be materialized. Just as importantly, only the 2dr entries of W_A and W_B are trained, rather than the d² entries of W, which is where the real savings come from when r is much smaller than d.&lt;/li&gt;
&lt;/ul&gt;
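&lt;p&gt;The shape bookkeeping above is easy to sanity-check numerically. The sketch below (plain NumPy, with illustrative sizes) confirms that applying the factors in sequence gives the same result as materializing the full product, while the factors themselves hold only 2dr values:&lt;/p&gt;

```python
import numpy as np

d, r = 768, 8                       # hidden size and LoRA rank (illustrative values)
rng = np.random.default_rng(0)
W_A = rng.standard_normal((d, r))   # d x r
W_B = rng.standard_normal((r, d))   # r x d
x = rng.standard_normal(d)

# Materialize the d x d product, then apply it: O(d^2 * r) to build, O(d^2) to apply
y_full = (W_A @ W_B) @ x

# Apply the small factors in sequence instead: two O(d * r) matrix-vector products
y_factored = W_A @ (W_B @ x)

assert np.allclose(y_full, y_factored)
print(2 * d * r, d * d)  # trainable values in the factors vs a full d x d update
```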

&lt;p&gt;For instance, consider a weight matrix W in a specific layer, sized at 2,000x10,000, totaling 20 million parameters. If we opt for a rank r=3, we would set up two new matrices: a 2,000x3 matrix B and a 3x10,000 matrix A (this naming follows the LoRA paper, which writes the update as ΔW = BA). Together, matrices A and B contain just 6,000 + 30,000 = 36,000 parameters, over 555 times fewer than the 20 million parameters typically involved in standard fine-tuning with ΔW.&lt;/p&gt;
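&lt;p&gt;A quick back-of-the-envelope check of those numbers (the sizes and rank are taken straight from the example above):&lt;/p&gt;

```python
# Worked example: W is 2,000 x 10,000, LoRA rank r = 3
d_out, d_in, r = 2_000, 10_000, 3

full_ft = d_out * d_in           # parameters a standard fine-tune would update
lora = d_out * r + r * d_in      # B (2,000 x 3) plus A (3 x 10,000)

print(full_ft)                   # 20000000
print(lora)                      # 36000
print(full_ft / lora)            # ~555.6: over 555x fewer trainable parameters
```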

&lt;p&gt;We'll use the News Articles dataset from Kaggle for our fine-tuning experiments with GPT-2. The code snippets below cover data loading and preprocessing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pytorch&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;lightning&lt;/span&gt; &lt;span class="n"&gt;lightning&lt;/span&gt; &lt;span class="n"&gt;accelerate&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextDataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DataCollatorForLanguageModeling&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GPT2Tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GPT2LMHeadModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TrainingArguments&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn.functional&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Data Preprocessing
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cleaning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\s\W&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\W,\s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\d+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\s+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[!@#$_]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;co&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[\w*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
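&lt;p&gt;One caveat on the function above: the bare &lt;code&gt;replace("co", "")&lt;/code&gt; deletes "co" inside ordinary words too (e.g. "company" becomes "mpany"), not only leftover "t.co" link fragments. A condensed, self-contained variant of the same cleaning steps, using an explicit URL pattern instead, might look like this (a sketch, not the article's exact preprocessing):&lt;/p&gt;

```python
import re

def clean_text(s):
    """Condensed cleaning sketch: drop URLs and digits, strip a few
    special characters, and collapse runs of whitespace."""
    s = str(s)
    s = re.sub(r"https?://\S+", "", s)   # drop whole URLs, t.co links included
    s = re.sub(r"\d+", "", s)            # remove numbers
    s = re.sub(r"[!@#$_]", "", s)        # strip special characters
    s = re.sub(r"\s+", " ", s)           # collapse whitespace
    return s.strip()

print(clean_text("Profit rose 12% in 2015!  See https://t.co/abc"))
# Profit rose % in See
```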





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dataset link https://www.kaggle.com/datasets/asad1m9a9h6mood/news-articles
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./Articles.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ISO-8859-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;text_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Articles.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
  &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cleaning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Article&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="n"&gt;text_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;text_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;block_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_data_collator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mlm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data_collator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataCollatorForLanguageModeling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;mlm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data_collator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
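&lt;p&gt;For intuition, TextDataset does little more than tokenize the whole file and slice the token stream into fixed-length blocks. A minimal stand-in (with a toy word-level vocabulary in place of GPT-2's BPE tokenizer) behaves like this:&lt;/p&gt;

```python
def make_blocks(token_ids, block_size=128):
    """Slice a flat token stream into contiguous, non-overlapping blocks,
    dropping trailing tokens that don't fill a complete block."""
    n_blocks = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Toy "tokenizer": map each word to an integer id
words = ("the quick brown fox " * 100).split()            # 400 tokens
vocab = {w: i for i, w in enumerate(sorted(set(words)))}
ids = [vocab[w] for w in words]

blocks = make_blocks(ids, block_size=128)
print(len(blocks), len(blocks[0]))  # 400 // 128 = 3 full blocks of 128
```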



&lt;h1&gt;
  
  
  Download the pretrained GPT-2 model
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GPT2LMHeadModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Contrary to the examples referenced, this model doesn't use a Linear() layer but instead features a Conv1D() layer, Hugging Face's mathematically equivalent variant that simply stores the weight transposed. The concept remains the same, though the implementation differs.&lt;/p&gt;
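&lt;p&gt;The equivalence is easy to verify: transformers' Conv1D stores its weight with shape (in_features, out_features) and computes x @ W + b, which is exactly an nn.Linear layer with the weight kept transposed. A NumPy check using GPT-2's c_attn shapes (768 in, 2304 out):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 768))      # a batch of 4 token vectors
W = rng.standard_normal((768, 2304))   # Conv1D weight: (in_features, out_features)
b = rng.standard_normal(2304)

conv1d_out = x @ W + b                 # what transformers' Conv1D computes

W_linear = W.T                         # nn.Linear convention: (out_features, in_features)
linear_out = x @ W_linear.T + b        # what nn.Linear computes

assert np.allclose(conv1d_out, linear_out)
print(conv1d_out.shape)  # (4, 2304)
```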

&lt;p&gt;Note that we freeze the base model's parameters so only the LoRA weights get trained.&lt;/p&gt;

&lt;p&gt;Let's now create a LoRA wrapper for Conv1D.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conv1D LoRA Wrapper
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn.functional&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoRAConv1DWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A wrapper module that applies LoRA to the weights of a GPT-2 Conv1D layer.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Initializes the LoRAConv1DWrapper instance.

        Parameters:
            module (nn.Module): The base module whose weights are to be adapted.
            rank (int): The rank for the low-rank matrices A and B. If set to 0, LoRA is effectively disabled.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rank must be a non-negative integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;

        &lt;span class="n"&gt;out_features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;in_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;in_features&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
                &lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_B&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;out_features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
                &lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# self.print_trainable_parameters()
&lt;/span&gt;
            &lt;span class="c1"&gt;# freeze the base module's parameters, only focus on updating lora weights
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Creating LoRAConv1DWrapper with no rank adaptation: rank &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reset_parameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Initializes or resets the parameters of the LoRA matrices A and B to their default values.
        This method typically mirrors the initialization logic of the base module.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# initialize A matrix
&lt;/span&gt;            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kaiming_uniform_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="c1"&gt;# initialize B matrix to 0
&lt;/span&gt;            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Prints the number of parameters in the (frozen) base module and the trainable parameters added by LoRA.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;base_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;lora_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_A&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_B&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trainable parameters in base module: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_params&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trainable parameters in LoRA (base module frozen): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lora_params&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Performs a forward pass through the LoRAConv1DWrapper, applying low-rank adaptations to the base module&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s weights.

        Parameters:
            x (torch.Tensor): The input tensor to the module.

        Returns:
            torch.Tensor: The output of the module after applying the low-rank adapted forward pass.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lora_rank&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Compute the base module's forward pass with adapted weights
&lt;/span&gt;            &lt;span class="c1"&gt;# print(self.W_A.shape)
&lt;/span&gt;            &lt;span class="c1"&gt;# print(self.W_B.shape)
&lt;/span&gt;            &lt;span class="n"&gt;adapted_weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_B&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;W_A&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adapted_weight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Perform a standard forward pass using the base module's original weights and bias
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
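&lt;p&gt;A useful property of this additive parameterization: because W_B is initialized to zeros, W_B @ W_A is exactly the zero matrix at step 0, so the wrapped layer initially computes the same function as the frozen base layer. Here is a quick standalone check of that property with plain tensors (illustrative shapes, not taken from GPT-2):&lt;/p&gt;

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
in_features, out_features, rank = 8, 4, 2

# Base weight stored as (in_features, out_features), like HF's Conv1D
W = torch.randn(in_features, out_features)

# LoRA factors: A gets Kaiming init, B starts at zero, so B @ A == 0
A = torch.empty(rank, out_features)
nn.init.kaiming_uniform_(A, a=math.sqrt(5))
B = torch.zeros(in_features, rank)

x = torch.randn(3, in_features)
base_out = F.linear(x, W.T)              # plain forward pass
lora_out = F.linear(x, (W + B @ A).T)    # adapted forward pass

print(torch.allclose(base_out, lora_out))  # True: the update starts as a no-op
```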





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_model_layers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="c1"&gt;# Set LoRA hyperparameters
&lt;/span&gt;  &lt;span class="n"&gt;lora_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
  &lt;span class="n"&gt;lora_alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
  &lt;span class="n"&gt;lora_dropout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;
  &lt;span class="c1"&gt;# flag to apply LoRA to Transformer layers
&lt;/span&gt;  &lt;span class="n"&gt;lora_attn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
  &lt;span class="c1"&gt;# flag to apply LoRA to MLP layers
&lt;/span&gt;  &lt;span class="n"&gt;lora_mlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

  &lt;span class="c1"&gt;# Apply LoRA modifications to the GPT2 layers
&lt;/span&gt;  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transformer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lora_attn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_attn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoRAConv1DWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_attn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_proj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoRAConv1DWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_proj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lora_mlp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_fc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoRAConv1DWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_fc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_proj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoRAConv1DWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_proj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;update_model_layers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): LoRAConv1DWrapper(
            (base_module): Conv1D()
          )
          (c_proj): LoRAConv1DWrapper(
            (base_module): Conv1D()
          )
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): LoRAConv1DWrapper(
            (base_module): Conv1D()
          )
          (c_proj): LoRAConv1DWrapper(
            (base_module): Conv1D()
          )
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;overwrite_output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;save_steps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Trains a GPT-2 model using the Hugging Face Transformers library.

    This function initializes a model, tokenizer, and data collator. It sets up training arguments and
    creates a Trainer instance to manage the training process.

    Parameters:
    - train_file_path (str): The file path to the training dataset.
    - model_name (str): The name of the pre-trained GPT-2 model to use. This can be a model identifier
        from Hugging Face&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s model hub (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt2-medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;) or the path to a local directory containing model files.
    - output_dir (str): The directory where the model checkpoints will be saved during training.
    - overwrite_output_dir (bool): Set to True to overwrite the output directory, or False to continue training from the last checkpoint.
    - per_device_train_batch_size (int): Batch size per device during training.
    - num_train_epochs (int): Total number of training epochs.
    - save_steps (int): The number of training steps to perform before saving a checkpoint.

    Returns:
    None

    Saves the tokenizer and model to the specified output directory. Trains the model using the
    given dataset, saving the final model configuration to the output directory after training.

    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
  &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GPT2Tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;train_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;data_collator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_data_collator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GPT2LMHeadModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;# # comment this to skip LoRA
&lt;/span&gt;  &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;update_model_layers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;training_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;overwrite_output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;overwrite_output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;training_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;data_collator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_collator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the printout shows, every targeted Conv1D layer has been replaced by a LoRAConv1DWrapper, with the original Conv1D preserved inside as the frozen base module.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# some constants
&lt;/span&gt;&lt;span class="n"&gt;train_file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Articles.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;output_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;overwrite_output_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="n"&gt;num_train_epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="n"&gt;save_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;train_file_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;overwrite_output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;overwrite_output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;save_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;save_steps&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Training without LoRA (5 Epochs)
&lt;/h2&gt;

&lt;p&gt;The initial loss appears lower than with LoRA, likely because all of the weights are being updated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrtg2nyb1qylcl24qbah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrtg2nyb1qylcl24qbah.png" alt=" " width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Training with LoRA (5 Epochs)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcpbe0jpgpbeedbjk2pmn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcpbe0jpgpbeedbjk2pmn.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's train with LoRA for more epochs; this might reduce the loss further.&lt;/p&gt;

&lt;h2&gt;
  
  
  Training with LoRA (12 Epochs)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff13hvdnbezvwfu7ch1i3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff13hvdnbezvwfu7ch1i3.png" alt=" " width="800" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Training without LoRA starts at a lower loss than training with LoRA, probably because all of the weights are being updated. LoRA is much lighter on GPU memory, but it may need more epochs to reach a comparable loss.&lt;/p&gt;
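&lt;p&gt;One practical note: because the adapted weight is just the base weight plus W_B @ W_A, the low-rank update can be folded back into the frozen base weight once training is done, so inference pays no extra matmul. The article's code doesn't do this; the sketch below is a hypothetical helper written against the LoRAConv1DWrapper attributes defined above:&lt;/p&gt;

```python
import torch

@torch.no_grad()
def merge_lora(wrapper):
    """Fold the low-rank update into the base weight and return the base
    module. Assumes the LoRAConv1DWrapper layout above, where the base
    weight and W_B @ W_A share the same shape."""
    if wrapper.lora_rank:
        wrapper.base_module.weight += wrapper.W_B @ wrapper.W_A
    return wrapper.base_module
```

&lt;p&gt;The returned base module can then be swapped back in place of the wrapper (e.g. &lt;code&gt;block.attn.c_attn = merge_lora(block.attn.c_attn)&lt;/code&gt;), leaving a plain Conv1D model for deployment.&lt;/p&gt;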

&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;p&gt;[1] &lt;a href="https://www.linkedin.com/pulse/more-efficient-finetuning-implementing-lora-from-scratch-george-davis/" rel="noopener noreferrer"&gt;https://www.linkedin.com/pulse/more-efficient-finetuning-implementing-lora-from-scratch-george-davis/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href="https://lightning.ai/lightning-ai/studios/code-lora-from-scratch" rel="noopener noreferrer"&gt;https://lightning.ai/lightning-ai/studios/code-lora-from-scratch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href="https://towardsdatascience.com/implementing-lora-from-scratch-20f838b046f1" rel="noopener noreferrer"&gt;https://towardsdatascience.com/implementing-lora-from-scratch-20f838b046f1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] LoRA explained (and a bit about precision and quantization)&lt;br&gt;
 &lt;a href="https://youtu.be/t509sv5MT0w" rel="noopener noreferrer"&gt;https://youtu.be/t509sv5MT0w&lt;/a&gt;&lt;/p&gt;

</description>
      <category>lora</category>
      <category>llm</category>
      <category>language</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building Your Own Personal Assistant With ChatGPT</title>
      <dc:creator>Rushi Chaudhari</dc:creator>
      <pubDate>Sun, 19 Feb 2023 03:28:28 +0000</pubDate>
      <link>https://forem.com/rushichaudhari/building-your-own-personal-assistant-with-chatgpt-98i</link>
      <guid>https://forem.com/rushichaudhari/building-your-own-personal-assistant-with-chatgpt-98i</guid>
      <description>&lt;p&gt;If you've ever used Siri, Alexa, or Google Assistant, you know how powerful and convenient having a personal assistant can be. What if you could build your own personal assistant, tailored to your specific needs? Thanks to the power of OpenAI's ChatGPT language model and the open-source community, you can!&lt;/p&gt;

&lt;p&gt;In this post, we'll explore a GitHub project called "ChatGPT-chan," which provides a collection of tools to help you build your own personal assistant. Let's dive in!&lt;/p&gt;

&lt;h2&gt;What is ChatGPT-chan?&lt;/h2&gt;

&lt;p&gt;ChatGPT-chan is an open-source project that provides tools for building conversational interfaces, automating tasks, and more. It leverages the power of OpenAI's ChatGPT language model to understand natural language inputs and provide intelligent responses.&lt;/p&gt;

&lt;p&gt;The project consists of three main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emotion Classifier: A machine learning model that can detect the emotion in a given text input. &lt;/li&gt;
&lt;li&gt;Stable Diffusion Model: A machine learning model that generates realistic images based on text prompts. &lt;/li&gt;
&lt;li&gt;ChatGPT Wrapper: A Python library that provides a simple API for integrating the above models and creating conversational interfaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project also includes a demo that showcases the power of the ChatGPT wrapper. Check out the demo video &lt;a href="https://odysee.com/@rushi:2/chatgptchandemo2:4" rel="noopener noreferrer"&gt;here&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rushic24/chatgpt-chan" rel="noopener noreferrer"&gt;github: chatgpt-chan&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;How to Set Up ChatGPT-chan&lt;/h2&gt;

&lt;p&gt;To get ChatGPT-chan up and running, you'll find comprehensive guidance in the project's GitHub repository. Here’s a simplified setup process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Emotion Classifier and Stable Diffusion Model either on a server or locally.&lt;/li&gt;
&lt;li&gt;Clone the ChatGPT-chan repository and install all necessary dependencies.&lt;/li&gt;
&lt;li&gt;Modify the configuration file to connect to the Emotion Classifier and Stable Diffusion Model servers.&lt;/li&gt;
&lt;li&gt;Launch the ChatGPT Wrapper.&lt;/li&gt;
&lt;/ol&gt;
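
&lt;p&gt;As a rough sketch, the steps above might look like the shell session below. The repository URL is real, but the requirements file, config keys, and entry-point script are assumptions for illustration — check the project's readme for the actual names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Clone the ChatGPT-chan repository and enter it
git clone https://github.com/rushic24/chatgpt-chan
cd chatgpt-chan

# Install Python dependencies (filename assumed; see the readme)
pip install -r requirements.txt

# Edit the configuration to point at your Emotion Classifier and
# Stable Diffusion servers (config filename and keys are hypothetical):
#   emotion_classifier_url: http://localhost:8001
#   stable_diffusion_url:   http://localhost:8002

# Launch the ChatGPT wrapper (script name assumed)
python main.py
&lt;/code&gt;&lt;/pre&gt;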

&lt;p&gt;This streamlined overview should help you initiate the setup quickly, with detailed steps available in the GitHub readme.&lt;/p&gt;

&lt;h2&gt;Why Build Your Own Personal Assistant?&lt;/h2&gt;

&lt;p&gt;Building your own personal assistant can be a fun and rewarding project, but it also has practical applications. For example, you could use it to automate tasks in your daily life, such as setting reminders or sending messages. You could also integrate it into your own projects to provide a natural language interface for your users.&lt;/p&gt;

&lt;p&gt;Another benefit of building your own personal assistant is that you retain full control over the data it collects and how that data is used. With commercial personal assistants, you often can't tell what data is being collected or where it ends up.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;If you're interested in building your own personal assistant, ChatGPT-chan is a great place to start. It provides powerful tools for understanding natural language inputs and generating intelligent responses, all using open-source software.&lt;/p&gt;

&lt;p&gt;While the setup process can be a bit involved, the end result is a powerful tool that you can use to automate tasks and provide natural language interfaces for your projects. Give it a try and see what you can build!&lt;/p&gt;

</description>
      <category>puzzlegames</category>
      <category>firstpost</category>
      <category>showdev</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
