<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ankit Dey</title>
    <description>The latest articles on Forem by Ankit Dey (@ankitdey01).</description>
    <link>https://forem.com/ankitdey01</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3814482%2F2740ddf8-90b8-4ea9-854f-4ca9c932f5a1.jpg</url>
      <title>Forem: Ankit Dey</title>
      <link>https://forem.com/ankitdey01</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ankitdey01"/>
    <language>en</language>
    <item>
      <title>How Did AI Learn to Be Nice? The Humans Behind the Curtain</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Wed, 18 Mar 2026 20:48:58 +0000</pubDate>
      <link>https://forem.com/ankitdey01/how-did-ai-learn-to-be-nicethe-humans-behind-the-curtain-352h</link>
      <guid>https://forem.com/ankitdey01/how-did-ai-learn-to-be-nicethe-humans-behind-the-curtain-352h</guid>
      <description>&lt;p&gt;&lt;strong&gt;Welcome back to AI From Scratch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is Day 8/30 of the Understanding Beginner AI Series&lt;/p&gt;

&lt;p&gt;Where we are:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Days 1–5&lt;/strong&gt;&lt;/em&gt;: how the brain works — tokens, weights, transformers, attention.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Day 6&lt;/strong&gt;&lt;/em&gt;: why bigger models often feel smarter (and when that breaks).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Day 7&lt;/strong&gt;&lt;/em&gt;: how base models turn into instruction‑tuned assistants that actually listen.&lt;/p&gt;

&lt;p&gt;Today’s question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How did these models go from “super smart autocomplete”&lt;br&gt;
to something that tries to be helpful, polite, and safe?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Short answer: humans got into the training loop.&lt;br&gt;
That upgrade has a name: Reinforcement Learning from Human Feedback (RLHF).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The problem: powerful, but kind of feral
&lt;/h2&gt;

&lt;p&gt;Imagine a pure base model, fresh out of pretraining.&lt;br&gt;
It has read half the internet, can mimic lots of styles, and knows tons of facts — but no one has told it what good behavior looks like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So it can:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spit out toxic stuff (because the internet has plenty).&lt;/li&gt;
&lt;li&gt;Argue with you, overshare, or confidently hallucinate.&lt;/li&gt;
&lt;li&gt;Ignore instructions and just continue text in weird ways.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: raw capability, zero manners.&lt;/p&gt;

&lt;p&gt;Companies realized: if we release that to the public, it will be a PR and safety disaster. They needed a way to bend the model toward being helpful, harmless, and honest — the famous “HHH” alignment goals.&lt;/p&gt;

&lt;p&gt;So what this means for you: the model you chat with today is not the raw brain — it’s the raw brain plus a bunch of extra training to make it behave more like a decent human teammate.&lt;/p&gt;

&lt;h2&gt;
  
  
  RLHF in one line: “Do more of what humans like”
&lt;/h2&gt;

&lt;p&gt;Traditional reinforcement learning is:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Take an action, get a reward, update your behavior to get more reward next time.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;RLHF just swaps out “game score” or “environment reward” for “what a human preferred.”&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“You got +10 for reaching the goal in a maze”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;we use:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Humans liked answer A more than answer B, so A gets a higher ‘reward’.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then we train the model to prefer answers humans tend to prefer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So what this means for you: RLHF is literally teaching the model, “When in doubt, act more like this human‑approved answer, not that one.”&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: start with a capable base model
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;RLHF doesn’t replace pretraining&lt;/em&gt;&lt;/strong&gt;; it rides on top of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The recipe starts with:
&lt;/h2&gt;

&lt;p&gt;A pretrained base model that already knows &lt;em&gt;language, facts, code, etc&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Often it’s also gone through a supervised fine‑tuning stage with curated “good assistant” examples (we touched on this in Day 7).&lt;/p&gt;

&lt;p&gt;Think of this as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“We’ve built a super talented intern who knows a ton,&lt;br&gt;
but hasn’t yet been taught company culture or what’s off‑limits.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So what this means for you: RLHF doesn’t make a dumb model smart; it makes a smart model behave better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: humans rate multiple answers
&lt;/h2&gt;

&lt;p&gt;Now comes the “humans behind the curtain” part.&lt;/p&gt;

&lt;p&gt;For lots of prompts, the model generates several different answers: A, B, C…&lt;br&gt;
Human reviewers then rank these answers from best to worst.&lt;/p&gt;

&lt;p&gt;They judge things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is it helpful and on‑topic?&lt;/li&gt;
&lt;li&gt;Is it safe, non‑toxic, non‑harassing?&lt;/li&gt;
&lt;li&gt;Is it factually reasonable (as far as they can tell)?&lt;/li&gt;
&lt;li&gt;Is the tone appropriate (not rude, not over‑confident)?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those rankings feed into a reward model, a separate smaller model trained to predict “how much would a human like this answer?”&lt;/p&gt;

&lt;p&gt;So what this means for you: somewhere in the background, people have literally sat and said “this answer is better than that one” thousands of times, so your AI now has a sense of which directions humans tend to prefer.&lt;/p&gt;
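&lt;p&gt;As a minimal sketch of how those rankings become a training signal, here is a Bradley‑Terry‑style pairwise loss, which is one common choice for reward‑model training; the scores and numbers are made up for illustration:&lt;/p&gt;

```python
import math

def preference_loss(score_chosen, score_rejected):
    # Bradley-Terry style pairwise loss: the loss shrinks as the
    # reward model scores the human-preferred answer higher than
    # the rejected one.
    sigmoid = 1.0 / (1.0 + math.exp(score_rejected - score_chosen))
    return -math.log(sigmoid)

# Example: raters preferred answer A (scored 2.0) over answer B (0.5).
loss = preference_loss(2.0, 0.5)
```

&lt;p&gt;Training drives this loss down across many such pairs, which is exactly what teaches the reward model “answer A beats answer B.”&lt;/p&gt;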

&lt;h2&gt;
  
  
  Step 3: train the model to chase that reward
&lt;/h2&gt;

&lt;p&gt;Once we have a reward model that can score answers, we bring in reinforcement learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The main model (the “policy”) tries different styles of answers.&lt;/li&gt;
&lt;li&gt;The reward model scores them: higher for human‑like good behavior, lower for bad ones.&lt;/li&gt;
&lt;li&gt;An RL algorithm (often something like PPO) tweaks the main model’s weights to maximize that score.&lt;/li&gt;
&lt;/ul&gt;
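&lt;p&gt;A toy sketch of the score the policy actually chases, assuming the common setup where a KL‑style penalty keeps it tethered to the base model; the numbers and the beta value here are illustrative, not real training settings:&lt;/p&gt;

```python
def shaped_reward(rm_score, logp_policy, logp_base, beta=0.1):
    # RLHF-style objective: the reward-model score minus a KL-style
    # penalty that punishes drifting too far from the base model.
    kl_estimate = logp_policy - logp_base
    return rm_score - beta * kl_estimate

# Even if the reward model likes an answer, a big gap from the base
# model's log-probability eats into the usable reward.
r = shaped_reward(rm_score=1.0, logp_policy=-2.0, logp_base=-5.0)
```

&lt;p&gt;The penalty term is why RLHF‑tuned models rarely collapse into a narrow “please the grader” dialect: wandering too far from the base model’s distribution costs reward.&lt;/p&gt;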

&lt;h2&gt;
  
  
  Repeat this over and over:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Answers that humans would probably like more become more likely.&lt;/li&gt;
&lt;li&gt;Answers that would get human side‑eye become less likely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over time, the model shifts from “raw internet brain” to “politer assistant that tries to avoid landmines.”&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;So what this means for you: the reason your AI often refuses to give dangerous instructions or shifts tone when you get heated is because there’s been a whole extra training phase that told it which behaviors are rewarded and which get smacked down.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What RLHF actually changes in your experience
&lt;/h2&gt;

&lt;p&gt;Compared to a non‑aligned base model, RLHF‑aligned models tend to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Follow your instructions more reliably:&lt;/strong&gt; they treat your prompt as “please do X” instead of “here’s some text, continue it however.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Be more cautious around harmful content:&lt;/strong&gt; they push back on prompts about self‑harm, hate, scams, etc., because those answers get hammered in the feedback loop.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sound more cooperative and less chaotic:&lt;/strong&gt; tone, politeness, disclaimers — all of that is shaped by what human raters rewarded.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Be more “brand safe”:&lt;/strong&gt; enterprises can align models with their own values, policies, and legal requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: when the AI feels “nice,” “responsible,” or “a little too careful,” that’s not an accident; it’s RLHF steering its behavior toward a particular definition of “good.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The limits and trade‑offs of “niceness”
&lt;/h2&gt;

&lt;p&gt;RLHF is powerful, but it’s not magic.&lt;/p&gt;

&lt;p&gt;Some real‑world issues people point out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human bias leaks in&lt;/strong&gt;&lt;br&gt;
If your human raters have certain cultural or political biases, those can be baked into what the model sees as “good behavior.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over‑cautiousness&lt;/strong&gt;&lt;br&gt;
In trying to be safe, models sometimes refuse harmless requests or give generic, over‑sanitized answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reward hacking&lt;/strong&gt;&lt;br&gt;
The model may learn to “sound” safe and thoughtful without actually being more accurate; it optimizes what looks good to raters, not some perfect moral truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alignment ≠ solved ethics&lt;/strong&gt;&lt;br&gt;
RLHF nudges models toward broad goals like “helpful, harmless, honest,” but what those mean in edge cases is still a messy, ongoing debate.&lt;/p&gt;

&lt;p&gt;So what this means for you: “trained with RLHF” doesn’t mean “always right or perfectly ethical.” It means “there was a serious attempt to point this very strong engine in a direction humans generally like better.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this leaves us by Day 8
&lt;/h2&gt;

&lt;p&gt;By now, your mental picture could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pretraining&lt;/strong&gt; gave the model its raw knowledge and skills.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instruction tuning&lt;/strong&gt; taught it to treat prompts as commands and follow formats.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RLHF&lt;/strong&gt; used human preferences to steer it toward being more helpful, polite, and safe.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;So what this means for you: when you chat with an AI today, you’re not just talking to a giant matrix of numbers — you’re talking to something many humans have indirectly shaped through millions of tiny “this response is better than that one” judgments.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Teaser for Day 9 – Why the Way You Talk to AI Changes Everything
&lt;/h2&gt;

&lt;p&gt;Now that you know how we trained the model to be more aligned with human values, there’s another big lever left:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How you talk to the model at runtime.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That’s the world of prompting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;System prompts:&lt;/strong&gt; the hidden “personality script” the model gets before you even type.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Few‑shot prompts:&lt;/strong&gt; giving examples in your message so it learns the pattern you want on the fly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chain‑of‑thought:&lt;/strong&gt; nudging the model to think step by step instead of jumping to an answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On Day 9 – “Why the Way You Talk to AI Changes Everything”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;we’ll treat prompting as an engineering skill, not mysticism, and show how tiny changes in how you ask can completely change what you get back.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What blew your mind most? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>coding</category>
      <category>explainlikeimfive</category>
    </item>
    <item>
      <title>How Does AI Go From Dumb to Useful? The Training Upgrade Nobody Explains</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Mon, 16 Mar 2026 11:00:58 +0000</pubDate>
      <link>https://forem.com/ankitdey01/how-does-ai-go-from-dumb-to-usefulthe-training-upgrade-nobody-explains-1f2p</link>
      <guid>https://forem.com/ankitdey01/how-does-ai-go-from-dumb-to-usefulthe-training-upgrade-nobody-explains-1f2p</guid>
      <description>&lt;p&gt;Welcome back to &lt;strong&gt;AI From Scratch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’ve reached Day 7, you’re not just “AI‑curious” anymore — you’re basically that friend who secretly understands how this stuff works.&lt;/p&gt;

&lt;p&gt;Where we are so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You know AI is a next‑token prediction machine (Day 1).&lt;/li&gt;
&lt;li&gt;You’ve seen how it learns via the training loop (Day 2).&lt;/li&gt;
&lt;li&gt;You’ve peeked inside the layers and neurons (Day 3).&lt;/li&gt;
&lt;li&gt;You’ve met Transformers and attention (Day 4).&lt;/li&gt;
&lt;li&gt;You know it doesn’t read words, it reads tokens and numbers (Day 5).&lt;/li&gt;
&lt;li&gt;And yesterday, we talked about why bigger models often feel smarter — and where that idea breaks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today’s question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If two models are built on the same architecture, trained on similar data…&lt;br&gt;
why does one feel like a nerdy research project and the other like a helpful assistant?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;That’s where base models and instruction‑tuned models enter the chat.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Base model: the raw, slightly feral brain
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;base model&lt;/strong&gt; is what you get right after the big original training run on internet‑scale text.&lt;br&gt;
This is the “pure” next‑word prediction machine: it’s learned language patterns, world facts, coding tricks — everything we talked about up to Day 6.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But here’s the catch: no one has yet sat it down and said,&lt;br&gt;
“Hey, when a human asks you something, please answer like a helpful assistant.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So a base model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knows a lot about language and the world.&lt;/li&gt;
&lt;li&gt;Will happily continue almost any text you give it — stories, code, song lyrics, rants.&lt;/li&gt;
&lt;li&gt;But might ignore your instructions, ramble, or respond in weird formats because it’s not trained to treat your prompt as a command — it just sees it as the start of more text to complete.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: a base model is like a very smart person who has read the whole internet but has never been told “answer questions clearly, step by step, and don’t be a chaos gremlin.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Pretraining vs finetuning: the two phases of “raising” an AI
&lt;/h2&gt;

&lt;p&gt;At a high level, modern LLMs go through two big life phases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pretraining (the huge, expensive phase)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model reads massive amounts of text.&lt;/li&gt;
&lt;li&gt;Objective: “predict the next token correctly.”&lt;/li&gt;
&lt;li&gt;Result: broad language understanding + world knowledge.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Finetuning (the shorter, targeted phase)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You start from that pretrained base brain.&lt;/li&gt;
&lt;li&gt;You train it more on a smaller, curated dataset for some specific goal: follow instructions, write in a tone, perform a domain task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pretraining is like sending the model to the biggest school on Earth.&lt;br&gt;
Finetuning is like giving it a job‑specific bootcamp: “Now you’re a support agent,” or “Now you’re a polite tutor.”&lt;/p&gt;

&lt;p&gt;So what this means for you: almost every helpful AI you touch today started life as a raw base model, then got extra training layers to make it behave like a usable product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instruction tuning: teaching the model to actually listen
&lt;/h2&gt;

&lt;p&gt;One very specific style of finetuning turned out to be a game‑changer: &lt;strong&gt;instruction tuning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of just feeding the model random text, we create datasets that look like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instruction: “Summarize this article in 3 bullet points.”&lt;br&gt;
Input: (some article)&lt;br&gt;
Output: (a good 3‑bullet summary)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instruction: “Explain transformers to a 10‑year‑old.”&lt;br&gt;
Output: (simple, kid‑friendly explanation)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model is then fine‑tuned on thousands or millions of these (instruction, response) pairs across many tasks — translation, summarization, Q&amp;amp;A, reasoning steps, coding help, etc.&lt;/p&gt;
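&lt;p&gt;As a hedged sketch, here is one common way such (instruction, input, output) triples get flattened into a single training string; the exact markers vary between datasets, so treat the layout below as illustrative:&lt;/p&gt;

```python
def format_example(instruction, input_text, output):
    # One common SFT layout: join the fields with plain-text markers,
    # then train with next-token loss (usually only on the Output part).
    parts = ["Instruction: " + instruction]
    if input_text:
        parts.append("Input: " + input_text)
    parts.append("Output: " + output)
    return "\n".join(parts)

sample = format_example(
    "Summarize this article in 3 bullet points.",
    "(some article)",
    "(a good 3-bullet summary)",
)
```

&lt;p&gt;Seeing millions of strings shaped like this is what teaches the model that “Instruction:” text is a command to act on, not just text to continue.&lt;/p&gt;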

&lt;p&gt;Over time, it learns a meta‑skill:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“When a human writes something that looks like an instruction,&lt;br&gt;
I should respond in the style and format they seem to want.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Compared to a base model, an &lt;strong&gt;instruction‑tuned model:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follows your prompt structure more closely.&lt;/li&gt;
&lt;li&gt;Stays on task instead of randomly storytelling.&lt;/li&gt;
&lt;li&gt;Is better at “do X in style Y with constraint Z.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: that feeling of “I can just talk to it like a person and it mostly gets what I mean” is usually instruction tuning doing its job, not magic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Base vs instruction‑tuned: how they feel different
&lt;/h2&gt;

&lt;p&gt;Let’s make this concrete.&lt;/p&gt;

&lt;p&gt;Ask a base model:&lt;/p&gt;

&lt;p&gt;“Explain AI in 5 short bullet points I can paste into a slide.”&lt;/p&gt;

&lt;p&gt;It might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write a long essay.&lt;/li&gt;
&lt;li&gt;Ignore the “5 bullets” part.&lt;/li&gt;
&lt;li&gt;Drift into a Wikipedia‑style info dump.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ask an &lt;strong&gt;instruction‑tuned model&lt;/strong&gt; the same thing and you’re more likely to get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exactly 5 bullets.&lt;/li&gt;
&lt;li&gt;Slide‑friendly phrasing.&lt;/li&gt;
&lt;li&gt;A tone that roughly matches your request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Because during instruction tuning, it has seen thousands of examples where people say “summarize,” “list,” “explain like I’m 12,” “write an email that…”, and it’s been graded on how well it followed those commands.&lt;/p&gt;

&lt;p&gt;So what this means for you: if you want a research sandbox, a base model can be fun. If you want a reliable assistant that listens, instruction‑tuned is usually what you’re actually using — and what you should look for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where base models still matter
&lt;/h2&gt;

&lt;p&gt;With all this love for instruction‑tuned models, you might ask:&lt;br&gt;
“Why do base models exist at all?”&lt;/p&gt;

&lt;p&gt;A few big reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Research and advanced users:&lt;/strong&gt; base models let researchers and companies fine‑tune for their own, very specific needs (legal, medical, internal docs) without fighting against someone else’s chatty assistant persona.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Raw capability vs behavior:&lt;/strong&gt; some work even suggests that on certain reasoning benchmarks or under distribution shifts, base models can outperform their instruction‑tuned cousins, which may overfit to specific prompting styles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full control:&lt;/strong&gt; if you want to design your own way of turning a raw model into a product (your own safety rules, tone, tools), starting from the base gives you a clean slate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: when you see “base” vs “instruct” versions of the same model, base is the raw engine; instruct is the same engine with a “friendly driver” layer on top.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters even if you never train a model yourself
&lt;/h2&gt;

&lt;p&gt;You might think: “Cool, but I’m never going to fine‑tune a model. Why should I care about any of this?”&lt;/p&gt;

&lt;p&gt;A few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Choosing tools:&lt;/strong&gt; knowing if an app is built on a base or instruction‑tuned model tells you what to expect: more raw creativity vs more obedient instruction following.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Debugging weird behavior:&lt;/strong&gt; if a model keeps ignoring your prompt structure, you now know: either it’s a weakly tuned base model, or its instruction data didn’t reinforce your style enough.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Understanding the limits:&lt;/strong&gt; even super polished instruction‑tuned models are still just next‑token predictors underneath. Instruction tuning narrows and shapes behavior; it doesn’t turn them into perfect logical robots.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: when an AI feels surprisingly helpful, that’s not just “the model is big” — it’s “the model has been trained again, specifically to behave like a useful assistant.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Teaser for Day 8 – How Did AI Learn to Be Nice?
&lt;/h2&gt;

&lt;p&gt;We’ve now separated two big ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pretraining:&lt;/strong&gt; where the model learns about language and the world.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instruction tuning / finetuning:&lt;/strong&gt; where we teach it to follow instructions and act more like an assistant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there’s still one more layer we haven’t unpacked:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How did AI learn to be polite, avoid certain topics, refuse some requests,&lt;br&gt;
and generally behave “aligned” with human values (at least most of the time)?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where Reinforcement Learning from Human Feedback (RLHF) comes in — humans literally rating and steering the model’s behavior, and the model learning which kinds of responses get “rewarded.”&lt;/p&gt;

&lt;p&gt;Tomorrow, in Day 8 – “&lt;strong&gt;How Did AI Learn to Be Nice? The Humans Behind the Curtain”&lt;/strong&gt;&lt;br&gt;
we’ll talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What RLHF actually is in everyday language&lt;/li&gt;
&lt;li&gt;How humans sit in the training loop&lt;/li&gt;
&lt;li&gt;And why this “alignment” step matters for safety and for how pleasant your AI chats feel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;That's the end!&lt;br&gt;
What blew your mind most? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Why Is a Bigger AI "Smarter"? It's Not What You Think (Day 6/30 Beginner AI Series)</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Sun, 15 Mar 2026 08:51:15 +0000</pubDate>
      <link>https://forem.com/ankitdey01/why-is-a-bigger-ai-smarter-its-not-what-you-think-day-630-beginner-ai-series-43o0</link>
      <guid>https://forem.com/ankitdey01/why-is-a-bigger-ai-smarter-its-not-what-you-think-day-630-beginner-ai-series-43o0</guid>
      <description>&lt;p&gt;Welcome back to AI From Scratch.&lt;br&gt;
If you're still here on Day 6, you're officially that friend who "just wanted a simple overview" and then accidentally learned how half the field works.&lt;/p&gt;

&lt;p&gt;Quick rewind:&lt;br&gt;
Day 1: AI as a next‑word prediction machine.&lt;br&gt;
Day 2: How it learns by failing and nudging weights.&lt;br&gt;
Day 3: What's happening inside when it "thinks."&lt;br&gt;
Day 4: Transformers and attention - the wiring that made modern AI possible.&lt;br&gt;
Day 5: AI doesn't read words, it reads tokens and numbers.&lt;/p&gt;

&lt;p&gt;Today's question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If everyone keeps bragging about "50B parameters" or "1T parameters"…&lt;br&gt;
what does making a model bigger actually change?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  So, what even is a "parameter" again?
&lt;/h2&gt;

&lt;p&gt;From Days 1 and 2: a &lt;strong&gt;parameter&lt;/strong&gt; is just one tiny knob inside the model, a weight that says "when I see this pattern, react this much."&lt;br&gt;
A model with 1 million parameters is like a brain with 1 million tiny switches.&lt;/p&gt;

&lt;p&gt;A model with 1 trillion parameters is like a brain with a whole galaxy of switches.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;More parameters = more capacity to store patterns from training data:&lt;br&gt;
language quirks, world facts, coding tricks, writing styles, reasoning shortcuts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So what this means for you: when people say "bigger model," they literally mean "a brain with way more knobs that can in theory capture way more detail."&lt;/p&gt;
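&lt;p&gt;To make "knobs" concrete, here is a rough sketch of how parameters are counted for a single fully connected layer; the layer sizes are illustrative, not taken from any particular model:&lt;/p&gt;

```python
def dense_layer_params(n_in, n_out):
    # Each of the n_out neurons has n_in incoming weights plus one
    # bias; every one of those numbers is a trainable "knob".
    return n_in * n_out + n_out

# One 4096-to-4096 layer alone holds about 16.8 million parameters;
# stacking many such layers is how counts reach the billions.
p = dense_layer_params(4096, 4096)
```

&lt;p&gt;Real transformers add attention matrices and embeddings on top, but the counting logic is the same multiplication everywhere.&lt;/p&gt;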




&lt;h2&gt;
  
  
  Why making models bigger helped so much
&lt;/h2&gt;

&lt;p&gt;Around 2020, researchers noticed something wild:&lt;br&gt;
if you scale up &lt;strong&gt;model size, data&lt;/strong&gt;, and &lt;strong&gt;compute&lt;/strong&gt; in the right way, performance improves in a pretty smooth, predictable way. These are the famous &lt;strong&gt;scaling laws&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In practice, that meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10× more compute → noticeably lower error on language tasks.&lt;/li&gt;
&lt;li&gt;Bigger models kept getting better, not just a tiny bit, but enough to be worth the extra GPUs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why we went from models with millions of parameters to ones with billions and then hundreds of billions: the graph kept trending in the right direction.&lt;/p&gt;

&lt;p&gt;So what this means for you: the "era of huge models" wasn't just hype; the data really did show that, for a while, simply scaling up size (plus data and compute) kept unlocking better performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  The spooky part: new abilities just… appear
&lt;/h2&gt;

&lt;p&gt;As people scaled models up, something surprising happened:&lt;br&gt;
&lt;strong&gt;bigger models started doing things they were never explicitly trained to do.&lt;/strong&gt;&lt;br&gt;
Examples researchers noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller models: decent at basic text completion.&lt;/li&gt;
&lt;li&gt;Larger ones: suddenly could translate, do few‑shot learning ("here are 3 examples, now do the 4th"), solve simple math, write code, all from the same next‑word training objective.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are often called &lt;strong&gt;emergent abilities&lt;/strong&gt;: skills that seem to "switch on" once you pass a certain size, even though the training recipe didn't change.&lt;/p&gt;

&lt;p&gt;So what this means for you: when GPT‑3 felt &lt;em&gt;qualitatively&lt;/em&gt; different from GPT‑2, it wasn't because someone manually added "write emails" mode - it was a side‑effect of pushing model size, data, and compute past a certain threshold.&lt;/p&gt;




&lt;h2&gt;
  
  
  But it's not "just make it huge" - data matters a lot
&lt;/h2&gt;

&lt;p&gt;Then another twist: &lt;strong&gt;bigger isn't always better if you don't feed it enough data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DeepMind's &lt;strong&gt;Chinchilla&lt;/strong&gt; work showed that GPT‑3‑style models were actually &lt;em&gt;under‑trained&lt;/em&gt; on data for their size.&lt;/p&gt;

&lt;p&gt;They trained a &lt;em&gt;smaller&lt;/em&gt; model (around 70B parameters) on more tokens than previous giants, and it beat much larger models that had seen less data.&lt;br&gt;
Roughly speaking, they found: for a fixed compute budget, you should grow &lt;strong&gt;model size and dataset size together, instead of only cranking up parameters&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So what this means for you: a 1T‑parameter model trained on too little or low‑quality data can be dumber than a well‑trained 70B model. Size gives capacity; data and training actually fill it with something useful.&lt;/p&gt;
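&lt;p&gt;The Chinchilla finding is often compressed into a rule of thumb of roughly 20 training tokens per parameter; a tiny illustrative sketch (the exact ratio depends on the compute budget, so treat 20 as a ballpark):&lt;/p&gt;

```python
def chinchilla_tokens(n_params):
    # Rule of thumb distilled from the Chinchilla paper: roughly
    # 20 training tokens per parameter for compute-optimal training.
    return 20 * n_params

# A 70B-parameter model would want on the order of 1.4 trillion tokens.
tokens = chinchilla_tokens(70e9)
```

&lt;p&gt;By this yardstick, a 175B-parameter model trained on only 300B tokens is badly data-starved, which is exactly the gap Chinchilla exploited.&lt;/p&gt;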




&lt;h2&gt;
  
  
  Small vs large models in the real world
&lt;/h2&gt;

&lt;p&gt;Outside research papers, teams now run into a very practical question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Do we really need the giant model for this job?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Rough pattern people see:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Large models (10B–70B+ parameters):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better at complex reasoning, multi‑step tasks, and understanding long context.&lt;/li&gt;
&lt;li&gt;Often lower hallucination rates on factual queries (though still not perfect).&lt;/li&gt;
&lt;li&gt;Heavier: more GPUs, more energy, more latency and cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Small models (&amp;lt;1B–a few B parameters):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast, cheap, can sometimes run on a laptop or phone.&lt;/li&gt;
&lt;li&gt;Great when you fine‑tune them for a very specific domain.&lt;/li&gt;
&lt;li&gt;Weaker at open‑ended reasoning and multi‑language tasks, but easier to deploy privately.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: bigger models tend to feel "smarter" on broad, messy tasks, but for focused, everyday jobs (like one company's support emails), a smaller, tuned model can actually be the better call.&lt;/p&gt;




&lt;h2&gt;
  
  
  What does scaling really buy you?
&lt;/h2&gt;

&lt;p&gt;If we strip away the marketing, going from 1M to 1B to 1T parameters mainly buys you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More capacity:&lt;/strong&gt; the model can store and express richer patterns about language, code, and the world, especially when paired with enough training data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better generalization:&lt;/strong&gt; it handles weirder prompts, rare edge cases, and "I've never seen this exact thing, but I can reason it out from patterns I have seen."&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Longer, more coherent chains of thought:&lt;/strong&gt; with larger models and bigger context windows, you can give longer instructions and documents and still get reasonable, on‑topic answers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New capabilities at certain sizes:&lt;/strong&gt; translation, coding help, chain‑of‑thought reasoning, few‑shot learning all start to show up more clearly the bigger you go.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in exchange, you pay in &lt;strong&gt;compute, latency, energy, and money&lt;/strong&gt;, which is why there's now a whole movement around "small but smart enough" models.&lt;/p&gt;

&lt;p&gt;So what this means for you: "Is bigger smarter?" is the wrong question. The better question is: "For this job, is the extra capability from a larger model worth the extra cost and complexity?"&lt;/p&gt;




&lt;h2&gt;
  
  
  Zooming out: where we are by Day 6
&lt;/h2&gt;

&lt;p&gt;Let's connect the dots from the whole series so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model predicts the next token using weights (parameters).&lt;/li&gt;
&lt;li&gt;Those weights were learned through the training loop.&lt;/li&gt;
&lt;li&gt;Inside, transformers and attention structure the thinking process.&lt;/li&gt;
&lt;li&gt;Input text becomes tokens and embeddings inside a fixed context window.&lt;/li&gt;
&lt;li&gt;Scaling up size + data + compute follows surprisingly smooth laws… until data or money runs out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: when you hear "this new model is 4× bigger," you now know that really means "its brain has more room for patterns, but whether that translates to real gains depends on data, training, and what you're using it for."&lt;/p&gt;




&lt;h2&gt;
  
  
  Teaser for Day 7, The Training Upgrade Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Today we stayed at the "raw capability" level: how big the brain is, and how much that matters.&lt;/p&gt;

&lt;p&gt;But there's another twist coming:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why does the same model feel dumb in one setting and super helpful in another?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On Day 7 - "How Does AI Go From Dumb to Useful? The Training Upgrade that matters"&lt;br&gt;
we'll get into:&lt;br&gt;
**Base models vs instruction‑tuned models&lt;br&gt;
**What "RL from human feedback" actually changes in behavior&lt;br&gt;
Why some AIs feel like they're arguing with you, and others feel like a polite sugarcoating assistant&lt;/p&gt;

&lt;p&gt;In other words: you've seen how we build the brain and how big it can get.&lt;br&gt;
Next, we'll talk about how we teach that brain to talk to humans in a way that's actually useful to you.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What blew your mind most? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>coding</category>
      <category>explainlikeimfive</category>
    </item>
    <item>
      <title>AI Doesn't actually Read Words. Here's What It Reads (Day - 5/30 Beginner AI Series)</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Sat, 14 Mar 2026 16:05:54 +0000</pubDate>
      <link>https://forem.com/ankitdey01/ai-doesnt-actually-read-words-heres-what-it-reads-day-530-beginner-ai-series-4ec1</link>
      <guid>https://forem.com/ankitdey01/ai-doesnt-actually-read-words-heres-what-it-reads-day-530-beginner-ai-series-4ec1</guid>
      <description>&lt;p&gt;Welcome back to &lt;strong&gt;AI From Scratch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've made it to Day 5, you've already done more deep‑learning theory than most people who tweet about AI.&lt;br&gt;
Quick rewind:&lt;br&gt;
&lt;a href="https://dev.to/ankitdey01/how-does-you-ai-know-so-much-with-such-less-size-37pg"&gt;Day 1&lt;/a&gt;: AI is a next‑word prediction machine with a ton of weights.&lt;br&gt;
&lt;a href="https://dev.to/ankitdey01/day-2-beginner-ai-series-how-ai-actually-learns-the-training-story-nobody-tells-you-2hi4"&gt;Day 2&lt;/a&gt;: Those weights are trained like a kid practicing free throws.&lt;br&gt;
&lt;a href="https://dev.to/ankitdey01/whats-really-happening-inside-when-it-thinks-day-330-beginner-ai-series-20b4"&gt;Day 3&lt;/a&gt;: Inside, layers and neurons act like an assembly line of tiny reactions.&lt;br&gt;
&lt;a href="https://dev.to/ankitdey01/attention-is-all-you-need-that-one-idea-that-made-modern-ai-possible-from-2017-day-430--2kmn"&gt;Day 4&lt;/a&gt;: Transformers + attention let each word decide which other words to care about.&lt;/p&gt;

&lt;p&gt;Today's question is sneakier:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If AI doesn't actually "see" words the way we do… what does it see?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because when you paste some long rant into a chatbot, it's not reading sentences. It's reading something more primitive: tokens and numbers.&lt;/p&gt;




&lt;h3&gt;
  
  
  So what is the AI actually looking at?
&lt;/h3&gt;

&lt;p&gt;When you type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Explain AI to me like I'm sleep‑deprived but curious."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model doesn't see that as one clean string.&lt;br&gt;
Step one is &lt;strong&gt;tokenization&lt;/strong&gt;, chopping your text into little units called &lt;strong&gt;tokens.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tokens are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sometimes full words (Apple, hello).&lt;/li&gt;
&lt;li&gt;Sometimes pieces of words (un, believ, able).&lt;/li&gt;
&lt;li&gt;Sometimes punctuation, spaces, even emojis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of those tokens gets turned into an &lt;em&gt;ID&lt;/em&gt; (a number like 42 or 18,307) from the model's vocabulary.&lt;br&gt;
So what this means for you: when you talk to an AI, it's not thinking in "words and sentences" - it's thinking in IDs representing tiny chunks of text.&lt;/p&gt;
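&lt;p&gt;Here's a toy sketch of that step in Python. The vocabulary and IDs below are completely made up (real tokenizers learn vocabularies of tens of thousands of pieces from data), but the "longest known chunk wins" flavor is similar:&lt;/p&gt;

```python
# Toy tokenizer: greedy longest-match against a tiny, invented vocabulary.
# Real tokenizers (BPE and friends) learn their vocab from data; these IDs are made up.
VOCAB = {"un": 1, "believ": 2, "able": 3, "hello": 4, " ": 5, "!": 6}

def tokenize(text):
    """Split text into (token, id) pairs by always taking the longest known chunk."""
    tokens = []
    i = 0
    while i != len(text):
        # try the longest possible chunk first, then shorter ones
        for j in range(len(text), i, -1):
            chunk = text[i:j]
            if chunk in VOCAB:
                tokens.append((chunk, VOCAB[chunk]))
                i = j
                break
        else:
            raise ValueError("no token found for: " + text[i:])
    return tokens

print(tokenize("unbelievable"))  # [('un', 1), ('believ', 2), ('able', 3)]
```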




&lt;h3&gt;
  
  
  Chopping sentences into Lego pieces
&lt;/h3&gt;

&lt;p&gt;Why bother with this token stuff? Why not just use whole words?&lt;/p&gt;

&lt;p&gt;Because language is messy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New slang shows up daily.&lt;/li&gt;
&lt;li&gt;Names, hashtags, typos, random URLs…&lt;/li&gt;
&lt;li&gt;Some languages don't really use spaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the model only understood full words, it would be completely lost the moment you typed something it hadn't seen before.&lt;br&gt;
Enter &lt;strong&gt;subword tokenization&lt;/strong&gt;: methods like BPE (Byte Pair Encoding) that learn common pieces of words and reuse them.&lt;/p&gt;

&lt;p&gt;Think of it as Lego bricks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Common words get their own big brick (computer, football).&lt;/li&gt;
&lt;li&gt;Rare or weird words are built by snapping smaller bricks together (computational, micro‑SaaS).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: the reason AI can handle made‑up words, weird usernames, and half‑English‑half‑Hindi chaos is because it's secretly breaking everything into reusable Lego‑like pieces.&lt;/p&gt;
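&lt;p&gt;The core of BPE is surprisingly small: count which pair of neighboring pieces shows up most often, then merge that pair into a bigger brick, and repeat. A minimal sketch of the counting step, with a made‑up mini corpus:&lt;/p&gt;

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs; BPE merges the winner into one new brick."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0]

# Words as space-separated symbols, with made-up frequencies.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
print(most_frequent_pair(corpus))  # (('w', 'e'), 8), so "w e" would merge into "we"
```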




&lt;h3&gt;
  
  
  From token IDs to "meaning space"
&lt;/h3&gt;

&lt;p&gt;Okay, so now we have a sequence of token IDs: 154, 892, 77, 301…&lt;br&gt;
The model still can't do anything interesting with just those IDs. They're like jersey numbers with no skills attached.&lt;/p&gt;

&lt;p&gt;Next step: &lt;strong&gt;embeddings.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An embedding is a big list of numbers that acts like coordinates in a strange "meaning space."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens with related meanings end up near each other.&lt;/li&gt;
&lt;li&gt;Tokens with very different roles drift far apart.&lt;/li&gt;
&lt;li&gt;Certain directions in this space line up with concepts like gender, tense, even "royalty."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where that classic example comes from:&lt;/p&gt;

&lt;p&gt;_1. "king" and "queen" are close,&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"king" and "banana" are very far,&lt;/li&gt;
&lt;li&gt;"king" and "man" are related in a different way than "king" and "queen."_&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't need the math. &lt;strong&gt;Just hold this picture: every token becomes a dot in a high‑dimensional map where closeness roughly means "similar vibe or role."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So what this means for you: before any attention, layers, or predictions, your words have already been turned into points in a meaning map. The rest of the model is just nudging and combining those points.&lt;/p&gt;
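&lt;p&gt;"Closeness" in that map is usually measured with cosine similarity. Here's a tiny illustration with made‑up 3‑number embeddings (real embeddings have hundreds or thousands of dimensions, and the numbers are learned, not hand‑picked):&lt;/p&gt;

```python
import math

def cosine(u, v):
    """Closeness in meaning space: near 1.0 means same direction, near 0 unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 3-number embeddings purely for illustration.
emb = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.85, 0.82, 0.15],
    "banana": [0.05, 0.10, 0.95],
}
print(round(cosine(emb["king"], emb["queen"]), 2))   # very close to 1
print(round(cosine(emb["king"], emb["banana"]), 2))  # much smaller
```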




&lt;h3&gt;
  
  
  Order matters: why "AI loves you" ≠ "you love AI"
&lt;/h3&gt;

&lt;p&gt;One more problem: embeddings capture what each token is, but not where it appears.&lt;br&gt;
"Cat bites dog" and "dog bites cat" have the same words - different meaning.&lt;br&gt;
Transformers fix this by adding some notion of &lt;strong&gt;position&lt;/strong&gt; on top of the token embeddings, so the model knows "this is the first word, this is the second, …".&lt;/p&gt;

&lt;p&gt;You can think of it like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each token gets its meaning coordinates.&lt;/li&gt;
&lt;li&gt;Then it also gets a little tag that says "I'm the 5th token in this sentence."&lt;/li&gt;
&lt;li&gt;The model blends those together so it knows both "what" and "where."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: the model doesn't just bag up your words and shake them; it knows the order they came in, which is why it can tell who did what to whom.&lt;/p&gt;
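&lt;p&gt;A toy sketch of that "what + where" blend, with made‑up 2‑number vectors (real models use learned or sinusoidal position encodings, nothing like these tiny tags):&lt;/p&gt;

```python
# Blend "what" (token embedding) with "where" (position vector). All numbers invented.
emb = {"cat": [1.0, 0.0], "bites": [0.0, 1.0], "dog": [1.0, 1.0]}
pos = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]  # one small tag per position

def encode(sentence):
    """Each token's input = its meaning coordinates + its position tag."""
    return [[e + p for e, p in zip(emb[tok], pos[i])]
            for i, tok in enumerate(sentence.split())]

# Same words, different order → different inputs to the model.
print(encode("cat bites dog"))
print(encode("dog bites cat"))
```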




&lt;h3&gt;
  
  
  The context window: your AI's short‑term memory
&lt;/h3&gt;

&lt;p&gt;Now for the sneaky bit that secretly controls a lot of your experience:&lt;br&gt;
the &lt;strong&gt;context window.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every model has a maximum number of tokens it can handle in one go - that includes &lt;strong&gt;&lt;em&gt;your prompt + the model's reply.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 token ≈ ¾ of an English word.&lt;/li&gt;
&lt;li&gt;A few thousand tokens ≈ a few pages of text, depending on the model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your conversation plus its answers go beyond that limit, the model starts &lt;strong&gt;forgetting the oldest tokens&lt;/strong&gt;; they literally fall out of its working memory.&lt;br&gt;
So what this means for you: when a chatbot suddenly "forgets" something you said 40 messages ago, it's not being rude; that part of the conversation may have been pushed out of its token budget.&lt;/p&gt;
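&lt;p&gt;You can picture the sliding window as nothing fancier than "keep the newest N tokens." A toy sketch, with invented numbers:&lt;/p&gt;

```python
def sliding_window(tokens, limit):
    """Keep only the newest tokens; the oldest fall out of the budget."""
    return tokens[-limit:]

chat = ["hi", "my", "name", "is", "ankit", "now", "what", "is", "my", "name"]
print(sliding_window(chat, 4))  # ['what', 'is', 'my', 'name'] - the rest is gone

# The prompt and the reply share the same budget (made-up sizes):
limit = 4096
prompt_tokens = 3000
room_for_answer = limit - prompt_tokens
print(room_for_answer)  # 1096
```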




&lt;h3&gt;
  
  
  How token limits shape your chats
&lt;/h3&gt;

&lt;p&gt;Because everything is measured in tokens, a few non‑obvious things happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long prompts eat into the memory budget fast: big copy‑pasted docs, code, or transcripts can leave less room for the model's answer.&lt;/li&gt;
&lt;li&gt;Both input and output count: if you ask for a huge essay, there's less room left for past context.&lt;/li&gt;
&lt;li&gt;Different models have different context windows: newer ones can handle way more tokens than older ones, which is why some feel better at long, multi‑step tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, this is why people talk about "prompt engineering" and "chunking" documents: you're really just managing what fits into that sliding window of tokens the model can see at once.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So what this means for you: how you feed information to the model (shorter, focused chunks vs giant walls of text) directly affects how coherent and on‑track its answers feel.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Putting it all together: what your AI actually reads
&lt;/h3&gt;

&lt;p&gt;Let's stitch the whole path from your keyboard to the model's "brain":&lt;br&gt;
You type a message.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The tokenizer slices it into tokens (little text chunks).&lt;/li&gt;
&lt;li&gt;Each token becomes an ID from the model's vocabulary.&lt;/li&gt;
&lt;li&gt;Each ID becomes an embedding , a point in meaning space.&lt;/li&gt;
&lt;li&gt;Positional info gets mixed in so order isn't lost.&lt;/li&gt;
&lt;li&gt;All of this fits into a context window, a fixed‑size memory slot measured in tokens.&lt;/li&gt;
&lt;li&gt;Inside that window, transformers + attention from Day 4 do their thing and predict the next token.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;So what this means for you: your AI is never "reading paragraphs" the way you see them. It's working with a long row of numeric Lego bricks inside a fixed‑size tray, and all the magic you see is built on top of that.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What's coming on Day 6
&lt;/h3&gt;

&lt;p&gt;Now you know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How text becomes tokens,&lt;/li&gt;
&lt;li&gt;How tokens become numbers in a meaning space, and&lt;/li&gt;
&lt;li&gt;How the context window limits what the AI can remember at once.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sets up a very natural next question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why is a bigger AI "smarter"? (And where does that idea break down?)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On Day 6 - "Why Is a Bigger AI Smarter? (It's Not What You Think)"&lt;br&gt;
we'll talk about:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;From 1M to 1T parameters - what scaling actually buys you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We'll look at what happens when you crank up parameter counts, why "just make it bigger" sometimes works weirdly well, and why size alone still doesn't guarantee good judgment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What blew your mind most? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>explainlikeimfive</category>
      <category>coding</category>
    </item>
    <item>
      <title>“Attention Is All You Need,” That One Idea That Made Modern AI Possible from 2017 (Day 4/30 - Beginner AI Series)</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Fri, 13 Mar 2026 12:27:06 +0000</pubDate>
      <link>https://forem.com/ankitdey01/attention-is-all-you-need-that-one-idea-that-made-modern-ai-possible-from-2017-day-430--2kmn</link>
      <guid>https://forem.com/ankitdey01/attention-is-all-you-need-that-one-idea-that-made-modern-ai-possible-from-2017-day-430--2kmn</guid>
      <description>&lt;p&gt;Welcome back to &lt;strong&gt;AI From Scratch Series&lt;/strong&gt;.&lt;br&gt;
If you've made it to Day 4, congrats.&lt;/p&gt;

&lt;p&gt;Quick recap so far:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;&lt;a href="https://medium.com/@ankitdey450/how-does-you-ai-know-so-much-with-such-less-size-dd91977429fd/" rel="noopener noreferrer"&gt;Day 1&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt; We met the basic trick - AI stores knowledge as weights and predicts the next word.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;&lt;a href="https://medium.com/@ankitdey450/day-2-beginner-ai-series-how-ai-actually-learns-the-training-story-nobody-tells-you-248cf61f32a7/" rel="noopener noreferrer"&gt;Day 2&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt; We watched it train like a kid practicing basketball: guess, get feedback, adjust, repeat.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;&lt;a href="https://medium.com/@ankitdey450/whats-really-happening-inside-ai-when-it-thinks-day-3-30-beginner-ai-series-e5f56fc1cf5d/" rel="noopener noreferrer"&gt;Day 3&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt; We walked through what happens inside when it "thinks" - layers, neurons, little light bulbs firing.&lt;/p&gt;

&lt;p&gt;Today is about the plot twist that took all of that and made it actually work at the scale of ChatGPT, Gemini, and Claude:&lt;br&gt;
an idea called &lt;strong&gt;attention&lt;/strong&gt;, wrapped in an architecture called the &lt;strong&gt;Transformer&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before attention: AI that forgot the start of the sentence
&lt;/h3&gt;

&lt;p&gt;Before 2017, language models mostly used RNNs and LSTMs: fancy ways of reading text one word at a time, left to right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Imagine trying to understand a long WhatsApp message where you can only remember the last few words clearly, and everything before that is a blur. That was old‑school AI.&lt;br&gt;
By the time it reached the end of a long sentence, the beginning was basically fuzzy.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These models struggled with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Long sentences lost context ("At the party yesterday, the friend of my sister who moved to Canada…" and by the end they'd lose track).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slow, sequential training (they had to read word by word, so no big speed‑ups from parallel hardware).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: early models could do some language tasks, but they hit a ceiling on how coherent and knowledgeable they could feel in long conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 2017 "Attention Is All You Need" moment
&lt;/h3&gt;

&lt;p&gt;In 2017, a group of Google researchers dropped a paper literally titled &lt;strong&gt;"Attention Is All You Need."&lt;/strong&gt;&lt;br&gt;
 Their move was kind of savage: "Let's throw away the old word‑by‑word reading style and build something that just… looks at everything at once and decides what to care about."&lt;/p&gt;

&lt;p&gt;This new design was called the &lt;strong&gt;&lt;a href="https://www.geeksforgeeks.org/machine-learning/getting-started-with-transformers/" rel="noopener noreferrer"&gt;Transformer&lt;/a&gt;&lt;/strong&gt;.&lt;br&gt;
Instead of marching through the sentence in order, it looks at the whole sentence at once and, using an attention mechanism, decides which words matter for each position and what each word means from its surrounding context.&lt;/p&gt;

&lt;p&gt;So what this means for you: that one design shift is why modern chatbots you use daily can keep track of long prompts, instructions, and context in a way older models simply couldn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attention, in plain language: who should I care about right now?
&lt;/h3&gt;

&lt;p&gt;Let's say the sentence is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The book that the boy who wore a red hoodie was reading was fascinating."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For the word "&lt;strong&gt;was&lt;/strong&gt;" at the end, you don't care about "hoodie", you care about "book."&lt;br&gt;
Your brain instantly jumps back and hooks "was" to "book," not "boy" or "hoodie."&lt;br&gt;
&lt;strong&gt;Attention&lt;/strong&gt; is the model doing the same thing:&lt;br&gt;
for each word, it asks, "Which other words in this sentence are actually relevant to me?" and then focuses more on those.&lt;br&gt;
You can think of it like a highlighter pen that moves around the sentence for every word:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When processing "was," the highlighter glows strongly on "book."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When processing "red," it glows on "hoodie."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When processing "boy," it might glow on "who" and "hoodie."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;So what this means for you: instead of treating all words equally, your AI constantly re‑weights the sentence, pulling the most relevant parts into focus for each piece of the answer.&lt;/em&gt;&lt;/p&gt;
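&lt;p&gt;Under the hood, that "highlighter" is usually a softmax over relevance scores: big scores grab most of the glow, small scores fade out. A toy sketch with invented scores for the word "was":&lt;/p&gt;

```python
import math

def attention_weights(scores):
    """Softmax: turn raw relevance scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up relevance of each earlier word to the word "was":
words = ["book", "boy", "hoodie"]
scores = [4.0, 1.0, 0.5]
weights = attention_weights(scores)
for word, w in zip(words, weights):
    print(word, round(w, 2))
# "book" ends up with most of the highlighter's glow
```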

&lt;h3&gt;
  
  
  Self‑attention: the group chat in the model's head
&lt;/h3&gt;

&lt;p&gt;More specifically, self‑attention means every word in the sentence can "talk" to every other word and decide how much it should matter.&lt;/p&gt;

&lt;p&gt;Picture a group discussion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Each person (word) is allowed to look around the room and think, "Whose opinion matters most for what I'm about to say?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For this moment, maybe you care most about what the data guy said. Next moment, you care more about the designer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Every word creates tiny internal signals that say "here's who I might care about" and "here's what I mean."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The attention mechanism turns that into weights: basically, "Look 60% at this word, 30% at that one, 10% at those others."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then it blends information accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: when the model answers, it's not reading your message in a straight line. It's constantly cross‑referencing parts of your text with each other, like a very fast group chat where everyone can instantly consult everyone else rather than one at a time.&lt;/p&gt;
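&lt;p&gt;That "look 60% here, 30% there, 10% there, then blend" step is literally a weighted average of the other words' vectors. A minimal sketch with made‑up numbers:&lt;/p&gt;

```python
def blend(weights, values):
    """Mix information: a weighted average of the other words' vectors."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

weights = [0.6, 0.3, 0.1]                      # "look 60% here, 30% there, 10% there"
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up per-word vectors
print([round(v, 2) for v in blend(weights, values)])  # [0.7, 0.4]
```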

&lt;h3&gt;
  
  
  Multi‑head attention: many spotlights at once
&lt;/h3&gt;

&lt;p&gt;One attention pattern is nice, but language is messy.&lt;br&gt;
Sometimes you care about grammar (who did what), sometimes about tone, sometimes about time, sometimes about location.&lt;br&gt;
Transformers handle this with &lt;strong&gt;multi‑head attention.&lt;/strong&gt;&lt;br&gt;
Instead of one big spotlight, they use many smaller ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One head might focus on subject–verb relationships.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another might track pronouns ("he", "she", "they").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another might watch for time phrases ("yesterday", "next year").&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these heads look at the sentence in parallel, each with its own "perspective."&lt;br&gt;
Then the model mixes their insights together.&lt;br&gt;
So what this means for you: that feeling of "wow, it kept track of who I was talking about and the timeline and the tone" comes from multiple attention heads focusing on different aspects of your message at the same time.&lt;/p&gt;
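&lt;p&gt;A toy sketch of the "many spotlights" idea: each head scores the same sentence its own way, and the model then mixes the heads' views. All the scores and head names below are invented for illustration:&lt;/p&gt;

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Each "head" scores the same three words from its own perspective (made-up numbers).
heads = {
    "grammar":  [3.0, 0.5, 0.2],   # cares about subject-verb links
    "pronouns": [0.2, 3.0, 0.5],   # cares about who pronouns refer to
    "time":     [0.5, 0.2, 3.0],   # cares about time phrases
}
per_head = {name: softmax(scores) for name, scores in heads.items()}
# The model then mixes every head's view into one combined representation.
mixed = [sum(per_head[name][i] for name in heads) / len(heads) for i in range(3)]
print({name: [round(w, 2) for w in ws] for name, ws in per_head.items()})
print([round(m, 2) for m in mixed])
```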

&lt;h3&gt;
  
  
  Why this unlocked giant, smart-feeling models
&lt;/h3&gt;

&lt;p&gt;Two big reasons transformers changed the game:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;They handle long context well&lt;br&gt;
 Because every word can talk to every other word directly, it's much easier for the model to connect "this thing you said 20 tokens ago" to "this word I'm choosing now."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;They run fast on modern hardware&lt;br&gt;
 Old RNNs had to read word by word. Transformers can process all tokens in a sentence in parallel, which fits perfectly with GPUs and large clusters.&lt;br&gt;
 That parallelism is what made it realistic to train models with billions of parameters on huge text datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what this means for you: the reason you have chatbots that can write essays, translate, summarize papers, and code is not just "more data" or &lt;em&gt;"bigger models"&lt;/em&gt;, it's that attention + transformers made training big models actually practical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bringing it back to your mental picture
&lt;/h3&gt;

&lt;p&gt;Let's merge this with your understanding from Days 1–3:&lt;br&gt;
&lt;strong&gt;Day 1&lt;/strong&gt;: AI is a next‑word prediction machine with lots of weights.&lt;br&gt;
&lt;strong&gt;Day 2&lt;/strong&gt;: Those weights were learned through endless cycles of "guess → compare → adjust."&lt;br&gt;
&lt;strong&gt;Day 3&lt;/strong&gt;: Inside, your text flows through layers and neurons like an assembly line of tiny reactions.&lt;br&gt;
&lt;strong&gt;Day 4&lt;/strong&gt; (today): In those layers, transformers use attention so each word can see the whole sentence and decide what to care about before making its contribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;If I had to summarise this whole blog - the magic of modern AI isn't some mysterious soul hiding in the model. It's a very disciplined system that reads everything at once, focuses on the right bits using attention, and then runs its familiar next‑word prediction game on top of that.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What's coming on Day 5
&lt;/h3&gt;

&lt;p&gt;Now that you've got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How AI stores knowledge (weights),&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How it learns (training loop),&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How it thinks (layers and neurons), and&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How it uses attention to predict the next word more efficiently without losing context (transformers and attention),&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…we're ready for the next natural question:&lt;br&gt;
&lt;em&gt;&lt;strong&gt;"AI Doesn't Read Words. Here's What It Actually Reads."&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On Day 5, we'll see how your text is chopped into tokens and turned into numbers the model can understand, and why things like tokenization and "context window" secretly control how much your AI can remember from your prompt and how coherent its answer can be.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What blew your mind most? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>explainlikeimfive</category>
      <category>nlp</category>
    </item>
    <item>
      <title>What’s Really Happening Inside AI When It “Thinks”? (Day 3/30 - Beginner AI Series)</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Thu, 12 Mar 2026 09:00:06 +0000</pubDate>
      <link>https://forem.com/ankitdey01/whats-really-happening-inside-when-it-thinks-day-330-beginner-ai-series-20b4</link>
      <guid>https://forem.com/ankitdey01/whats-really-happening-inside-when-it-thinks-day-330-beginner-ai-series-20b4</guid>
      <description>&lt;h3&gt;
  
  
  Welcome back to Day 3 of AI From Scratch.
&lt;/h3&gt;

&lt;p&gt;So far, we’ve basically met the brain and watched it train.&lt;/p&gt;

&lt;p&gt;On &lt;a href="https://dev.to/ankitdey01/how-does-you-ai-know-so-much-with-such-less-size-37pg"&gt;Day 1&lt;/a&gt;, we saw how AI stores “knowledge” as weights and uses them to predict the next word in a sentence.&lt;br&gt;
On &lt;a href="https://dev.to/ankitdey01/day-2-beginner-ai-series-how-ai-actually-learns-the-training-story-nobody-tells-you-2hi4"&gt;Day 2&lt;/a&gt;, we followed the training story, like a kid practicing basketball: try a shot, see how wrong it is, adjust the form, repeat a million times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Today we’re asking a new question&lt;/em&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;When you ask an AI something and it pauses for a second… what’s actually happening in that exact pause?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI’s answer is just a chain of word predictions.&lt;/strong&gt;&lt;br&gt;
That “thinking” moment is just your question flowing through layers of neurons, triggering little reactions, and ending in a chain of word predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;So what’s happening between your question and its answer?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When you type a question, the model doesn’t see a neat English sentence.&lt;br&gt;
First, it chops your text into tokens — small chunks like words or pieces of words. Those tokens are then turned into numbers and pushed into the model’s brain.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;From there, those numbers travel through layers in the network.&lt;br&gt;
Each layer looks at the numbers, reacts a bit, and passes them on to the next layer.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So what this means for you: that “thinking” delay isn’t the AI meditating; it’s your sentence running through a long tunnel of tiny reactions before the model spits out the next word.&lt;/p&gt;

&lt;h3&gt;
  
  
  Think of it like an assembly line for meaning
&lt;/h3&gt;

&lt;p&gt;Imagine a factory assembly line.&lt;br&gt;
At the start, you drop in raw metal. Every station bends, drills, paints, or checks something. By the end, you’ve got a finished car. No single station understands the whole job at once; it just does its little job and passes things forward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A neural network works the same way.&lt;/strong&gt;&lt;br&gt;
Your tokens go into the first layer, get slightly transformed, then move to the next layer, and so on. Stack enough of these, and you’ve turned raw text into something that feels like understanding.&lt;/p&gt;

&lt;p&gt;So what this means for you: when an AI answer feels smart, it’s not because there’s one genius node inside — it’s because thousands of tiny, dumb steps are wired together in a clever order.&lt;/p&gt;
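&lt;p&gt;Here’s a toy version of that assembly line: each “station” applies a tiny transformation and passes the result forward, and only the stack does anything interesting. All the numbers below are made up for illustration:&lt;/p&gt;

```python
def layer(x, scale, shift):
    """One station: scale and shift each number, then clip negatives to zero."""
    return [max(0.0, scale * v + shift) for v in x]

# Stack a few stations; no single one does much, the stack does the work.
x = [0.2, -0.4, 1.0]  # made-up numbers standing in for token signals
for scale, shift in [(2.0, 0.1), (0.5, -0.2), (3.0, 0.0)]:
    x = layer(x, scale, shift)
print([round(v, 2) for v in x])  # [0.15, 0.0, 2.55]
```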

&lt;h3&gt;
  
  
  Neurons: tiny light bulbs that notice patterns
&lt;/h3&gt;

&lt;p&gt;Inside each layer live neurons — tiny units that light up for certain patterns.&lt;br&gt;
One neuron might quietly specialize in “sad tone,” another in “locations,” another in “legal-ish language.”&lt;/p&gt;

&lt;p&gt;Each neuron takes the incoming numbers, looks at how strong they are, and decides:&lt;br&gt;
“Do I stay dim, or do I light up for this?”&lt;br&gt;
If it sees the pattern it cares about, it glows more and sends a stronger signal on to the next layer.&lt;/p&gt;

&lt;p&gt;So what this means for you: when you ask a question, you’re basically lighting up a custom constellation of neurons: a unique pattern of tiny bulbs flickering on and off that represents “what the model thinks you’re asking.”&lt;/p&gt;

&lt;h3&gt;
  
  
  Layers: from raw words to “oh, I get what you mean”
&lt;/h3&gt;

&lt;p&gt;Different layers care about different kinds of things.&lt;/p&gt;

&lt;p&gt;Early layers mostly pick up low‑level stuff: is this a question, a statement, a list? Are there names, dates, places here?&lt;br&gt;
Middle layers start combining that: “question + about time + about sports → probably asking for a match schedule.”&lt;br&gt;
Later layers work with more abstract ideas: “they’re comparing two tools,” “they want a step‑by‑step,” “this sounds like they’re asking ‘why’, not ‘how’.”&lt;/p&gt;

&lt;p&gt;So what this means for you: the deeper you go into the network, the less it cares about raw words and the more it’s dealing with your intent. By the time it answers, it’s replying to the idea behind your words, not just the letters you typed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Activations: how the brain decides what to ignore
&lt;/h3&gt;

&lt;p&gt;If every neuron fired all the time, the model would just see noise.&lt;br&gt;
So each neuron uses an activation rule, basically: “Is this signal strong enough for me to care?”&lt;br&gt;
&lt;em&gt;You can picture it like a dimmer switch:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Weak signal? The neuron stays mostly dark.&lt;br&gt;
Strong signal? It brightens and says, “This matters, push me forward.”&lt;br&gt;
This is how the model can tell the difference between “river bank” and “open a bank account”: different neurons light up in each case, because the surrounding words give different vibes.&lt;/p&gt;

&lt;p&gt;So what this means for you: under the hood, the AI is constantly highlighting important bits of your question and quietly fading out the rest, so its answer is shaped by what it thinks really matters.&lt;/p&gt;
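&lt;p&gt;The most common activation rule, ReLU, really is that simple: weak or negative signals stay dark, strong ones pass through. A one‑line sketch:&lt;/p&gt;

```python
def relu(signal):
    """Dimmer switch: weak or negative signals stay dark (0), strong ones pass."""
    return max(0.0, signal)

print([relu(s) for s in [-2.0, 0.3, 5.0]])  # [0.0, 0.3, 5.0]
```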

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2ppj3hahye0r558fepp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2ppj3hahye0r558fepp.jpg" alt="What’s Really Happening Inside When It “Thinks”?" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The answer is just a rapid‑fire chain of bets
&lt;/h3&gt;

&lt;p&gt;All of this (layers, neurons, activations) is just setup for one job: pick the next token.&lt;br&gt;
After your question flows through all the layers, the model ends up with a rich internal state that says, “Given everything so far, which token is most likely next?”&lt;/p&gt;

&lt;p&gt;It then builds a list of possible next tokens with probabilities, kind of like:&lt;br&gt;
“Maybe ‘the’ (20%), ‘it’ (15%), ‘they’ (10%), definitely not ‘Bangalore’ (0.01%).”&lt;br&gt;
It samples one, adds it to the text, and then repeats the whole process with the updated context to pick the next word, and the next, and the next.&lt;/p&gt;

&lt;p&gt;So what this means for you: that long, smooth paragraph the AI gives you is literally just a chain of word bets, guided by all those internal strong signals and layers reacting in the background.&lt;/p&gt;
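&lt;p&gt;A toy sketch of that sampling step, using the invented probabilities from above (real models choose among tens of thousands of tokens and often reshape the distribution with a “temperature” first):&lt;/p&gt;

```python
import random

# Invented next-token probabilities, echoing the example above:
probs = {"the": 0.20, "it": 0.15, "they": 0.10, "Bangalore": 0.0001}

rng = random.Random(42)  # seeded so the demo repeats
tokens = list(probs)
weights = list(probs.values())
# Sample one token, append it, repeat with the new context; here we just make 5 bets.
picks = rng.choices(tokens, weights=weights, k=5)
print(picks)  # "Bangalore" is possible but wildly unlikely
```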

&lt;h3&gt;
  
  
  Why the answer can feel brilliant… or confidently wrong
&lt;/h3&gt;

&lt;p&gt;When the training data is good and the internal pattern detectors are well‑shaped, those word bets line up into responses that feel thoughtful and on point.&lt;br&gt;
That’s when you get the “wow, it really understands me” moment.&lt;/p&gt;

&lt;p&gt;But remember from Day 2: the model doesn’t have a truth button.&lt;br&gt;
If its learned patterns point toward a wrong but plausible sentence, it will happily say that too; that’s a hallucination.&lt;/p&gt;

&lt;p&gt;So what this means for you: the same machinery that makes answers feel coherent also makes wrong answers sound extremely confident.&lt;/p&gt;

&lt;p&gt;Smooth text doesn’t guarantee true text; it just means the internal chain of reactions is doing what it was trained to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  The mental picture to keep from Day 3
&lt;/h3&gt;

&lt;p&gt;If you had to compress today into one picture, use this:&lt;/p&gt;

&lt;p&gt;Your question → chopped into tokens, turned into numbers&lt;br&gt;
Numbers → pushed through an assembly line of layers&lt;br&gt;
Inside each layer → neurons (light bulbs) fire for patterns&lt;br&gt;
Activations → decide what to amplify and what to ignore&lt;br&gt;
Final state → used to bet on the next word, over and over&lt;/p&gt;

&lt;p&gt;So what this means for you: an AI model isn’t a mystical brain. It’s a giant, carefully wired machine that turns your words into internal reactions and then into a stream of word predictions. Very fast, very organized, but still just a chain of reactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s coming on Day 4: the one trick that changed everything
&lt;/h3&gt;

&lt;p&gt;Today, we treated the AI like a brain made of layers and light bulbs, reacting in sequence.&lt;br&gt;
But there’s one idea that supercharged all of this and made modern AI chatbots, code assistants, and image generators actually feel useful: a way for the model to pay attention to different parts of your input at the same time.&lt;/p&gt;

&lt;p&gt;Tomorrow, in Day 4, “The One Idea That Made Modern AI Possible”, we’ll unpack that trick in plain language.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What blew your mind most? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>explainlikeimfive</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How AI Actually Learns: The Training Story Nobody Tells You (Day 2/30 - Beginner AI Series)</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Wed, 11 Mar 2026 11:58:51 +0000</pubDate>
      <link>https://forem.com/ankitdey01/day-2-beginner-ai-series-how-ai-actually-learns-the-training-story-nobody-tells-you-2hi4</link>
      <guid>https://forem.com/ankitdey01/day-2-beginner-ai-series-how-ai-actually-learns-the-training-story-nobody-tells-you-2hi4</guid>
      <description>&lt;p&gt;&lt;code&gt;You met the "brain" yesterday: billions of tiny weights that turn text into predictions.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Today's obvious question&lt;/strong&gt;: who set those weights in the first place?&lt;/p&gt;

&lt;p&gt;Spoiler: no one sat down and typed them in by hand. The model learned them, the hard way, by failing over and over again and getting tiny nudges in a better direction.&lt;/p&gt;

&lt;p&gt;Wait, so who chose the weights?&lt;/p&gt;

&lt;p&gt;When you talk to an AI model, you're seeing the finished brain. All the learning already happened earlier, during training, on a huge pile of text: books, Wikipedia, web pages, and more. The training code starts with almost-random weights and slowly shapes them until the model gets good at its job.&lt;br&gt;
So the real magic isn't just the architecture or the size of the model - it's this long grind of "guess, check, fix, repeat" that slowly turns noise into something that feels smart.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If Day 1 was about what the brain looks like, Day 2 is about how that brain grew up.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Learning like a kid with a basketball
&lt;/h3&gt;

&lt;p&gt;Forget math for a second. Imagine you're teaching a kid to shoot a basketball.&lt;br&gt;
First shot? Wildly off. You don't give them a 300‑page physics book. You just say: "Too high. Aim shorter." Next shot: "Too far. Use less power."&lt;br&gt;
The pattern is always:&lt;br&gt;
Try something.&lt;br&gt;
See how wrong it was.&lt;br&gt;
Adjust the intensity a tiny bit.&lt;br&gt;
Repeat an absurd number of times until the shot becomes reliable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye04iwybk006tsiaqaj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye04iwybk006tsiaqaj6.png" alt="How AI learns itself. The basketball analogy" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Training an AI model is the same vibe, just with code instead of a coach, and text instead of basketballs.&lt;br&gt;
The key idea: the kid never gets a full "rulebook of basketball" - they just get feedback on each throw, and the rules emerge from practice. It learns all by itself through trial and error.&lt;/p&gt;
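&lt;p&gt;&lt;em&gt;Here's that "guess, check, fix, repeat" loop in its smallest possible code form (a made-up one-number example, not a real training setup): learning a single "shot power" from nothing but feedback.&lt;/em&gt;&lt;/p&gt;

```python
# The guess-check-fix loop on one invented number: the shot "power".
target_power = 7.0      # the power that actually sinks the shot
power = 0.0             # first attempt: wildly off
learning_rate = 0.1     # how big each nudge is

for attempt in range(200):
    error = power - target_power            # check: how wrong was it?
    power = power - learning_rate * error   # fix: tiny nudge, not a rewrite

print(round(power, 3))  # converges close to 7.0
```

&lt;p&gt;No rulebook anywhere - just two hundred small corrections. Real training is this exact idea, repeated across billions of weights.&lt;/p&gt;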




&lt;h3&gt;
  
  
  The four steps of the training loop
&lt;/h3&gt;

&lt;p&gt;Under the hood, every modern neural network learns using the same four-step loop, repeated millions or billions of times.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Forward pass&lt;/strong&gt;: "fill in the blank"
The model sees some text with a missing word and guesses what comes next.
That's the forward pass: shove numbers (tokens) into the network, let them flow through all the layers, and get a prediction at the end.
In our basketball analogy, this is "take a shot at the hoop."
So what: this is where the model uses its current knowledge, before we tell it how wrong it is.&lt;/li&gt;
&lt;/ol&gt;




&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Loss&lt;/strong&gt;: "how bad was that?"
Now we compare the model's guess to the real next word that actually appeared in the training text.
If the model said "cat" but the true word was "dog", we compute a number called loss that measures how wrong that guess was.
In our basketball analogy, this is measuring how far the shot missed the hoop.
Higher loss = worse guess. Lower loss = better.
So what: this is the "ouch" signal. Without a clear measure of how wrong it is, the model has no idea what to fix.&lt;/li&gt;
&lt;/ol&gt;




&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Backpropagation&lt;/strong&gt;: blame assignment
Now comes the sneaky part: which weights were responsible for that bad guess?
Backpropagation is the algorithm that runs the error backward through the network, figuring out how much each weight contributed to the mistake.
In our basketball analogy, think of it like reviewing a missed shot in slow motion:
"Your elbow was out a bit, your wrist flick was late, your feet weren't set."
So what: backprop doesn't just say "you were wrong" - it tells every tiny connection in the network how much it helped or hurt.&lt;/li&gt;
&lt;/ol&gt;




&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Gradient descent&lt;/strong&gt;: tiny course corrections
Once we know how each weight contributed to the error, gradient descent steps in to actually change them.
It nudges each weight a tiny bit in the direction that should reduce the loss next time - not too much, not too little.
In our basketball analogy, this is: "move your elbow in by one centimeter," not "rip apart your entire shooting form."
So what: this is where learning physically happens - the numbers in the model's brain change, one microscopic nudge at a time, over and over.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymeq7qf9mu1pm995m4l8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymeq7qf9mu1pm995m4l8.png" alt="scatter plot matrix" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What the model really learns (and why it's weird)
&lt;/h3&gt;

&lt;p&gt;Here's the twist: the model is never told "these are the facts about the world."&lt;br&gt;
Its only job during training is: predict the next word as well as possible on huge amounts of text.&lt;br&gt;
For example: suppose the model's training text repeatedly claimed that humans inhale carbon dioxide and exhale oxygen. The model absorbs that pattern even though it's wrong. Later, when you ask "What do humans inhale to live?", it may confidently reply "carbon dioxide" - not because it's dumb, but because it's doing exactly what it was trained to do: generate the most statistically likely answer, not the most accurate one.&lt;br&gt;
Facts, concepts, and "knowledge" show up as a side effect of getting really good at that prediction game.&lt;br&gt;
So what: your sense that "the model knows things" is an illusion built on top of pattern recognition, not a clean internal encyclopedia.&lt;/p&gt;




&lt;h3&gt;
  
  
  Hallucinations, cutoffs, and bias - explained
&lt;/h3&gt;

&lt;p&gt;Once you see training this way, a bunch of AI "quirks" suddenly make sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hallucinations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model is trained to produce plausible continuations of text, not guaranteed‑true ones.&lt;br&gt;
If the statistically most likely answer is a confident but wrong statement, it will happily say that - because the training objective cares about patterns, not truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge cutoff&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Models are trained on data up to some date, then frozen.&lt;br&gt;
Anything that happened after that isn't in the training text, so the model can only guess based on older patterns - which is why different models talk about their "knowledge cutoff."&lt;br&gt;
However, this is starting to be less of a problem in practice. Many modern chatbots now sit on top of the base model and add a second step: they go out to live data sources (like the web, your docs, or company databases), pull in fresh information, and feed that into the model as extra context before it answers anything. The underlying brain still has a cutoff, but the overall system feels much more up‑to‑date because it's constantly grounding itself in real‑time information instead of relying only on what it saw during training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bias&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Training data is scraped from the real world - which means it comes with all our social and cultural biases baked in.&lt;br&gt;
The model learns those patterns too, unless people work very hard to filter and fine‑tune it afterwards.&lt;br&gt;
So what: if you remember only one thing, make it this - the model is a mirror of its training data and objective. Change those, and you change its behavior.&lt;/p&gt;




&lt;h3&gt;
  
  
  What's coming on Day 3
&lt;/h3&gt;

&lt;p&gt;Today we stayed at the "how learning works" level - the practice, the feedback, the tiny nudges to weights.&lt;br&gt;
Tomorrow we'll open the brain up and look at what really happens inside your AI when it pauses and "thinks" before answering you.&lt;br&gt;
We'll cover layers, neurons, connections, and why "neural network architecture" is the reason some models feel smarter than others.&lt;br&gt;
Think of Day 2 as "how the kid trained for his basketball matches," and Day 3 as "what the kid thinks while playing."&lt;/p&gt;

&lt;p&gt;See you there.&lt;/p&gt;

&lt;p&gt;What blew your mind most? Drop a comment!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
      <category>explainlikeimfive</category>
    </item>
    <item>
      <title>How does your AI seem to know everything you ask it? (Day 1/30 - Beginner AI Series)</title>
      <dc:creator>Ankit Dey</dc:creator>
      <pubDate>Tue, 10 Mar 2026 18:37:49 +0000</pubDate>
      <link>https://forem.com/ankitdey01/how-does-you-ai-know-so-much-with-such-less-size-37pg</link>
      <guid>https://forem.com/ankitdey01/how-does-you-ai-know-so-much-with-such-less-size-37pg</guid>
      <description>&lt;h3&gt;
  
  
  How AI Magically "Gets" You &lt;strong&gt;(Without a Giant Dumpyard of Information)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ever wonder how your phone's AI buddy predicts exactly what you mean, even in a messy sentence? It's not digging through a massive database the way old-school search did. Nope. Modern AI, like ChatGPT, Claude, or any other LLM, is lightweight, well trained, and way smarter. It doesn't hunt for keywords to match exactly what you typed. It plays a game of high-stakes word guessing, powered by probabilities and sneaky connections. Let me spill the beans, step by step, like we're cracking open a secret.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forget Databases: It's All About Relationships, Not Rules
&lt;/h3&gt;

&lt;p&gt;Picture an old-school search engine: You type "best pizza recipe," it scans millions of pages for those exact words, grabs a match, and spits it out. Boring, rigid, and so huge it needs a server farm the size of a football field.&lt;/p&gt;

&lt;p&gt;AI? Totally different concept. It learns relationships between words from billions of internet sentences. Not every single pair (that'd be impossible—there are trillions!). Instead, it crunches patterns into something called &lt;em&gt;weights&lt;/em&gt;. Think of it like a social network: "Pizza" hangs out a lot with "cheese," "oven," and "yum." "Quantum" buddies up with "physics" and "weird." These weights are just numbers saying, "Hey, these words show up together 80% of the time."&lt;/p&gt;

&lt;p&gt;No storing every combo. Just smart shortcuts baked into a tiny model that fits on your laptop.&lt;/p&gt;
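&lt;p&gt;&lt;em&gt;Here's a crude cartoon of those "relationship weights" (toy three-sentence corpus, invented numbers - real models learn far richer patterns than raw co-occurrence counts, but the flavor is similar):&lt;/em&gt;&lt;/p&gt;

```python
# Toy cartoon of "relationship weights": count how often words share a
# sentence, then normalize. Real weights are learned, not counted.
from collections import Counter
from itertools import combinations

corpus = [
    "pizza cheese oven",
    "pizza cheese yum",
    "quantum physics weird",
]

pairs = Counter()
for sentence in corpus:
    for a, b in combinations(sorted(sentence.split()), 2):
        pairs[(a, b)] += 1

total = sum(pairs.values())
weights = {pair: count / total for pair, count in pairs.items()}
print(weights[("cheese", "pizza")])   # "pizza" and "cheese" hang out a lot
```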

&lt;h3&gt;
  
  
  The Magic: Predicting the Next Word, One Probability at a Time
&lt;/h3&gt;

&lt;p&gt;Here's the cool part—AI is a &lt;em&gt;prediction machine&lt;/em&gt;. It generates answers word-by-word, betting on what's next based on probabilities.&lt;/p&gt;

&lt;p&gt;Say you ask: "How do I make..."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step 1: It looks at "How do I make" and cranks probabilities. "Pizza"? High score (0.7). "A bomb"? Super low (0.001, and blocked anyway).&lt;/li&gt;
&lt;li&gt;Step 2: Picks "pizza" because the weights scream "recipe incoming!" Then predicts "dough," "toppings," etc.&lt;/li&gt;
&lt;li&gt;Step 3: Every new word updates the odds, building context. "Make pizza dough" now boosts "flour" way up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's like autocomplete on steroids. Trained on zillions of examples, it "knows" a doctor "prescribes medicine," not "paints murals." Probability math inside a fancy architecture called a transformer juggles all this in milliseconds.&lt;/p&gt;
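&lt;p&gt;&lt;em&gt;Here's "autocomplete on steroids" in miniature (a toy next-word counter over made-up text - a real transformer is vastly more sophisticated, but the betting idea is the same):&lt;/em&gt;&lt;/p&gt;

```python
# Toy next-word predictor: learn odds from counts in made-up "training
# text", then always bet on the most likely continuation.
from collections import Counter, defaultdict

text = ("the doctor prescribes medicine . the doctor prescribes rest . "
        "the artist paints murals .")
words = text.split()

next_counts = defaultdict(Counter)
for prev, cur in zip(words, words[1:]):
    next_counts[prev][cur] += 1

def predict(prev):
    counts = next_counts[prev]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict("doctor"))  # ('prescribes', 1.0): doctors prescribe here
```

&lt;p&gt;Swap the toy counts for billions of learned weights and a much longer memory of context, and you have the basic game a real model plays.&lt;/p&gt;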

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgh6p2rljxrcmvj5dg1qp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgh6p2rljxrcmvj5dg1qp.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It's Lightweight and Lightning-Fast
&lt;/h3&gt;

&lt;p&gt;No keyword lists or full sentences stored. Just &lt;em&gt;billions of tuned weights&lt;/em&gt; (parameters) linking ideas, mastered during training. A model like GPT-3 has 175 billion of them: sounds huge, but it's far more compact than storing the raw text it learned from.&lt;/p&gt;

&lt;p&gt;Bottom line: AI feels psychic because it bets on patterns humans love. No magic database. Just probability wizardry making chit-chat feel natural.&lt;/p&gt;

&lt;p&gt;What blew your mind most? Drop a comment!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>explainlikeimfive</category>
    </item>
  </channel>
</rss>
