<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: tomerjann</title>
    <description>The latest articles on Forem by tomerjann (@tomerjann).</description>
    <link>https://forem.com/tomerjann</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F206595%2F8fd57c4b-e55d-4461-8213-d274ecbd8df0.png</url>
      <title>Forem: tomerjann</title>
      <link>https://forem.com/tomerjann</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tomerjann"/>
    <language>en</language>
    <item>
      <title>I Built a Glossary of LLM Terms That Actually Explains What They Change in Production</title>
      <dc:creator>tomerjann</dc:creator>
      <pubDate>Sat, 25 Apr 2026 07:49:01 +0000</pubDate>
      <link>https://forem.com/tomerjann/i-built-a-glossary-of-llm-terms-that-actually-explains-what-they-change-in-production-53f1</link>
      <guid>https://forem.com/tomerjann/i-built-a-glossary-of-llm-terms-that-actually-explains-what-they-change-in-production-53f1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uxq2vi40c0gkl69rbf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uxq2vi40c0gkl69rbf6.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;When I started building with LLMs, I kept running into terms I didn't fully understand. Quantization, KV cache, top-k sampling, temperature. Every time I looked one up, I got either a textbook definition or a link to a paper.&lt;/p&gt;

&lt;p&gt;That told me what the term &lt;em&gt;is&lt;/em&gt;. It didn't tell me what to &lt;em&gt;do&lt;/em&gt; with it. What decision does it affect? What breaks if I ignore it? What tradeoff am I making?&lt;/p&gt;
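&lt;p&gt;To make that concrete: a minimal sketch (not code from the glossary itself, just an illustration) of what temperature and top-k actually do to the next-token distribution. The logit values are made up; the mechanics are the standard ones.&lt;/p&gt;

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_probs(logits, temperature=1.0, top_k=None):
    """Next-token distribution after temperature scaling and
    optional top-k truncation (illustrative, not any specific API)."""
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # keep only the k largest logits; the rest get zero probability
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    return softmax(scaled)

# lower temperature sharpens the distribution toward the argmax;
# top_k=2 removes the tail entirely
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_probs(logits, temperature=0.5))
print(sample_probs(logits, temperature=2.0, top_k=2))
```

&lt;p&gt;That one picture, "temperature reshapes, top-k truncates", is the kind of production angle the glossary entries try to capture.&lt;/p&gt;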

&lt;p&gt;So I started keeping notes. For each term, I wrote down the production angle: why it matters when you're actually shipping something. Over time it grew into 30+ entries organized across 8 pillars, from Core Architecture to Agentic AI, with linked related concepts so you can follow threads naturally.&lt;/p&gt;

&lt;p&gt;I cleaned it up, built a browsable UI with search and filtering, and open sourced it.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tomerjann" rel="noopener noreferrer"&gt;
        tomerjann
      &lt;/a&gt; / &lt;a href="https://github.com/tomerjann/llm-field-notes" rel="noopener noreferrer"&gt;
        llm-field-notes
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      LLM terms explained from an engineering perspective, with the production implications, not just the definition.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;llm-field-notes&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;LLM terms explained from an engineering angle, with the production implications, not just the definition.&lt;/p&gt;

&lt;p&gt;I've been learning how LLMs work at the systems level and kept a running list of every term I had to look up. Writing down what each one &lt;em&gt;actually means&lt;/em&gt; when you're building something helped me understand them better than just reading about them.&lt;/p&gt;
&lt;p&gt;I thought it might help others too, so I cleaned it up and open sourced it.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What's here&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;30+ terms across 8 areas, each with a plain-English definition and links to related concepts so you can follow threads rather than look things up in isolation.&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Architecture&lt;/td&gt;
&lt;td&gt;Transformer, Attention, FFN Layer, MoE, Dense Model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory &amp;amp; Compute&lt;/td&gt;
&lt;td&gt;KV Cache, Quantization, Inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vectors &amp;amp; Retrieval&lt;/td&gt;
&lt;td&gt;Embeddings, RAG, Vector DB, Latent Space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation &amp;amp; Sampling&lt;/td&gt;
&lt;td&gt;Temperature, Top-p, Logits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training &amp;amp; Alignment&lt;/td&gt;
&lt;td&gt;Fine-tuning, LoRA, RLHF, Distillation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation&lt;/td&gt;
&lt;td&gt;Evals, Harness Engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompting&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tomerjann/llm-field-notes" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
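&lt;p&gt;The Memory &amp;amp; Compute entries are a good example of what "production angle" means. A back-of-the-envelope KV cache estimate makes the point: the formula below is the standard one (K and V tensors per layer), but the model dimensions are hypothetical, not taken from any particular model.&lt;/p&gt;

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                   bytes_per_param=2):
    """Rough KV-cache size: 2 tensors (K and V) per layer, each of
    shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_param

# hypothetical 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16
size = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8)
print(f"{size / 2**30:.1f} GiB")  # prints 16.0 GiB
```

&lt;p&gt;At batch 8 and a 4K context, the cache alone can rival the weights in memory, which is exactly why techniques like grouped-query attention and cache quantization exist.&lt;/p&gt;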


&lt;p&gt;There's also a companion project that walks through everything that happens from the moment you hit send to the moment a response streams back:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tomerjann" rel="noopener noreferrer"&gt;
        tomerjann
      &lt;/a&gt; / &lt;a href="https://github.com/tomerjann/what-happens-when-you-prompt" rel="noopener noreferrer"&gt;
        what-happens-when-you-prompt
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A deep-dive reference tracing every layer of the stack when you send a prompt to an LLM chat, from keystroke to streamed token. Covers tokenization, KV cache, prefill/decode, sampling, SSE streaming, and more.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;What happens when you send a prompt to an LLM chat?&lt;/h1&gt;
&lt;/div&gt;
&lt;a rel="noopener noreferrer" href="https://private-user-images.githubusercontent.com/47823144/567231053-67adbe3d-751a-439f-82ab-5f2a14515485.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzcxMDM4MTAsIm5iZiI6MTc3NzEwMzUxMCwicGF0aCI6Ii80NzgyMzE0NC81NjcyMzEwNTMtNjdhZGJlM2QtNzUxYS00MzlmLTgyYWItNWYyYTE0NTE1NDg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjA0MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwNDI1VDA3NTE1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZiMjllNWVmYWY5MGFjOTIwYTk0NzJmNjAxZTFhNzJmY2U0YmYzZDFlOWYxMDBkZjZmNTFkNjI4OWJmMjg1MzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JnJlc3BvbnNlLWNvbnRlbnQtdHlwZT1pbWFnZSUyRnBuZyJ9.YT_S1zZjMiMUqG6S6V7ntZj_sEfXuaYpcWa91-y0g4g"&gt;&lt;img width="1022" height="540" alt="what-happens-when-you-prompt svg" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fprivate-user-images.githubusercontent.com%2F47823144%2F567231053-67adbe3d-751a-439f-82ab-5f2a14515485.png%3Fjwt%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzcxMDM4MTAsIm5iZiI6MTc3NzEwMzUxMCwicGF0aCI6Ii80NzgyMzE0NC81NjcyMzEwNTMtNjdhZGJlM2QtNzUxYS00MzlmLTgyYWItNWYyYTE0NTE1NDg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjA0MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwNDI1VDA3NTE1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZiMjllNWVmYWY5MGFjOTIwYTk0NzJmNjAxZTFhNzJmY2U0YmYzZDFlOWYxMDBkZjZmNTFkNjI4OWJmMjg1MzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JnJlc3BvbnNlLWNvbnRlbnQtdHlwZT1pbWFnZSUyRnBuZyJ9.YT_S1zZjMiMUqG6S6V7ntZj_sEfXuaYpcWa91-y0g4g" class="js-gh-image-fallback"&gt;&lt;/a&gt;
&lt;p&gt;This repository answers a deceptively deep question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"What happens  -  at every layer of the stack  -  when you type a message into Claude or ChatGPT and press Send?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Inspired by the classic &lt;a href="https://github.com/alex/what-happens-when" rel="noopener noreferrer"&gt;&lt;code&gt;what-happens-when&lt;/code&gt;&lt;/a&gt; repository for browser navigation, this traces the full journey of a prompt: from keystroke to rendered response, skipping nothing.&lt;/p&gt;
&lt;p&gt;The target reader is an engineer who already understands transformers, attention, and RAG, and wants &lt;strong&gt;production intuition&lt;/strong&gt;, not another introductory walkthrough.&lt;/p&gt;
&lt;p&gt;Contributions welcome. If you see a missing layer, open a PR.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; Neither Anthropic nor OpenAI publishes their infrastructure internals. This document describes general patterns that are well established across the industry, grounded in public research, open-source inference frameworks, and published API documentation. Where specific examples are needed (model architecture, pricing, safety classifiers), they draw from open-source models or a single provider's public…&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tomerjann/what-happens-when-you-prompt" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
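&lt;p&gt;To give a flavor of the streaming layer the companion repo covers: tokens typically arrive as Server-Sent Events, one &lt;code&gt;data:&lt;/code&gt; line per chunk. Here is a minimal parser for that wire format, a sketch of the general SSE pattern rather than any specific provider's exact schema (the &lt;code&gt;[DONE]&lt;/code&gt; sentinel is a common API convention, not part of the SSE spec itself).&lt;/p&gt;

```python
def parse_sse(raw: str):
    """Yield the payload of each `data:` line in a Server-Sent Events
    stream, stopping at the conventional [DONE] sentinel."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip comments, event names, blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield payload

# what a token-streaming response looks like on the wire (simplified)
raw = "data: Hel\n\ndata: lo\n\ndata: [DONE]\n\n"
print("".join(parse_sse(raw)))  # prints Hello
```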


&lt;p&gt;If you've ever felt lost in LLM jargon while building something real, this might save you some time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
