<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: tomerjann</title>
    <description>The latest articles on Forem by tomerjann (@tomerjann).</description>
    <link>https://forem.com/tomerjann</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F206595%2F8fd57c4b-e55d-4461-8213-d274ecbd8df0.png</url>
      <title>Forem: tomerjann</title>
      <link>https://forem.com/tomerjann</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tomerjann"/>
    <language>en</language>
    <item>
      <title>I Built a Glossary of LLM Terms That Actually Explains What They Change in Production</title>
      <dc:creator>tomerjann</dc:creator>
      <pubDate>Sat, 25 Apr 2026 07:49:01 +0000</pubDate>
      <link>https://forem.com/tomerjann/i-built-a-glossary-of-llm-terms-that-actually-explains-what-they-change-in-production-53f1</link>
      <guid>https://forem.com/tomerjann/i-built-a-glossary-of-llm-terms-that-actually-explains-what-they-change-in-production-53f1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uxq2vi40c0gkl69rbf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uxq2vi40c0gkl69rbf6.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;When I started building with LLMs, I kept running into terms I didn't fully understand. Quantization, KV cache, top-k sampling, temperature. Every time I looked one up, I got either a textbook definition or a link to a paper.&lt;/p&gt;

&lt;p&gt;That told me what the term &lt;em&gt;is&lt;/em&gt;. It didn't tell me what to &lt;em&gt;do&lt;/em&gt; with it. What decision does it affect? What breaks if I ignore it? What tradeoff am I making?&lt;/p&gt;
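&lt;p&gt;To make that concrete: a minimal sketch (not code from the glossary itself, just an illustration) of what temperature and top-k actually do to the next-token distribution. The logit values are made up; the mechanics are the standard ones.&lt;/p&gt;

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_probs(logits, temperature=1.0, top_k=None):
    """Next-token distribution after temperature scaling and
    optional top-k truncation (illustrative, not any specific API)."""
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # keep only the k largest logits; the rest get zero probability
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    return softmax(scaled)

# lower temperature sharpens the distribution toward the argmax;
# top_k=2 removes the tail entirely
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_probs(logits, temperature=0.5))
print(sample_probs(logits, temperature=2.0, top_k=2))
```

&lt;p&gt;That one picture, "temperature reshapes, top-k truncates", is the kind of production angle the glossary entries try to capture.&lt;/p&gt;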

&lt;p&gt;So I started keeping notes. For each term, I wrote down the production angle: why it matters when you're actually shipping something. Over time it grew into 30+ entries organized across 8 pillars, from Core Architecture to Agentic AI, with linked related concepts so you can follow threads naturally.&lt;/p&gt;

&lt;p&gt;I cleaned it up, built a browsable UI with search and filtering, and open sourced it.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tomerjann" rel="noopener noreferrer"&gt;
        tomerjann
      &lt;/a&gt; / &lt;a href="https://github.com/tomerjann/llm-field-notes" rel="noopener noreferrer"&gt;
        llm-field-notes
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      LLM terms explained from an engineering perspective, with the production implications, not just the definition.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;llm-field-notes&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;LLM terms explained from an engineering angle, with the production implications, not just the definition.&lt;/p&gt;

&lt;p&gt;I've been learning how LLMs work at the systems level and kept a running list of every term I had to look up. Writing down what each one &lt;em&gt;actually means&lt;/em&gt; when you're building something helped me understand them better than just reading about them.&lt;/p&gt;
&lt;p&gt;I thought it might help others too, so I cleaned it up and open sourced it.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What's here&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;30+ terms across 8 areas, each with a plain-English definition and links to related concepts so you can follow threads rather than look things up in isolation.&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Architecture&lt;/td&gt;
&lt;td&gt;Transformer, Attention, FFN Layer, MoE, Dense Model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory &amp;amp; Compute&lt;/td&gt;
&lt;td&gt;KV Cache, Quantization, Inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vectors &amp;amp; Retrieval&lt;/td&gt;
&lt;td&gt;Embeddings, RAG, Vector DB, Latent Space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation &amp;amp; Sampling&lt;/td&gt;
&lt;td&gt;Temperature, Top-p, Logits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training &amp;amp; Alignment&lt;/td&gt;
&lt;td&gt;Fine-tuning, LoRA, RLHF, Distillation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation&lt;/td&gt;
&lt;td&gt;Evals, Harness Engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompting&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tomerjann/llm-field-notes" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
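&lt;p&gt;The Memory &amp;amp; Compute entries are a good example of what "production angle" means. A back-of-the-envelope KV cache estimate makes the point: the formula below is the standard one (K and V tensors per layer), but the model dimensions are hypothetical, not taken from any particular model.&lt;/p&gt;

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                   bytes_per_param=2):
    """Rough KV-cache size: 2 tensors (K and V) per layer, each of
    shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_param

# hypothetical 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16
size = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8)
print(f"{size / 2**30:.1f} GiB")  # prints 16.0 GiB
```

&lt;p&gt;At batch 8 and a 4K context, the cache alone can rival the weights in memory, which is exactly why techniques like grouped-query attention and cache quantization exist.&lt;/p&gt;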


&lt;p&gt;There's also a companion project that walks through everything that happens from the moment you hit send to the moment a response streams back:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tomerjann" rel="noopener noreferrer"&gt;
        tomerjann
      &lt;/a&gt; / &lt;a href="https://github.com/tomerjann/what-happens-when-you-prompt" rel="noopener noreferrer"&gt;
        what-happens-when-you-prompt
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A deep-dive reference tracing every layer of the stack when you send a prompt to an LLM chat, from keystroke to streamed token. Covers tokenization, KV cache, prefill/decode, sampling, SSE streaming, and more.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;What happens when you send a prompt to an LLM chat?&lt;/h1&gt;
&lt;/div&gt;
&lt;a rel="noopener noreferrer" href="https://private-user-images.githubusercontent.com/47823144/567231053-67adbe3d-751a-439f-82ab-5f2a14515485.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzcxMDM4MTAsIm5iZiI6MTc3NzEwMzUxMCwicGF0aCI6Ii80NzgyMzE0NC81NjcyMzEwNTMtNjdhZGJlM2QtNzUxYS00MzlmLTgyYWItNWYyYTE0NTE1NDg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjA0MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwNDI1VDA3NTE1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZiMjllNWVmYWY5MGFjOTIwYTk0NzJmNjAxZTFhNzJmY2U0YmYzZDFlOWYxMDBkZjZmNTFkNjI4OWJmMjg1MzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JnJlc3BvbnNlLWNvbnRlbnQtdHlwZT1pbWFnZSUyRnBuZyJ9.YT_S1zZjMiMUqG6S6V7ntZj_sEfXuaYpcWa91-y0g4g"&gt;&lt;img width="1022" height="540" alt="what-happens-when-you-prompt svg" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fprivate-user-images.githubusercontent.com%2F47823144%2F567231053-67adbe3d-751a-439f-82ab-5f2a14515485.png%3Fjwt%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzcxMDM4MTAsIm5iZiI6MTc3NzEwMzUxMCwicGF0aCI6Ii80NzgyMzE0NC81NjcyMzEwNTMtNjdhZGJlM2QtNzUxYS00MzlmLTgyYWItNWYyYTE0NTE1NDg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjA0MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwNDI1VDA3NTE1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZiMjllNWVmYWY5MGFjOTIwYTk0NzJmNjAxZTFhNzJmY2U0YmYzZDFlOWYxMDBkZjZmNTFkNjI4OWJmMjg1MzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JnJlc3BvbnNlLWNvbnRlbnQtdHlwZT1pbWFnZSUyRnBuZyJ9.YT_S1zZjMiMUqG6S6V7ntZj_sEfXuaYpcWa91-y0g4g" class="js-gh-image-fallback"&gt;&lt;/a&gt;
&lt;p&gt;This repository answers a deceptively deep question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"What happens  -  at every layer of the stack  -  when you type a message into Claude or ChatGPT and press Send?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Inspired by the classic &lt;a href="https://github.com/alex/what-happens-when" rel="noopener noreferrer"&gt;&lt;code&gt;what-happens-when&lt;/code&gt;&lt;/a&gt; repository for browser navigation, this traces the full journey of a prompt: from keystroke to rendered response, skipping nothing.&lt;/p&gt;
&lt;p&gt;The target reader is an engineer who already understands transformers, attention, and RAG, and wants &lt;strong&gt;production intuition&lt;/strong&gt;, not another introductory walkthrough.&lt;/p&gt;
&lt;p&gt;Contributions welcome. If you see a missing layer, open a PR.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; Neither Anthropic nor OpenAI publishes their infrastructure internals. This document describes general patterns that are well established across the industry, grounded in public research, open-source inference frameworks, and published API documentation. Where specific examples are needed (model architecture, pricing, safety classifiers), they draw from open-source models or a single provider's public…&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tomerjann/what-happens-when-you-prompt" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;
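&lt;p&gt;To give a flavor of the streaming layer the companion repo covers: tokens typically arrive as Server-Sent Events, one &lt;code&gt;data:&lt;/code&gt; line per chunk. Here is a minimal parser for that wire format, a sketch of the general SSE pattern rather than any specific provider's exact schema (the &lt;code&gt;[DONE]&lt;/code&gt; sentinel is a common API convention, not part of the SSE spec itself).&lt;/p&gt;

```python
def parse_sse(raw: str):
    """Yield the payload of each `data:` line in a Server-Sent Events
    stream, stopping at the conventional [DONE] sentinel."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip comments, event names, blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield payload

# what a token-streaming response looks like on the wire (simplified)
raw = "data: Hel\n\ndata: lo\n\ndata: [DONE]\n\n"
print("".join(parse_sse(raw)))  # prints Hello
```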


&lt;p&gt;If you've ever felt lost in LLM jargon while building something real, this might save you some time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
