Forem

# transformers

Posts

MoE Architectures Keep Solving the Wrong Problem
3 min read
10^7-Dimensional LLM Memory, but Only If it Stays Sparse
4 min read
Chapter 12: Inference - Generating New Text
9 min read
Chapter 11: The Full GPT - Assembling the Model
10 min read
Chapter 9: Single-Head Attention - Tokens Looking at Each Other
9 min read
Chapter 8: RMS Normalisation and Residual Connections
4 min read
Beating Eager TurboQuant Was Not Enough: Why Dense GPU Attention Still Won
8 min read
Chapter 7: The Training Loop and Adam Optimiser
7 min read
Chapter 6: Embeddings, the Forward Pass, and the Loss Function
7 min read
Mamba vs. Transformers: Architecture Comparison
5 min read
Without google's transformers, there is no GPT-ishs
6 min read
Chapter 5: Linear Transformation and Softmax
4 min read
Chapter 4: The Bigram Model - Simplest Possible Language Model
5 min read
Chapter 3: The Tokenizer - Text to Numbers and Back
2 min read
Chapter 2: Backward - Automatic Gradient Computation
7 min read