Transformers

Thousand Miles AI

May 21

What "Subquadratic Attention" Actually Means

#ai #llm #explained #transformers

4 min read

Aamer Mihaysi

May 13

MoE Architectures Keep Solving the Wrong Problem

#machinelearning #llm #transformers

3 min read

Gary Jackson

May 2

Chapter 12: Inference - Generating New Text

#csharp #machinelearning #transformers #tutorial

9 min read

Gary Jackson

Apr 30

Chapter 11: The Full GPT - Assembling the Model

#csharp #machinelearning #transformers #tutorial

10 min read

Gary Jackson

Apr 28

Chapter 9: Single-Head Attention - Tokens Looking at Each Other

#csharp #machinelearning #transformers #tutorial

9 min read

Gary Jackson

Apr 27

Chapter 8: RMS Normalisation and Residual Connections

#csharp #machinelearning #transformers #tutorial

4 min read

Alankrit Verma

Apr 27

Beating Eager TurboQuant Was Not Enough: Why Dense GPU Attention Still Won

#machinelearning #gpu #research #transformers

8 min read

Gary Jackson

Apr 26

Chapter 7: The Training Loop and Adam Optimiser

#csharp #machinelearning #transformers #tutorial

7 min read

Gary Jackson

Apr 25

Chapter 6: Embeddings, the Forward Pass, and the Loss Function

#csharp #machinelearning #transformers #tutorial

7 min read

Alain Airom (Ayrom)

Apr 30

Mamba vs. Transformers: Architecture Comparison

#mamba #transformers #llm #granite

5 min read

Gary Jackson

Apr 24

Chapter 5: Linear Transformation and Softmax

#csharp #machinelearning #transformers #tutorial

4 min read

Gary Jackson

Apr 23

Chapter 4: The Bigram Model - Simplest Possible Language Model

#csharp #machinelearning #transformers #tutorial

5 min read

Gary Jackson

Apr 22

Chapter 3: The Tokenizer - Text to Numbers and Back

#csharp #machinelearning #transformers #tutorial

2 min read

Gary Jackson

Apr 21

Chapter 2: Backward - Automatic Gradient Computation

#csharp #machinelearning #transformers #tutorial

7 min read

Gary Jackson

Apr 21

Chapter 1: The Value Class - Recording the Forward Pass

#csharp #machinelearning #transformers #tutorial

10 min read
