Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
Published 2023-09-02