Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
Published 2023-09-02