Proximal Policy Optimization (PPO) - How to train Large Language Models
Published 2024-01-24