Proximal Policy Optimization (PPO) - How to train Large Language Models

Published 2024-01-24
