Aligning LLMs with Direct Preference Optimization
Published 2024-02-08