Building The Next Large Model: trlX: A Framework for Open-Source RLHF

Published 2023-04-10
