Instruction finetuning and RLHF lecture (NYU CSCI 2590)
Published 2023-05-17