Deep Dive: Optimizing LLM inference
Published 2024-03-11