Enabling Cost-Efficient LLM Serving with Ray Serve (published 2023-10-12)