Lossless LLM inference acceleration with Speculators
High latency is the primary bottleneck for delivering responsive, user-facing large language model (LLM) applications. How can you significantly accelerate LLM inference without sacrificing model accuracy?
Red Hat’s Mark Kurtz and Megan Flynn examine speculative decoding, a technique in which a smaller, faster model (the "speculator") drafts several tokens ahead, and the main model (the "verifier") checks all of those drafts in a single forward pass, keeping only the tokens it would have produced itself. Because every accepted token matches the verifier's own output, the acceleration is lossless: faster, cheaper LLM deployments with no drop in accuracy.
🔗Read more about Speculators: https://developers.redhat.com/articles/2025/11/19/speculators-standardized-production-ready-speculative-decoding
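To make the draft-and-verify loop concrete, here is a minimal, self-contained Python sketch of greedy speculative decoding. It is an illustration only: draft_next and verify_next are toy stand-ins for the speculator and verifier (nothing here comes from the Speculators library), and production systems verify all drafted tokens in one batched forward pass and use rejection sampling to stay lossless under non-greedy sampling.

```python
# Toy sketch of greedy speculative decoding. draft_next / verify_next are
# hypothetical stand-ins for the speculator and verifier models.

def draft_next(tokens):
    # Tiny "speculator": guesses the next token cheaply.
    return (tokens[-1] + 1) % 50

def verify_next(tokens):
    # "Verifier": the large model's true next token. It usually agrees
    # with the draft, but diverges whenever the candidate is divisible by 7.
    nxt = (tokens[-1] + 1) % 50
    return nxt if nxt % 7 else (nxt + 3) % 50

def plain_decode(prompt, max_new=20):
    # Baseline: one verifier call per generated token.
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(verify_next(tokens))
    return tokens

def speculative_decode(prompt, k=4, max_new=20):
    tokens = list(prompt)
    target = len(prompt) + max_new
    while len(tokens) < target:
        # 1) The speculator cheaply drafts k tokens past the current context.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) The verifier checks the drafts (conceptually one batched pass):
        #    accept matching tokens for free; on the first mismatch, keep the
        #    verifier's own token and discard the rest of the draft.
        ctx = list(tokens)
        for t in draft:
            true_t = verify_next(ctx)
            ctx.append(true_t)
            if true_t != t:
                break
        tokens = ctx
    return tokens[:target]

assert speculative_decode([0]) == plain_decode([0])  # lossless: identical output
print(speculative_decode([0]))
```

The final assert is the losslessness property in miniature: the speculated output is token-for-token identical to decoding with the verifier alone, while each verification step can commit up to k + 1 tokens at once instead of one.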
00:00 Introduction
00:45 The Latency Challenge in LLMs
03:57 What is Speculative Decoding?
16:04 Use Case Flow with Speculators
17:22 Current Capabilities and Roadmap
18:26 Why EAGLE3? (A Leading Decoding Algorithm)
19:20 Pretrained Speculators, Ready to Deploy
19:58 One-Command Deployment Example (see the sketch after this chapter list)
20:40 Measuring Speculator Effectiveness
22:38 What to Expect in Performance
24:09 Composing Speculative Decoding with Quantization
27:14 Creating and Adapting Your Own Speculators
29:13 Key Takeaways & Conclusion
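As a companion to the "One-Command Deployment Example" chapter, the sketch below shows roughly what serving a model with a pretrained EAGLE-3 speculator could look like through vLLM's offline API. Treat it as an assumption-laden sketch rather than the talk's actual command: the speculator ID is a placeholder, and the speculative_config keys shown here vary across vLLM versions.

```python
# Hedged sketch: pairing a verifier with an EAGLE-3 speculator via vLLM's
# offline API. The speculator ID is a placeholder, and speculative_config
# support and key names depend on your vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # verifier (the large model)
    speculative_config={
        "method": "eagle3",                     # EAGLE-3 drafting, per the talk
        "model": "<your-eagle3-speculator>",    # placeholder speculator ID
        "num_speculative_tokens": 5,            # tokens drafted per verify step
    },
)

out = llm.generate(
    ["Summarize speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(out[0].outputs[0].text)
```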
#RedHat #AI #LLMinference #speculators