Vinay Pandya
Blog
Building a Production-Grade LLM Inference Stack: Benchmarking vLLM, Ray Serve, and MoE Models
llm
inference
vllm
ray-serve
moe
benchmarking
deep-learning
These are our notes from benchmarking a self-hosted LLM inference stack. The goal was practical: figure out which combination of model, serving framework, and infrastructure…
Apr 25, 2026
Vinay Pandya, Yiran Xu