Blog

Building a Production-Grade LLM Inference Stack: Benchmarking vLLM, Ray Serve, and MoE Models

llm, inference, vllm, ray-serve, moe, benchmarking, deep-learning

These are our notes from benchmarking a self-hosted LLM inference stack. The goal was practical: figure out which combination of model, serving framework, and infrastructure…

Vinay Pandya, Yiran Xu

© 2025 Vinay Pandya