Simon Mo - CEO And Cofounder at vLLM

Berkeley, California, United States

Summary

🤩 Rockstar

🎓 Top School

Simon Mo is a software leader and founder with nine years of experience building high-throughput inference and serving systems for machine learning. As CEO and Cofounder of Inferact and a core maintainer of the popular open-source vLLM project, he focuses on making LLM inference faster and cheaper while driving production-ready tooling. His background spans building Ray Serve at Anyscale, production GPU efficiency work at Character.AI, and core contributions to Ray, Modin, and Clipper that improved distributed runtime, data IO, and low-latency prediction serving. He combines hands-on backend and DevOps expertise (containerization, Kubernetes, Prometheus metrics, and performance tuning) with research rigor from his PhD work at UC Berkeley. Based in Berkeley, he has a track record of turning prototype models into scalable deployments. In major open-source repositories he has also migrated linters and added observability hooks that improved maintainability.

9 years of coding experience

7 years of employment as a software developer

Doctor of Philosophy (PhD), Computer Science at University of California, Berkeley

Github Skills (41)

continuous-deployment: 10

kubernetes: 10

docker: 10

performance-monitor: 10

performance-analytics: 10

python: 10

testing: 10

dataframes: 10

pandas: 10

machine-learning: 10

dockers: 10

ml-deployment: 10

cicd: 10

performance-measurement: 10

numpy: 10

Programming languages (16)

C++, CSS, Rust, TeX, Go, HTML, Jupyter Notebook, Cuda

Github contributions (5)

vllm-project/vllm

Oct 2023 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Role in this project:

Backend & DevOps Engineer

Contributions: 1021 reviews, 1290 PRs, 1119 pushes in 1 year 5 months

Contributions summary: Simon fixed a bug related to sequence group duplication within the engine step, showing a focus on core engine functionality. He also documented the official Docker image deployment process, updating the documentation to cover shared memory usage. In addition, he migrated the linter from `pylint` to `ruff`, improving code quality and maintainability, and added production metrics in Prometheus format.

amd, cuda, gpt, hpu, inference

ucbrise/clipper

Oct 2017 - Jul 2020

A low-latency prediction-serving system

Role in this project:

DevOps & Backend Engineer

Contributions: 1 release, 29 commits, 97 PRs in 2 years 9 months

Contributions summary: Simon primarily focused on improving the Clipper project's infrastructure and backend functionality. His contributions include fixing broken links, addressing Docker and exception handling issues, and implementing a metrics monitoring system, including frontend exporters. He also made significant changes to the Docker container manager and added Kubernetes support for multi-tenancy.

python, serving, prediction, deep-learning, latency
