Wuwei Lin is a machine learning systems engineer with 10 years of coding experience, specializing in ML compilers and LLM deployment engines; he is currently a Member of Technical Staff at OpenAI. He has deep open-source roots as an Apache TVM committer and PMC member, contributing performance-critical backend work for CPU, GPU, and accelerator targets. His recent roles at OctoAI (now part of NVIDIA) and NVIDIA focused on ML compiler and LLM engine co-design, quantization, and faster transformer kernels, bridging research prototypes and production-grade inference. His contributions to MLC-LLM and TVM include low-level numeric optimizations (e.g., int8 GEMM and type refinements) along with careful test and documentation upkeep. Trained at Carnegie Mellon, he combines academic rigor with hands-on systems engineering across cloud, compiler, and model deployment stacks. Colleagues know him for quietly turning subtle type and memory changes into measurable throughput gains.
10 years of coding experience
6 years of employment as a software developer
Master of Science in Information Networking at Carnegie Mellon University
Bachelor's degree in Software Engineering at Shanghai Jiao Tong University
Apache TVM: Open deep learning compiler stack for CPU, GPU and specialized accelerators
Role in this project:
Back-end Developer & ML Engineer
Contributions: 742 reviews, 200 commits, 678 PRs in 4 years 4 months
Contributions summary: Wuwei's commits focus on optimizing the TVM deep learning compiler stack for CPU, GPU, and specialized accelerators. He implemented performance optimizations by switching code from char types to integer types and added recipes for int8 GEMM. He also fixed bugs and updated documentation and test cases for existing convolution models. These changes indicate strong involvement in both backend compilation and the integration of machine learning models.
MLC-LLM: Universal LLM Deployment Engine with ML Compilation
Role in this project:
ML Engineer
Contributions: 30 reviews, 94 PRs, 25 pushes in 1 year 5 months
Contributions summary: Wuwei primarily contributed to the implementation and optimization of the MLC-LLM deployment engine. His work involved fixing type hints, adding new quantization modes, and addressing issues with the faster transformer kernel. He also updated model definitions to use SizeVar instead of Var, and debugged and fixed issues with logit processors and CUDA graphs.
language-model, llm, machine-learning-compilation, tvm