Kyle Sayers is a Senior Machine Learning Engineer with five years of experience specializing in LLM inference, model compression, and performance engineering. At Red Hat he drives vLLM model optimization and LLM Compressor design, translating research (including GPTQ and Hadamard transforms) into production-ready algorithms and distributed systems. His open-source contributions to vLLM, Hugging Face Transformers and Accelerate, and DeepSparse cover quantization, offloading, and sparsity-aware inference for real-world deployments. Earlier work spans edge and robotics systems, including leading development of a live drone mapping product and SLAM solutions that cut latency dramatically, and reflects a strong systems and C++ background alongside ML. He frequently shares what he has learned through developer blogs and community sessions and is recognized internally for engineering impact, blending research translation with pragmatic shipping. He combines low-level performance engineering with practical UX aimed at making large-model inference widely accessible.
5 years of coding experience
7 years of employment as a software developer
Bachelor's degree, Mathematics and Computer Science at Tufts University
High School Diploma at The Branson School
Sparsity-aware deep learning inference runtime for CPUs
Role in this project:
ML Engineer
Contributions: 73 reviews, 85 commits, 73 PRs in 3 months
Contributions summary: Kyle primarily contributed to the development and maintenance of deep learning inference pipelines. Their commits focused on enhancing existing transformers pipelines, specifically for token classification and question answering. They also worked on information retrieval, implementing features such as model cutting, bi-encoder models, and integration with the Haystack framework, demonstrating expertise in model optimization and deployment.
A high-throughput and memory-efficient inference and serving engine for LLMs
Role in this project:
ML Engineer
Contributions: 40 reviews, 29 PRs, 69 comments in 7 months
Contributions summary: Kyle primarily contributed to the development of quantization methods for large language models within the vLLM framework. Their work involved reusing and integrating code for compressed tensors, including modifications across multiple files, indicating a focus on optimizing model performance and memory efficiency. They also addressed activation ordering in GPTQ models and resolved issues related to MLA warnings, demonstrating contributions to model optimization and broader system compatibility.
amd, cuda, deepseek, gpt, hpu
Kyle Sayers - Senior Machine Learning Engineer at Red Hat