Kyle Sayers

Senior Machine Learning Engineer at Red Hat

Medford, Massachusetts, United States

Summary

🤩
Rockstar
🎓
Top School
Kyle Sayers is a Senior Machine Learning Engineer with five years of experience specializing in LLM inference, model compression, and performance engineering. At Red Hat he drives vLLM model optimization and the design of LLM Compressor, translating research (including GPTQ and Hadamard transforms) into production-ready algorithms and distributed systems. His open-source contributions to high-profile projects such as vLLM, Hugging Face Transformers and Accelerate, and DeepSparse show deep expertise in quantization, offloading, and sparsity-aware inference for real-world deployments.

His earlier work spans edge and robotics systems: he led development of a live drone-mapping product and of SLAM solutions that dramatically cut latency, reflecting a strong systems and C++ background alongside ML. He regularly shares what he learns through developer blogs and community sessions and has been recognized internally for engineering impact, balancing research translation with pragmatic shipping. Notably, he combines low-level performance engineering with practical UX to make large-model inference more accessible.
5 years of coding experience
7 years of employment as a software developer
Bachelor's degree, Mathematics and Computer Science, Tufts University
High School Diploma, The Branson School
Languages: Chinese

Stack Overflow

Stats
33 reputation
6k reached
0 answers
1 question

GitHub Skills (34)

transformers: 10
pytorch: 10
distributed-training: 10
python: 10
machine-learning: 10
inference: 10
onnx: 10
llm: 10
deep-learning: 10
model-optimization: 10
quantization: 10
transformer: 10
nlp: 10
system-configuration: 9
pytest: 9

Programming languages (3)

C++, JavaScript, Python

GitHub contributions (5)

neuralmagic/deepsparse

May 2022 - Sep 2022

Sparsity-aware deep learning inference runtime for CPUs
Role in this project: ML Engineer
Contributions: 73 reviews, 85 commits, 73 PRs in 3 months
Contributions summary: Kyle primarily contributed to the development and maintenance of deep learning inference pipelines. His commits focused on enhancing the existing transformers pipelines, specifically for token classification and question answering. He also worked on information retrieval, implementing features such as model cutting, bi-encoder models, and integration with the Haystack framework, demonstrating expertise in model optimization and deployment.
Topics: llm-inference, runtime, tensorflow, sparsification, machinelearning
vllm-project/vllm

Aug 2024 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs
Role in this project: ML Engineer
Contributions: 40 reviews, 29 PRs, 69 comments in 7 months
Contributions summary: Kyle primarily contributed quantization methods for large language models in the vLLM framework. His work included reusing and integrating compressed-tensors code across multiple files, reflecting a focus on model performance and memory efficiency. He also addressed activation ordering in GPTQ models and resolved issues related to MLA warnings, contributing to model optimization and broader system compatibility.
Topics: amd, cuda, deepseek, gpt, hpu