Kyle Sayers - Senior Machine Learning Engineer at Red Hat

Medford, Massachusetts, United States

Summary

🤩 Rockstar

🎓 Top School

Kyle Sayers is a Senior Machine Learning Engineer with five years of experience specializing in LLM inference, model compression, and performance engineering. At Red Hat he drives vLLM model optimization and LLM Compressor design, translating research (including GPTQ and Hadamard transforms) into production-ready algorithms and distributed systems. His open-source contributions to vLLM, Hugging Face Transformers and Accelerate, and DeepSparse show deep expertise in quantization, offloading, and sparsity-aware inference for real-world deployments. His earlier work spans edge and robotics systems, where he led development of a live drone mapping product and of SLAM solutions that cut latency dramatically, reflecting a strong systems and C++ background alongside ML. He frequently shares what he learns through developer blogs and community sessions and is recognized internally for engineering impact, combining research translation with pragmatic shipping. He pairs low-level performance engineering with practical UX aimed at making large-model inference widely accessible.

5 years of coding experience

7 years of employment as a software developer

Bachelor's degree, Mathematics and Computer Science at Tufts University

High School Diploma at The Branson School

Chinese

Stack Overflow

Stats

33 reputation

6k reached

0 answers

1 question

GitHub Skills (34)

transformers 10

pytorch 10

distributed-training 10

python 10

machine-learning 10

inference 10

onnx 10

llm 10

deep-learning 10

model-optimization 10

quantization 10

transformer 10

nlp 10

system-configuration 9

pytest 9

Programming languages (3)

C++, JavaScript, Python

GitHub contributions (5)

neuralmagic/deepsparse

May 2022 - Sep 2022

Sparsity-aware deep learning inference runtime for CPUs

Role in this project:

ML Engineer

Contributions: 73 reviews, 85 commits, 73 PRs in 3 months

Contributions summary: Kyle primarily contributed to the development and maintenance of deep learning inference pipelines. His commits focused on enhancing the existing transformers pipelines, specifically for token classification and question answering. He also worked on information retrieval, implementing features such as model cutting, bi-encoder models, and integration with the Haystack framework, demonstrating expertise in model optimization and deployment.

llm-inference, runtime, tensorflow, sparsification, machinelearning

vllm-project/vllm

Aug 2024 - Apr 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Role in this project:

ML Engineer

Contributions: 40 reviews, 29 PRs, 69 comments in 7 months

Contributions summary: Kyle primarily contributed to the development of quantization methods for large language models within the vLLM framework. His work involved reusing and integrating code for compressed tensors, with modifications across multiple files, indicating a focus on optimizing model performance and memory efficiency. He also addressed activation ordering in GPTQ models and resolved issues related to MLA warnings, demonstrating contributions to model optimization and broader system compatibility.

amd, cuda, deepseek, gpt, hpu
