Samyam Rajbhandari is a Principal Architect in Redmond with 11 years of experience building and scaling AI systems, currently leading inference efforts at Snowflake after a multi-year tenure as a Principal Architect at Microsoft. He pairs deep academic roots—a PhD from The Ohio State University where he developed communication‑optimal tensor contraction algorithms and CUDA implementations—with hands‑on production engineering. An active contributor to DeepSpeed and DeepSpeed‑MII, he added low‑level CUDA kernels (including fused LAMB), ZeRO memory and checkpointing improvements, and features that enable low‑latency, tensor‑parallel inference. His work spans register‑tiling and fusion techniques for tensor kernels through to MLOps integrations like gRPC asynchronous serving and Azure ML model versioning. That blend of research-grade performance optimization and practical deployment know‑how helps him translate cutting‑edge model advances into reliable, scalable inference services.
11 years of coding experience
11 years of employment as a software developer
Doctor of Philosophy (PhD), Computer Science and Engineering, The Ohio State University
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Role in this project:
MLOps Engineer
Contributions: 1 review, 35 commits, 3 PRs in 7 months
Contributions summary: Samyam's contributions centered on the deployment and operationalization of machine learning models within the DeepSpeed-MII framework. His work involved modifying `mii/server_client.py` and associated files to integrate gRPC for model serving, enabling asynchronous requests and tensor parallelism. He also implemented features for registering and managing models within an Azure Machine Learning (AML) environment, including model versioning and support for different deployment configurations such as local and AML-on-AKS. In addition, he introduced the ability to enable or disable DeepSpeed optimizations during deployment and added parallelism configuration options.
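The asynchronous serving pattern described above can be sketched in a few lines. This is an illustrative sketch only, not MII's actual code: the function names (`shard_forward`, `serve_request`) are hypothetical, and plain `asyncio` stands in for the gRPC calls a real deployment would make to its tensor-parallel model shards.

```python
import asyncio

# Hypothetical sketch of async, tensor-parallel serving: a front-end
# coroutine fans one inference request out to several model shards
# concurrently. In DeepSpeed-MII this fan-out happens over gRPC; plain
# asyncio keeps the sketch self-contained and runnable.

async def shard_forward(shard_id: int, prompt: str) -> str:
    # Stand-in for a gRPC call to one tensor-parallel model shard.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"shard{shard_id}:{prompt}"

async def serve_request(prompt: str, tp_degree: int = 4) -> list:
    # Every shard processes the same request; gather awaits them concurrently.
    calls = [shard_forward(i, prompt) for i in range(tp_degree)]
    return await asyncio.gather(*calls)

results = asyncio.run(serve_request("hello"))
print(results)  # one partial result per tensor-parallel shard
```

Because the shard calls are awaited together rather than sequentially, request latency is bounded by the slowest shard instead of the sum of all shards, which is the point of asynchronous fan-out.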
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Role in this project:
ML Engineer
Contributions: 78 reviews, 74 commits, 39 PRs in 2 years 11 months
Contributions summary: Samyam made numerous contributions to optimizing and extending the DeepSpeed library. He added new CUDA kernels for the fused LAMB optimizer, improving training performance, and worked on ZeRO optimization (stages 2 and 3), including memory-management and checkpointing improvements that are crucial for training large models. Further contributions include debugging and performance enhancements for allreduce operations and gradient accumulation.
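Gradient accumulation, mentioned above, is easy to demonstrate in isolation. The sketch below is not DeepSpeed code; it uses plain Python floats and a hypothetical one-parameter loss to show the core invariant: summing gradients over micro-batches and applying a single optimizer step yields the same update as one large-batch step.

```python
# Illustrative sketch (not DeepSpeed's implementation): gradient
# accumulation for the toy loss 0.5 * (w*x - x)^2 per sample x,
# whose gradient w.r.t. w is (w - 1) * x^2.

def grad(w: float, batch: list) -> float:
    # Summed gradient over one (micro-)batch.
    return sum((w - 1.0) * x * x for x in batch)

def step_large_batch(w: float, data: list, lr: float = 0.01) -> float:
    # One SGD step using the full batch at once.
    return w - lr * grad(w, data)

def step_accumulated(w: float, data: list, micro_size: int = 2,
                     lr: float = 0.01) -> float:
    # Accumulate micro-batch gradients, then take a single step.
    acc = 0.0
    for i in range(0, len(data), micro_size):
        acc += grad(w, data[i:i + micro_size])  # no update yet
    return w - lr * acc  # single optimizer step after accumulation

data = [1.0, 2.0, 3.0, 4.0]
w_full = step_large_batch(0.5, data)
w_accum = step_accumulated(0.5, data)
print(w_full, w_accum)  # identical updates
```

In distributed training the accumulated gradient is what gets allreduced across workers before the step, which is why accumulation reduces communication frequency as well as peak activation memory.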