Samyam Rajbhandari is a Principal Architect in Redmond with 11 years of experience building and scaling AI systems, currently leading inference efforts at Snowflake after a multi-year tenure as a Principal Architect at Microsoft. He pairs deep academic roots—a PhD from The Ohio State University where he developed communication‑optimal tensor contraction algorithms and CUDA implementations—with hands‑on production engineering. An active contributor to DeepSpeed and DeepSpeed‑MII, he added low‑level CUDA kernels (including fused LAMB), ZeRO memory and checkpointing improvements, and features that enable low‑latency, tensor‑parallel inference. His work spans register‑tiling and fusion techniques for tensor kernels through to MLOps integrations like gRPC asynchronous serving and Azure ML model versioning. That blend of research-grade performance optimization and practical deployment know‑how helps him translate cutting‑edge model advances into reliable, scalable inference services.
11 years of coding experience
11 years of employment as a software developer
Doctor of Philosophy (PhD), Computer Science and Engineering, The Ohio State University
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Role in this project:
MLOps Engineer
Contributions: 1 review, 35 commits, 3 PRs in 7 months
Contributions summary: Samyam's contributions centered on the deployment and operationalization of machine learning models within the DeepSpeed-MII framework. His work involved modifying `mii/server_client.py` and associated files to integrate gRPC for model serving, enabling asynchronous requests and tensor parallelism. He also implemented features for registering and managing models within an Azure Machine Learning (AML) environment, including model versioning and support for different deployment configurations such as local and AML-on-AKS. In addition, he introduced the ability to enable or disable DeepSpeed optimizations during deployment and added parallelism configuration options.
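The asynchronous serving pattern described above can be sketched in a few lines. This is an illustrative sketch only, not MII's actual code: the function names (`shard_forward`, `serve_request`) are hypothetical, and plain `asyncio` stands in for the gRPC calls a real deployment would make to its tensor-parallel model shards.

```python
import asyncio

# Hypothetical sketch of async, tensor-parallel serving: a front-end
# coroutine fans one inference request out to several model shards
# concurrently. In DeepSpeed-MII this fan-out happens over gRPC; plain
# asyncio keeps the sketch self-contained and runnable.

async def shard_forward(shard_id: int, prompt: str) -> str:
    # Stand-in for a gRPC call to one tensor-parallel model shard.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"shard{shard_id}:{prompt}"

async def serve_request(prompt: str, tp_degree: int = 4) -> list:
    # Every shard processes the same request; gather awaits them concurrently.
    calls = [shard_forward(i, prompt) for i in range(tp_degree)]
    return await asyncio.gather(*calls)

results = asyncio.run(serve_request("hello"))
print(results)  # one partial result per tensor-parallel shard
```

Because the shard calls are awaited together rather than sequentially, request latency is bounded by the slowest shard instead of the sum of all shards, which is the point of asynchronous fan-out.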
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Role in this project:
ML Engineer
Contributions: 78 reviews, 74 commits, 39 PRs in 2 years 11 months
Contributions summary: Samyam made numerous contributions to optimizing and extending the DeepSpeed library. He added new CUDA kernels for the fused LAMB optimizer, improving training performance, and worked on ZeRO optimization (stages 2 and 3), including memory-management and checkpointing improvements that are crucial for training large models. Further contributions include debugging and performance enhancements for allreduce operations and gradient accumulation.
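Gradient accumulation, mentioned above, is easy to demonstrate in isolation. The sketch below is not DeepSpeed code; it uses plain Python floats and a hypothetical one-parameter loss to show the core invariant: summing gradients over micro-batches and applying a single optimizer step yields the same update as one large-batch step.

```python
# Illustrative sketch (not DeepSpeed's implementation): gradient
# accumulation for the toy loss 0.5 * (w*x - x)^2 per sample x,
# whose gradient w.r.t. w is (w - 1) * x^2.

def grad(w: float, batch: list) -> float:
    # Summed gradient over one (micro-)batch.
    return sum((w - 1.0) * x * x for x in batch)

def step_large_batch(w: float, data: list, lr: float = 0.01) -> float:
    # One SGD step using the full batch at once.
    return w - lr * grad(w, data)

def step_accumulated(w: float, data: list, micro_size: int = 2,
                     lr: float = 0.01) -> float:
    # Accumulate micro-batch gradients, then take a single step.
    acc = 0.0
    for i in range(0, len(data), micro_size):
        acc += grad(w, data[i:i + micro_size])  # no update yet
    return w - lr * acc  # single optimizer step after accumulation

data = [1.0, 2.0, 3.0, 4.0]
w_full = step_large_batch(0.5, data)
w_accum = step_accumulated(0.5, data)
print(w_full, w_accum)  # identical updates
```

In distributed training the accumulated gradient is what gets allreduced across workers before the step, which is why accumulation reduces communication frequency as well as peak activation memory.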