Ke Wen

Principal Software Architect at NVIDIA

San Francisco Bay Area, United States

Summary

🤩 Rockstar
🎓 Top School
Ke Wen is a Principal Software Architect based in the San Francisco Bay Area with seven years of focused experience in deep learning and high-performance computing communications. He has driven core improvements to NVIDIA's NCCL and PyTorch distributed backends, optimizing multi-GPU collective communication, error handling, and support for new numeric types like FP8. Previously, at Meta, he worked on advanced distributed ML features (symmetric memory, irregular collectives, compute-comm fusion, and sync-free MoE), bringing research ideas into production-grade systems. Ke combines a Ph.D. in electrical engineering with hands-on performance engineering, often implementing low-level C++ and transport-layer optimizations that are invisible to users but critical at scale. His contributions to open-source projects such as NVIDIA/nccl and pytorch/pytorch reflect both deep systems expertise and a commitment to improving distributed ML reliability and throughput.
7 years of coding experience
9 years of employment as a software developer
Master's degree, Electrical and Computer Engineering, University of California, Davis
Doctor of Philosophy (Ph.D.), Electrical Engineering, Columbia University
Languages: English, Chinese

Github Skills (14)

ccl: 10
cuda: 10
pytorch: 10
socket: 10
distributed-systems: 10
c-language: 10
hpc: 10
cprogramming-language: 10
parallel-computing: 10
performance-optimization: 10
networking: 10
multithreading: 10
python: 9
machine-learning: 9

Programming languages (6)

TypeScript, C++, C, Jupyter Notebook, Python, Cuda

Github contributions (5)

NVIDIA/nccl

Nov 2018 - Mar 2022

Optimized primitives for collective multi-GPU communication
Role in this project:
Back-end & Performance Engineer
Contributions: 1 review, 15 commits, 7 PRs in 3 years 4 months
Contributions summary: Ke contributed significantly to optimizing the NCCL library, focusing on the performance of the socket and IB transport layers. He improved the socket transport by splitting transfers and optimizing thread usage. He also fixed several bugs related to IB devices and collective communication, ensuring correct behavior and preventing errors. In addition, he improved the tree algorithm within NCCL, enhancing its overall efficiency.
cuda, mpi, infiniband, gpu, cluster-computing
pytorch/pytorch

Feb 2022 - Oct 2022

Tensors and Dynamic neural networks in Python with strong GPU acceleration
Role in this project:
Back-end Developer & ML Engineer
Contributions: 882 reviews, 61 commits, 238 PRs in 7 months
Contributions summary: Ke's contributions primarily involve modifications to the PyTorch distributed library, specifically the NCCL backend. His work centers on improving the error handling, performance, and stability of distributed operations, including improvements to collective communication primitives and new features such as an `ErrorHandlingMode` for communication error handling. The commits also show a focus on optimizing existing functions and adding features such as support for FP8 types. This work involved modifying the core C++ code that interacts with the NCCL library.
python, gpu-acceleration, deep-learning, gpu, numpy