Syed Ahmed - Lecturer (Part-Time) at NVIDIA

California, United States

Summary

🤩 Rockstar

🎓 Top School

Syed Ahmed is a performance-focused software engineer and part-time lecturer based in California with a decade of experience optimizing deep learning frameworks. At NVIDIA he works on PyTorch performance and numerical accuracy across heterogeneous GPUs, contributing low-level CUDA memory management, NCCL communicator tuning, and builder/release automation for widely used PyTorch binaries. His work on the pytorch/pytorch and NVIDIA/apex repositories shows deep expertise in CUDA kernels, memory pools, and mixed-precision training, skills that helped him become a module-level maintainer of the CUDA backend. He also teaches computer architecture to graduate students, blending research-driven methods from his PhD work in reconfigurable computing with production-grade systems engineering. He pairs rigorous low-level optimization with release engineering so that research advances translate reliably into deployable GPU-accelerated software.

10 years of coding experience

4 years of employment as a software developer

International Baccalaureate at Oaktree International School

Bachelor of Science (BS), Computer Engineering at Rochester Institute of Technology

Master of Science (MS), Electrical Engineering at University of Pennsylvania

GitHub Skills (32)

pytorch: 10

c-language: 10

performance-analytics: 10

performance-monitor: 10

python: 10

scripting: 10

memory-management: 10

machine-learning: 10

cicd: 10

performance-measurement: 10

release-management: 10

script: 10

deep-learning: 10

performance-analysis: 10

gpu: 10

Programming languages (11)

TypeScript, Java, C++, Shell, JavaScript, Lua, HTML, Jupyter Notebook

GitHub contributions (5)

NVIDIA/apex

Jun 2018 - Jul 2019

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

Role in this project:

ML Engineer

Contributions: 21 commits, 2 PRs, 20 pushes in 1 year 1 month

Contributions summary: Syed's commits primarily modify CUDA kernels and related C++ code in a PyTorch extension for mixed-precision training. The changes include reverting and modifying code for layer normalization and weight normalization. He also addressed backward-compatibility issues and refactored deprecated code, demonstrating expertise in optimizing and maintaining PyTorch-related CUDA code. This work aligns with the repository's purpose of providing PyTorch with tools for efficient deep learning training.

pytorch, ray, mixed-precision, deep-learning, temporal-data

pytorch/pytorch

Jul 2018 - Jan 2023

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Role in this project:

Back-end Developer & Performance Engineer

Contributions: 84 reviews, 153 commits, 107 PRs in 4 years 6 months

Contributions summary: Syed primarily contributed to low-level memory management and performance optimization in the PyTorch framework, specifically targeting the CUDA backend. His work involved implementing and refining APIs for memory pool management, including user buffer registration with NCCL, which is crucial for NVLink Switch (NVLS) reductions. He refactored existing memory pool logic, added APIs for snapshotting pool state, and ensured proper memory release and ref-counting. He also improved performance by configuring and optimizing NCCL communicators.

python, gpu-acceleration, deep-learning, gpu, numpy
