Kshitij Lakhani

Performance Engineer Deep Learning at NVIDIA

Fremont, California, United States

Summary

🤩 Rockstar
🎓 Top School
Kshitij Lakhani is a performance-focused GPU and deep learning engineer with 7+ years of experience accelerating real-world workloads across genomics, medical imaging, and embedded systems. He combines low-level expertise in C/C++, CUDA, GPUDirect RDMA, and FPGA integration with applied ML experience using TensorFlow/Keras, delivering deterministic, high-throughput data paths for heterogeneous systems. At NVIDIA, and previously at Roche and Exo, he has shipped production-grade GPU kernels and system-level monitoring to reduce latency and memory bandwidth pressure in complex pipelines. His academic work at UC Davis, implementing direct FPGA-to-GPU DMA, highlights a practical bent for squeezing efficiency from hardware stacks while preserving data integrity. Outside of engineering, he is a competitive soccer player and outdoors enthusiast, a detail that mirrors his collaborative, results-driven approach.
7 years of coding experience
5 years of employment as a software developer
Master's degree, Electrical & Computer Engineering, University of California, Davis
Bachelor of Engineering (B.E.), Electronics and Telecommunication, Savitribai Phule Pune University
Languages: English, Hindi, Marathi, Gujarati

Stackoverflow

Stats
51 reputation
8k reached
1 answer
0 questions

Github Skills (34)

pytorch: 10
python: 10
machine-learning: 10
transformer-models: 10
nvidia: 10
deep-learning: 10
gpu: 10
floating: 10
cuda: 10
floating-point: 10
8-bit: 10
hopper: 10
bit: 10
utilization: 10
inference: 9

Programming languages (4)

TypeScript, C++, Python, CUDA

Github contributions (5)

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Contributions: 54 pushes, 5 branches in 3 months
Contributions: 2 commits, 1 push, 1 branch in 2 days