Ata Fatahi is a Member of Technical Staff and PhD candidate in computer science who specializes in distributed systems, cloud efficiency, and serverless architectures, with 11 years of coding experience spanning research and production. He has driven ML infrastructure and inference optimization work at LinkedIn and now works on LLM/VLM inference at Microsoft AI, combining research rigor with production-grade engineering. His open-source contributions include back-end and DevOps work on SGLang, a framework for serving large language and vision language models, and enhancements to Horovod's distributed training APIs, reflecting a focus on scalable ML systems. Comfortable with GPU memory management, scheduler design, and deployment automation, he pairs hands-on systems programming in Python and Rust with experimental research, a combination that helps translate academic insights into practical, high-throughput AI services. Based in San Francisco, he brings both academic depth and operational experience to cloud-native ML infrastructure challenges.
11 years of coding experience
6 years of employment as a software developer
Doctor of Philosophy (PhD), Computer Science and Engineering, Penn State University
Bachelor of Science (BSc), Computer Engineering, Sharif University of Technology
Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Role in this project:
ML Engineer
Contributions: 8 reviews, 6 commits, 12 PRs in 2 months
Contributions summary: Ata primarily contributed to the TensorFlow integration of the Horovod distributed training framework. His work added new APIs and functionality for distributed gradient aggregation, notably `PartialDistributedGradientTape`, along with enhancements to `DistributedOptimizer`. Other commits added support for local variables in these optimizers and in callbacks, and several added unit tests that verify the new functionality.
SGLang is a fast serving framework for large language models and vision language models.
Role in this project:
Back-end & DevOps Engineer
Contributions: 7 reviews, 10 PRs, 12 comments in 3 months
Contributions summary: Ata primarily contributed to the back-end infrastructure and server-side logic of SGLang, working on features such as configurable request payload size, GPU memory management, and versioning of the router package. He implemented changes to server arguments and launch configurations in Python and Rust, along with corresponding test suites. He also improved deployment and operations by adding scripts for GPU memory cleanup and enabling NCCL NVLS configurations.