Ata Fatahi

San Francisco, California, United States

Summary

🤩 Rockstar

🎓 Top School

Ata Fatahi is a Member of Technical Staff and PhD candidate in computer science who specializes in distributed systems, cloud efficiency, and serverless architectures, with 11 years of coding experience across research and production. He worked on ML infrastructure and inference optimization at LinkedIn and now works on LLM/VLM inference at Microsoft AI. His open-source contributions include backend and DevOps work on SGLang, a serving framework for large language and vision language models, and enhancements to Horovod’s distributed training APIs. He works on GPU memory management, scheduler design, and deployment automation, and pairs hands-on systems coding in Python and Rust with experimental research. He is based in San Francisco.

11 years of coding experience

6 years of employment as a software developer

Doctor of Philosophy - PhD, Computer Science and Engineering at Penn State University

Bachelor of Science (BSc), Computer Engineering at Sharif University of Technology

Persian, English, Korean

Stack Overflow

Stats

304 reputation

20k reached

10 answers

19 questions

GitHub Skills (24)

pytorch 10

distributed-training 10

back-end-development 10

python 10

machine-learning 10

keras 10

tensorflow 10

rust 10

devops 9

mlops 9

mpi 8

inference 7

cuda 7

android-studio 6

android 6

Programming languages (11)

TypeScript, Java, C++, Shell, C, Rust, Scala, Go

GitHub contributions (5)

horovod/horovod

Aug 2022 - Oct 2022

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Role in this project:

ML Engineer

Contributions: 8 reviews, 6 commits, 12 PRs in 2 months

Contributions summary: Ata primarily contributed to the Horovod distributed training framework, focusing on TensorFlow integration. The contributions add new APIs for distributed gradient aggregation, specifically `PartialDistributedGradientTape` and enhancements to `DistributedOptimizer`, along with support for local variables in these optimizers and callbacks. Several commits add unit tests that verify the new functionality.

mpi, keras-tensorflow, training, baidu, tensorflow

sgl-project/sglang

Oct 2024 - Feb 2025

SGLang is a fast serving framework for large language models and vision language models.

Role in this project:

Back-end & DevOps Engineer

Contributions: 7 reviews, 10 PRs, 12 comments in 3 months

Contributions summary: Ata primarily contributed to the back-end infrastructure and server-side logic of the SGLang project, focusing on request payload size configuration, GPU memory management, and versioning of the router package. He changed server arguments and launch configurations in Python and Rust and added corresponding test suites. He also improved deployment and operations by adding scripts for GPU memory cleanup and enabling NCCL NVLS configurations.

cuda, deepseek, deepseek-llm, deepseek-r1, deepseek-r1-zero
