Fei Kou is a Member of Technical Staff at Anthropic in the San Francisco Bay Area with eight years of engineering experience spanning finance technology and GPU inference optimization. He spent an extended tenure at Facebook driving GPU inference enablement and performance, and now applies that low-level, performance-first mindset to production LLM infrastructure. Early roles in reference-data and banking technology at Nomura and JPMorgan gave him a discipline for data integrity and operational reliability that informs his systems design. Fei holds a BS in Computer Science from SUNY Stony Brook and is adept at turning GPU and systems expertise into scalable, production-ready inference pipelines.
8 years of coding experience
13 years of employment as a software developer
High School
Bachelor of Science (BS), Computer Science at State University of New York at Stony Brook
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Contributions: 6 pushes, 1 branch in 1 day