Guangtai Huang - SDE at Amazon Web Services (AWS)

San Jose, California, United States

Summary

🤩 Rockstar

🎓 Top School

Guangtai Huang is an SDE based in San Jose with nine years of coding experience, including work on deep learning compilers and backend systems at AWS. He has an open-source track record in Apache MXNet and TVM, where he implemented operators (e.g., `isnan` and NumPy-compatible ops), added BF16 support to CUDA codegen, and improved correctness and memory efficiency at the C++/CUDA level. During an internship at the AWS Shanghai AI Lab he focused on NumPy compatibility in MXNet. He works across low-level systems and ML stack integration, combining bug fixes with new core functionality for multiple hardware backends.

9 years of coding experience

3 years of employment as a software developer

Bachelor of Engineering, Information Engineering at Southern University of Science and Technology (南方科技大学)

Github Skills (15)

cuda: 10

compiler: 10

tvm: 10

mxnet: 10

deeplearning-ai: 10

c-language: 10

compiler-compiler: 10

deep-learning: 10

cprogramming-language: 10

python: 10

numpy: 10

gpu: 9

machine-learning: 9

bfd: 9

tensor: 9

Programming languages (9)

Julia, TypeScript, C++, C, SCSS, JavaScript, Go, Jupyter Notebook

Github contributions (5)

apache/mxnet

Jul 2019 - May 2020

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Role in this project:

Back-end Developer

Contributions: 17 commits, 54 PRs, 100 comments in 9 months

Contributions summary: Guangtai primarily fixed bugs and improved the `numpy` operator implementations in the `mxnet` deep learning framework. The work involved modifying C++ and CUDA code for the `np_unique_op` and `where` operators and optimizing memory usage. He also adjusted and added test cases to verify that the operators behave correctly.

python, scheduler, dataflow, mutation, data-science

apache/tvm

Sep 2019 - Feb 2022

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Role in this project:

ML Engineer

Contributions: 5 reviews, 10 commits, 12 PRs in 2 years 4 months

Contributions summary: Guangtai contributed to TVM, an open-source deep learning compiler stack. The work centered on adding the `isnan` operator: implementing it, covering it in the test suite, and supporting data types including float and bfloat16. He also modified the compile engine, updated Relay passes to improve efficiency, and added BF16 support to CUDA codegen.

metal, vulkan, compiler, tensor, opencl
