Hua Jiang - Principal Software Engineer at AMD

San Jose, California, United States

Summary

🤩 Rockstar

🎓 Top School

Hua Jiang is a Principal Software Engineer and systems architect based in San Jose with two decades of experience building drivers, runtimes, and compilers for AI accelerators and high-throughput networking. At AMD he leads AI Engine driver and framework efforts, combining TVM, LLVM/MLIR, and bare-metal runtimes to optimize LLM fine-tuning and inference on specialized hardware. His background spans kernel and firmware work across several vendors (AMD, Juniper, Dell, Riverbed) and includes leading SD-WAN data-plane and WLAN driver teams, giving him cross-stack expertise from silicon to cloud. An active contributor to the widely used apache/tvm project, he has improved the VTA accelerator runtime, reduced its memory use, added TFLite operator support, and added multithreading support to the simulator. Known for diagnosing hard-to-reproduce kernel and hardware issues, he pairs pragmatic engineering with performance-first design. He holds a BS in Mechanical Design and Manufacturing, which brings a hardware-aware perspective to his software architecture work.

9 years of coding experience

15 years of employment as a software developer

BS, Mechanical Design and Manufacturing, Nanjing University of Aeronautics and Astronautics

English, Chinese

GitHub Skills (12)

tvm: 10

compiler: 10

compiler-compiler: 10

c-language: 10

deep-learning: 10

cprogramming-language: 10

performance-optimization: 10

gpu: 9

tensor: 9

machine-learning: 8

python: 8

cuda: 7

Programming languages (4)

C++, C, Scala, Python

GitHub contributions (5)

apache/tvm

May 2019 - Jul 2022

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Role in this project:

Back-end Developer & Performance Engineer

Contributions: 408 reviews, 56 commits, 66 PRs in 3 years 2 months

Contributions summary: Hua worked primarily on the VTA (Versatile Tensor Accelerator) component of the TVM project. His contributions focused on fixing critical bugs in the VTA runtime, DRAM memory access, and compilation on the PYNQ board. He also implemented optimizations that reduce memory usage within VTA, added support for new operators to the TFLite frontend, and added multithreading support to the function simulator. In addition, he resolved several performance-related issues, including ones involving DRAM logic and hardware compilation errors.

metal, vulkan, compiler, tensor, opencl

huajsj/tvm

May 2019 - Sep 2020

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Contributions: 2 PRs, 181 pushes, 21 branches in 1 year 4 months

cpu, gpu-programming, amd, gpu-acceleration, tvm
