Hua Jiang is a Principal Software Engineer and systems architect based in San Jose with two decades of experience building drivers, runtimes, and compilers for AI accelerators and high-throughput networking. At AMD he leads AI Engine driver and framework efforts, combining TVM, LLVM/MLIR, and bare-metal runtimes to optimize LLM fine-tuning and inference on specialized hardware. His background spans kernel and firmware work across vendors (AMD, Juniper, Dell, Riverbed) and includes leading SD-WAN data-plane and WLAN driver teams, giving him cross-stack expertise from silicon to cloud. An active contributor to the apache/tvm project, he has improved the VTA accelerator runtime, reduced its memory use, added TFLite operator support, and added multithreading to the simulator. Known for diagnosing hard-to-reproduce kernel and hardware issues, he pairs pragmatic engineering with performance-first design. He holds a BS in Mechanical Design and Manufacturing, bringing a hardware-aware perspective to software architecture.
9 years of coding experience
15 years of employment as a software developer
BS, Mechanical Design and Manufacturing, Nanjing University of Aeronautics and Astronautics
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Role in this project:
Back-end Developer & Performance Engineer
Contributions: 408 reviews, 56 commits, 66 PRs in 3 years 2 months
Contributions summary: Hua primarily worked on improving the VTA (Versatile Tensor Accelerator) component of the TVM project. His contributions focused on fixing critical bugs in the VTA runtime, DRAM memory access, and compilation on the PYNQ board. He also implemented optimizations to reduce memory usage within VTA, added support for new operators to the TFLite frontend, and added multithreading support to the functional simulator. In addition, he resolved several performance-related issues, including those involving DRAM logic and hardware compilation errors.
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Contributions: 2 PRs, 181 pushes, 21 branches in 1 year 4 months
cpu, gpu-programming, amd, gpu-acceleration, tvm