Summary
Xia Hua is a software engineer with 10 years of experience building AI infrastructure that makes large models faster, more predictable, and cheaper to run at cloud scale. Currently at Google, she focuses on LLM inference systems, TPU-aware performance optimization, Kubernetes-based serving, and cross-team AI benchmarking, with deep expertise in prefill/decode disaggregation, KV cache behavior, cold-start latency, and token-level analysis. Previously at AWS, she worked across SageMaker platform systems, large-scale data pipelines and annotation workflows, and EC2 Nitro provisioning and readiness, giving her rare end-to-end visibility from data and compute to inference. She’s an active open-source contributor whose tooling and benchmarks have been adopted across major organizations and whose Result Store initiative (Prism) was showcased at Google Cloud Next 2026. Based in Seattle, Xia blends systems-level performance engineering with product-minded infrastructure design, and brings a background in game and graphics optimization that informs her pragmatic approach to latency and resource efficiency.
10 years of coding experience
1 year of employment as a software developer
Pre-master at University of Southern California
Bachelor's degree, Network Engineering, GPA: 3.53 at University of Electronic Science and Technology of China
Chinese, English