Bo Zhang - Senior Software Engineer at Databricks

Beijing, China

Summary

🤩

Rockstar

🎓

Top School

Bo Zhang is a Senior Software Engineer with six years of industry experience building scalable, distributed data systems at Databricks, Uber, LinkedIn, and Amazon. Based in Beijing, he brings deep back-end expertise in large-scale data processing and shuffle/executor management. His open-source contributions to Apache Spark include fixes to Avro/Catalyst integration and Hadoop build issues, and improvements to shuffle dependency and decommission handling. His background combines electrical engineering training at Tsinghua and USC with hands-on production work, which lets him bridge low-level system concerns and high-level analytics requirements. Colleagues rely on him for pragmatic solutions to brittle distributed workflows and for improving robustness in data pipelines.

6 years of coding experience

7 years of employment as a software developer

Master of Science (M.S.) in Electrical and Electronics Engineering at University of Southern California

Bachelor of Science (B.S.) in Electrical and Electronics Engineering at Tsinghua University

English, Chinese

GitHub Skills (9)

avro: 10

big-data: 10

spark: 10

scala: 10

java: 9

error-handling: 9

javas: 9

sql: 8

hadoop: 8

Programming languages (4)

Java, C, Scala, HTML

GitHub contributions (5)

apache/spark

Feb 2021 - Feb 2022

Apache Spark - A unified analytics engine for large-scale data processing

Role in this project:

Back-end Developer

Contributions: 78 reviews, 6 commits, 40 PRs in 1 year

Contributions summary: Bo primarily contributed to the Apache Spark project by addressing specific issues in Avro integration, data processing, and shuffle operations. He implemented support for nullable Avro schemas with non-nullable Catalyst schemas, fixed build issues related to Hadoop versions, and improved the handling of SNAPSHOT dependencies. He also worked on improving error handling and adding functionality for managing shuffle dependencies and executor decommission events within the core Spark framework.

analytics, python, data-processing, sql, apache

bozhang2820/spark

Oct 2020 - Jan 2025

Apache Spark - A unified analytics engine for large-scale data processing

Contributions: 141 pushes, 89 branches in 4 years 3 months

analytics, data-processing, apache, big-data, spark
