Bo Zhang

Senior Software Engineer at Databricks

Beijing, China

Summary

🤩 Rockstar
🎓 Top School
Bo Zhang is a Senior Software Engineer with six years of industry experience building scalable, distributed data systems at Databricks, Uber, LinkedIn, and Amazon. Based in Beijing, he specializes in back-end work on large-scale data processing and shuffle/executor management. His open-source contributions to Apache Spark include fixing Avro/Catalyst integration and Hadoop build issues, and improving shuffle dependency and decommission handling. He trained in electrical engineering at Tsinghua University and the University of Southern California, and pairs that background with hands-on production work spanning low-level system concerns and high-level analytics requirements. Colleagues rely on him for pragmatic fixes to brittle distributed workflows and for making data pipelines more robust.
6 years of coding experience
7 years of employment as a software developer
Master of Science (M.S.), Electrical and Electronics Engineering, University of Southern California
Bachelor of Science (B.S.), Electrical and Electronics Engineering, Tsinghua University
Languages: English, Chinese

Github Skills (9)

avro: 10
big-data: 10
spark: 10
scala: 10
java: 9
error-handling: 9
javas: 9
sql: 8
hadoop: 8

Programming languages (4)

Java, C, Scala, HTML

Github contributions (5)

apache/spark

Feb 2021 - Feb 2022

Apache Spark - A unified analytics engine for large-scale data processing
Role in this project: Back-end Developer
Contributions: 78 reviews, 6 commits, 40 PRs in 1 year
Contributions summary: Bo contributed to Apache Spark mainly in the areas of Avro integration, data processing, and shuffle operations. He added support for nullable Avro schemas with non-nullable Catalyst schemas, fixed build issues related to Hadoop versions, and improved the handling of SNAPSHOT dependencies. He also improved error handling and added functionality for managing shuffle dependencies and executor decommission events in the core Spark framework.
analytics, python, data-processing, sql, apache
bozhang2820/spark

Oct 2020 - Jan 2025

Apache Spark - A unified analytics engine for large-scale data processing
Contributions: 141 pushes, 89 branches in 4 years 3 months
analytics, data-processing, apache, big-data, spark