DB Tsai - Senior Engineering Manager at The Apache Software Foundation

Cupertino, California, United States

Summary

🤩 Rockstar

🎓 Top School

DB Tsai is a senior engineering manager and open-source leader with more than 14 years of experience building high-performance data platforms and ML infrastructure. He currently leads cloud data efforts from Cupertino and has recently joined Databricks. At Apple, he scaled and led the Spark, Flink, and Data Security teams from startup-sized groups into award-winning organizations; their work earned back-to-back ACM SIGMOD Systems awards and produced industry-standard open-source components, such as a Spark-native accelerator and enterprise-grade encryption for columnar formats.

A long-time Apache PMC member and committer on Spark and YuniKorn, he combines deep algorithmic contributions (e.g., Spark ML optimizers and online summarizers) with practical performance engineering: his commits touch core ML, SQL, and performance testing in one of the most widely used big-data engines. At Netflix and Alpine, he built production ML pipelines and scalable algorithms that shortened experiment-to-deployment cycles and inspired patent-pending systems. He is equally comfortable with low-level algorithm design and large-scale team building, is known for taking ambiguous, research-grade ideas into production at scale, and has repeatedly turned internal engineering projects into widely adopted open-source tools.

14 years of coding experience

11 years of employment as a software developer

Bachelor's degree in Physics, National Cheng Kung University

Master's degree in Physics, National Taiwan University

Doctor of Philosophy (Ph.D.) program in Applied Physics, Stanford University

Languages: English, Chinese

Stack Overflow

Stats: 1,388 reputation, 70k reached, 4 answers, 10 questions

GitHub skills (28)

apache-spark: 10
spark: 10
data-science: 10
big-data: 10
machine-learning: 10
plot: 10
java: 10
ml: 10
scala: 10
javas: 10
html: 10
documentation: 10
javascript: 9
unit-testing: 9
linear-regression: 9

Programming languages (11)

Java, C++, CSS, Shell, Rust, C, Scala, HTML

GitHub contributions (5)

vegas-viz/Vegas

May 2016 - Oct 2016

The missing MatPlotLib for Scala + Spark

Role in this project: Back-end Developer

Contributions: 9 commits, 29 PRs, and 30 pushes in 5 months

Contributions summary: DB primarily contributed to the development of Vegas, a visualization library for Scala and Spark. His work included extending the DSL for axis customization with parameters for labels, formats, and other axis properties. He also added Spark support to the library, enabling integration with Spark DataFrames, and wrote unit tests. Further contributions included minor documentation updates and bug fixes.

spark-scala, spark, missing, scala, datascience

apache/spark

Apr 2017 - Mar 2019

Apache Spark - A unified analytics engine for large-scale data processing

Role in this project: Back-end Developer

Contributions: 58 reviews, 5 commits, and 154 PRs in 1 year 11 months

Contributions summary: DB primarily contributed to Apache Spark by fixing bugs, improving code quality, and adding new features. His work included correcting typos, filtering empty strings out of classloader-related configurations, and implementing online summarizer APIs for mean, variance, min, and max in MLlib. He also addressed issues with error messages and the DataFrame API. These contributions span several modules, including MLlib, SQL, and core.

analytics, python, data-processing, sql, apache
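The online summarizer APIs mentioned above compute statistics in a single pass over the data. As a rough illustration of the general technique only, here is a minimal Python sketch using Welford's update for single values and the pairwise merge formula of Chan et al. for combining partial results. It is not Spark MLlib's implementation or API; the class `OnlineSummarizer` and its methods are hypothetical, and it assumes sample (n - 1) variance.

```python
import math


class OnlineSummarizer:
    """Hypothetical single-pass summarizer for mean, variance, min, and max.

    Not Spark's API: a generic sketch using Welford's update for one value
    and Chan et al.'s pairwise formula for merging two partial summaries.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        """Fold one observation into the summary in O(1) time and memory."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        return self

    def merge(self, other):
        """Combine another partial summary (e.g. from another partition) into this one."""
        if other.count == 0:
            return self
        if self.count == 0:
            self.count, self.mean, self.m2 = other.count, other.mean, other.m2
            self.min, self.max = other.min, other.max
            return self
        total = self.count + other.count
        delta = other.mean - self.mean
        self.mean += delta * other.count / total
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.count = total
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        return self

    def variance(self):
        """Unbiased sample variance (divides by n - 1); 0.0 for fewer than 2 values."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Usage: summarize two "partitions" separately, then merge them.
left = OnlineSummarizer()
for v in [1.0, 2.0, 3.0]:
    left.add(v)
right = OnlineSummarizer()
for v in [4.0, 5.0]:
    right.add(v)
total = left.merge(right)
print(total.count, total.mean, total.variance(), total.min, total.max)  # 5 3.0 2.5 1.0 5.0
```

The merge step is what suits this kind of summarizer to a distributed engine: each partition can summarize its own rows independently, and the partial results are combined without revisiting the data.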
