Senior Engineering Manager at The Apache Software Foundation
Cupertino, California, United States
Summary
🤩 Rockstar
🎓 Top School
DB Tsai is a senior engineering manager and open-source leader with 14+ years of experience building high-performance data platforms and ML infrastructure. He currently leads cloud data efforts from Cupertino and recently joined Databricks. At Apple he led the Spark, Flink, and Data Security teams, growing them from small startup-style groups into award-winning organizations whose work earned back-to-back ACM SIGMOD Systems awards and produced industry-standard open-source components such as a Spark-native accelerator and enterprise-grade encryption for columnar formats. A long-time Apache PMC member and committer on Spark and YuniKorn, he combines deep algorithmic contributions (e.g., Spark ML optimizers and online summarizers) with practical performance engineering: his commits touch core ML, SQL, and performance testing in one of the most widely used big-data engines. At Netflix and Alpine he built production ML pipelines and scalable algorithms that shortened experiment-to-deployment cycles and inspired patent-pending systems. He is equally comfortable with low-level algorithm design and large-scale team-building. Known for turning ambiguous, research-grade ideas into production systems at scale, he also has a habit of turning internal engineering projects into widely adopted open-source tools.
14 years of coding experience
11 years of employment as a software developer
Bachelor's degree, Physics, National Cheng Kung University
Master's degree, Physics, National Taiwan University
Doctor of Philosophy (Ph.D.), Applied Physics, Stanford University
Contributions: 9 commits, 29 PRs, 30 pushes in 5 months
Contributions summary: DB primarily contributed to the development of the Vegas visualization library for Scala and Spark. Their work included enhancing the DSL for axis customization by adding parameters for labels, formats, and other axis-related properties. They also added Spark support to the library, enabling integration with Spark DataFrames, and wrote unit tests. Further contributions included minor documentation updates and bug fixes.
Apache Spark - A unified analytics engine for large-scale data processing
Role in this project:
Back-end Developer
Contributions: 58 reviews, 5 commits, 154 PRs in 1 year 11 months
Contributions summary: DB primarily contributed to the Apache Spark project by fixing bugs, improving code quality, and adding new features. Their work involved correcting typos, filtering empty strings out of classloader-related configurations, and implementing online summarizer APIs for mean, variance, min, and max in MLlib. They also addressed issues with error messages and the DataFrame API. These contributions span several modules, including MLlib, SQL, and core.
Tags: analytics, python, data-processing, sql, apache