Li Shuming is a software engineer with 11 years of experience building high-performance query engines and big data systems, currently contributing at CelerData in Hangzhou. He has strong back-end expertise from roles at Alibaba (Hologres), Ant Group (AntSpark), NetEase, and Baidu, focusing on query optimization, storage, and distributed analytics. An active open-source contributor, he improved core database behaviors in projects like StarRocks—fixing empty-hash-table crashes and optimizing hash joins—and added SQL functions to Apache Calcite to better align it with PostgreSQL/MySQL semantics. His work shows a pragmatic combination of low-level performance tuning and rigorous testing, demonstrated by unit-test contributions that prevented regressions. With a Master’s from the University of Chinese Academy of Sciences, he brings both research-informed thinking and production-hardened delivery to large-scale analytics platforms.
11 years of coding experience
7 years of employment as a software developer
Master’s Degree, University of Chinese Academy of Sciences
Bachelor's Degree, Central South University
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
Role in this project:
Back-end Developer
Contributions: 1905 reviews, 42 commits, 1264 PRs in 5 months
Contributions summary: Li focused on optimizing hash join operations within the StarRocks database engine. This work improved the handling of empty hash tables, including a refactoring of the JoinHashMap class, and fixed a core dump that occurred when a hash table was empty. Li also contributed unit tests to validate these optimizations and bug fixes.
Contributions: 5 commits, 8 PRs, 29 comments in 8 months
Contributions summary: Li primarily extended the Apache Calcite project with new SQL functions and improvements to existing code. Li implemented functions such as MD5, SHA1, and REGEXP_REPLACE, matching functionality available in other database systems like PostgreSQL and MySQL. Li also modified test code to improve its efficiency, reflecting attention to overall code quality within the project.
geospatial, apache-calcite, sql, apache, big-data