Tom Augspurger is a software engineer with 11 years’ experience building scalable data and ML infrastructure, currently at NVIDIA after a four-year stint as a Geospatial Infrastructure Engineer at Microsoft. He’s a prolific open-source contributor in the PyData ecosystem, notably improving Dask, Dask-ML and fastparquet to tighten pandas and scikit-learn compatibility for large-scale, distributed workflows. His work spans backend systems, DevOps (including Kubernetes integration and CI tooling), and data-science-focused examples and docs that make distributed computing more accessible. He also maintains practical guides on effective pandas usage and has contributed to high-profile projects like xarray, seaborn, and fsspec/s3fs. Tom combines an academic grounding in econometrics with hands-on production experience, which shows in careful attention to compatibility, testing, and reproducible examples. A less obvious strength is his habit of improving developer experience—formatting, CI, and docs—so technical improvements stick and scale across communities.
11 years of coding experience
7 years of employment as a software developer
Bachelor of Arts (BA) Econometrics and Quantitative Economics at University of Northern Iowa
Master of Arts (MA) Economics at University of Iowa
Source code for my collection of articles on using pandas.
Role in this project:
Data Scientist
Contributions: 28 commits, 11 PRs, 20 pushes in 3 years
Contributions summary: Tom primarily contributes to a collection of articles on using pandas, a data analysis library for Python. His commits focus on cleaning up and standardizing the code, adding caching for downloads, and updating the introduction with current resources. The changes update the examples and content in line with the project's goal of explaining effective pandas usage.
Contributions: 12 releases, 41 reviews, 51 commits in 5 years 5 months
Contributions summary: Tom's contributions focused on integrating and maintaining scikit-learn compatibility within the Dask-ML library. He fixed several bugs and made enhancements, including improvements to existing metrics and new features for regression. He also updated dependencies, particularly scikit-learn, and prepared new releases of the library.
scalable, python, data-science, dask, machine-learning