Forest Gregg is a data-focused engineer and product builder with 14 years of experience applying machine learning and engineering to messy, real-world data problems. As a partner at DataMade and co‑founder of dedupe.io, he has shipped production-grade tools for record linkage, address parsing, and census data access while mentoring teams and shaping product direction. Currently a Data Fellow with the Columbia Labor Lab, he’s building data systems to support the California Fast Food Workers’ Union, blending civic impact with technical rigor. His open-source contributions include practical improvements to widely used projects like csvkit, dedupe, and usaddress, reflecting a knack for improving data tooling and test coverage. Trained in sociology at the University of Chicago, he brings social-science perspective to technical design, prioritizing how information helps communities recognize and address shared challenges. Colleagues describe him as someone who improves both codebases and the teams that maintain them.
14 years of coding experience
6 years of employment as a software developer
Master's degree, Sociology at University of Chicago
Contributions: 1 release, 3 reviews, 98 commits in 5 years 9 months
Contributions summary: Forest primarily focused on enhancing the Python-based US Census API wrapper. His contributions include allowing queries for more than 50 variables, refactoring core methods for efficiency, and preparing the project for new releases. He also addressed code style issues, updated dependencies, and demonstrated an understanding of testing.
Contributions: 165 commits, 13 PRs, 49 pushes in 7 years 1 month
Contributions summary: Forest's primary contribution was developing and refining a machine-learning model that parses unstructured United States address strings into structured components. He implemented test cases to validate the model's performance and worked on feature engineering, adding and refining the features used to identify address components. He also contributed to the model's training data.
python-library, nlp, python, strings, address