Houjun Liu is an NLP-focused research engineer and founder in the San Francisco Bay Area with eight years of experience building production research software, open-source tools, and interactive products. As an undergraduate research associate in Stanford’s NLP group (working with Prof. Chris Manning) and an active contributor to the widely used stanza library, he implemented a tokenization postprocessor and refactored coreference code to support data augmentations. At CMU’s TalkBank he developed Batchalign, a corpus-alignment tool adopted by institutions worldwide for aphasia and Alzheimer’s detection research, and he leads product and engineering at Shabang and Condution, an open-source task manager that reached 7,000 users in five months. His portfolio spans embedded ML for autonomous drones and fire response, guidance systems for student space projects, and end-to-end MLOps to bring models into production. Interested in mechanistic interpretability, AGI, and Emacs, he blends academic rigor with startup execution to teach robots “how to robot” while optimizing human-facing experiences.
9 years of coding experience
5 years of employment as a software developer
The Nueva School
Participant at LaunchX Entrepreneurship Program
University of California Santa Cruz
Bachelor of Science (BS) in Computer Science, AI, at Stanford University
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Role in this project:
Back-end Developer
Contributions: 90 reviews, 25 PRs, 241 pushes in 1 year 6 months
Contributions summary: Houjun implemented a tokenization postprocessor that extends the existing tokenization process in the Stanford NLP Python library. The work modified `stanza/models/tokenization/utils.py` to include the postprocessor, allowing manual token cleanup and customization, and added tests to verify the postprocessor and reassembly functions. Further contributions include refactoring the existing coreference codebase to handle data augmentations.
Contributions: 101 commits, 132 pushes, 1 branch in 1 year 4 months