Niklas Muennighoff

Palo Alto, California, United States

Summary

Niklas Muennighoff is an AI researcher in Palo Alto with five years of experience building evaluation infrastructure, data pipelines, and efficient inference tooling for language and embedding models. He has held a research-engineer role at Hugging Face and currently works at AI2 and Contextual AI, while contributing to flagship open-source projects such as EleutherAI's lm-evaluation-harness (where he added ethics-focused evaluation tasks) and the Massive Text Embedding Benchmark (MTEB). His work spans dataset engineering, metric design and bug fixes (e.g., main-score handling and BLEU integration), and runtime optimizations such as vLLM-based inference and test-time scaling that make model evaluation more reliable and multilingual. A Peking University graduate now starting a CS PhD at Stanford, he also brings the uncommon background of years as a Disney voice-over artist, which shapes his focus on human-centered prompts and clear evaluation.
5 years of coding experience

GitHub Skills (30)

sentence-transformers (10)
embed (10)
python (10)
word-embeddings (10)
machine-learning (10)
text-classification (10)
wordembedding (10)
semantic-search (10)
trainings (10)
natural-language-processing (10)
word-embedding (10)
evaluation-framework (10)
nlp (10)
bleu (10)
embedding (10)

Programming languages (10)

TypeScript, Shell, C++, CSS, C, Handlebars, JavaScript, HTML

GitHub contributions (5)

simplescaling/s1

Feb 2025 - Apr 2025

s1: Simple test-time scaling
Role in this project: ML Engineer
Contributions: 14 reviews, 10 PRs, 44 pushes in 1 month
Contributions summary: Niklas's commits primarily involve modifications to data-loading, preprocessing, and model-training scripts, specifically `data/collect_data.py` and `train/sft.py`. These changes include updates to dataset paths and loading mechanisms for open-source math datasets, indicating a focus on preparing data for model training. Further modifications to training configurations suggest involvement in model experimentation and refinement. The integration with vLLM in `eval/generate.py` points to a focus on efficient inference.
embeddings-benchmark/mteb

Jul 2022 - Jan 2023

MTEB: Massive Text Embedding Benchmark
Role in this project: Back-end Developer
Contributions: 7 releases, 255 reviews, 153 commits in 6 months
Contributions summary: Niklas primarily focused on fixing issues related to main-score calculation in a multilingual text embedding benchmark. He addressed warnings and adjusted `AbsTaskClassification.py` to ensure correct handling of main scores, and updated task configurations across multiple files to set main scores correctly and fix task splits for accurate evaluation. He also refactored the summarization evaluator and adjusted the code to skip samples with no variance.
bert, similarity-search, benchmark, retrieval, bug-reporting