Dashiell Stander

Machine Learning Engineer at Bild AI

San Francisco, California, United States

Summary

🤩 Rockstar
🎓 Top School
Dashiell Stander is a Machine Learning Engineer in San Francisco with 10 years of experience building and scaling large-language-model training infrastructure while doing mechanistic interpretability research. At EleutherAI he contributed to the high-profile GPT-NeoX codebase and DeeperSpeed: porting FlashAttention for ~20% throughput gains, implementing automatic model re-sharding, and adding SLURM training tooling and model changes used in production GPU training. He led interpretability work that yielded an ICML 2024 paper and, as an ML Alignment & Theory Scholars fellow, developed methods to quantize activations and decompose them in the Walsh–Hadamard basis, exposing how Transformers build complex dependencies with depth. Equally at home in DevOps and research, he bridges infrastructure, model engineering, and theory with a UC Berkeley applied mathematics foundation. A self-described "machine learning alchemist," he pairs open-source impact on projects like EleutherAI/gpt-neox with practical systems intuition for running large models efficiently.
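To illustrate the Walsh–Hadamard decomposition mentioned above: a minimal sketch of expressing an activation vector in the Walsh–Hadamard basis via the fast Walsh–Hadamard transform. This is illustrative only, not Stander's actual method; the `fwht` function and the toy activation vector are assumptions for demonstration.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = x.astype(float).copy()
    n = len(x)
    h = 1
    while h < n:
        # Butterfly pass: combine elements h apart with (+, -) pairs.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling: fwht(fwht(x)) == x

# Toy "activation" vector; its Walsh-Hadamard coefficients measure how much
# of the signal lies along each (±1-valued) Walsh basis function.
acts = np.random.randn(8)
coeffs = fwht(acts)
assert np.allclose(acts, fwht(coeffs))  # the transform is its own inverse
```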
10 years of coding experience
7 years of employment as a software developer
Bachelor's Degree, Applied Mathematics, University of California, Berkeley

Github Skills (12)

transformers: 10
deepspeed: 10
slurm: 10
python: 10
gpu: 9
mlops: 9
language-models: 9
language-model: 9
bash: 8
docker: 6
dockerce: 6
dockers: 6

Programming languages (7)

CSS, Rust, C, Go, MLIR, Jupyter Notebook, Python

Github contributions (5)

EleutherAI/gpt-neox

Sep 2022 - Jan 2023

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Role in this project: DevOps Engineer & ML Engineer
Contributions: 8 reviews, 112 commits, 20 PRs in 4 months
Contributions summary: Dashiell contributed significantly to the repository's infrastructure and training pipeline. He implemented a SLURM-based job submission script (`debug_srun.sh`) for GPU-accelerated training, setting up the environment and necessary dependencies. He also modified the core model code, including changes for text generation, integration of AliBi positional embeddings (sketched after this entry), and evaluation metrics, and updated argument parsing and configuration files, demonstrating expertise in the project's build and training process.
pytorch, language-model, transformers, deepspeed, model-parallel
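As context for the AliBi integration noted in the summary above, here is a minimal sketch of the AliBi attention bias from Press et al.'s "Train Short, Test Long" paper. It is illustrative only, not the gpt-neox implementation; the `alibi_bias` function name and the head count and sequence length in the usage line are assumptions.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Return (num_heads, seq_len, seq_len) additive attention biases."""
    # Head slopes form a geometric sequence, as in the AliBi paper
    # (exact when num_heads is a power of two).
    start = 2.0 ** (-8.0 / num_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # rel[i, j] = j - i, clamped to zero on and above the diagonal, so each
    # query pays a linear penalty proportional to how far back the key sits.
    rel = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    return slopes[:, None, None] * rel  # broadcast to (heads, query, key)

# Usage: add to raw attention scores before the causal mask and softmax.
bias = alibi_bias(num_heads=8, seq_len=1024)
```

Because the bias grows linearly with query-key distance and replaces learned positional embeddings, it lets a model trained on short sequences extrapolate to longer ones at inference time.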
dashstander/sn-grok

Feb 2023 - Sep 2024

Contributions: 16 PRs, 254 pushes, 11 branches in 1 year 7 months