Dashiell Stander

Machine Learning Engineer at Bild AI

San Francisco, California, United States

Summary

🤩 Rockstar
🎓 Top School
Dashiell Stander is a Machine Learning Engineer in San Francisco with 10 years of experience building and scaling large-language-model training infrastructure while doing mechanistic interpretability research. At EleutherAI he contributed to the high-profile GPT-NeoX codebase and DeeperSpeed: porting FlashAttention for ~20% throughput gains, implementing automatic model re-sharding, and adding SLURM training tooling and model changes used in production GPU training. He led interpretability work that yielded an ICML 2024 paper and, as an ML Alignment & Theory Scholars fellow, developed methods to quantize activations and decompose them in the Walsh–Hadamard basis, exposing how Transformers build complex dependencies with depth. Equally at home in DevOps and research, he bridges infrastructure, model engineering, and theory with a UC Berkeley applied mathematics foundation. A self-described "machine learning alchemist," he pairs open-source impact on projects like EleutherAI/gpt-neox with practical systems intuition for running large models efficiently.
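To illustrate the Walsh–Hadamard decomposition mentioned above: a minimal sketch of expressing an activation vector in the Walsh–Hadamard basis via the fast Walsh–Hadamard transform. This is illustrative only, not Stander's actual method; the `fwht` function and the toy activation vector are assumptions for demonstration.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = x.astype(float).copy()
    n = len(x)
    h = 1
    while h < n:
        # Butterfly pass: combine elements h apart with (+, -) pairs.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling: fwht(fwht(x)) == x

# Toy "activation" vector; its Walsh-Hadamard coefficients measure how much
# of the signal lies along each (±1-valued) Walsh basis function.
acts = np.random.randn(8)
coeffs = fwht(acts)
assert np.allclose(acts, fwht(coeffs))  # the transform is its own inverse
```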
10 years of coding experience
7 years of employment as a software developer
Bachelor's Degree, Applied Mathematics, University of California, Berkeley

Github Skills (12)

transformers: 10
deepspeed: 10
slurm: 10
python: 10
gpu: 9
mlops: 9
language-models: 9
language-model: 9
bash: 8
docker: 6
dockerce: 6
dockers: 6

Programming languages (7)

CSS, Rust, C, Go, MLIR, Jupyter Notebook, Python

Github contributions (5)

EleutherAI/gpt-neox

Sep 2022 - Jan 2023

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Role in this project: DevOps Engineer & ML Engineer
Contributions: 8 reviews, 112 commits, 20 PRs in 4 months
Contributions summary: Dashiell contributed significantly to the repository's infrastructure and training pipeline. He implemented a SLURM-based job submission script (`debug_srun.sh`) for GPU-accelerated training, setting up the environment and necessary dependencies. He also modified the core model code, including changes for text generation, integration of AliBi positional embeddings (sketched after this entry), and evaluation metrics, and updated argument parsing and configuration files, demonstrating expertise in the project's build and training process.
pytorch, language-model, transformers, deepspeed, model-parallel
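As context for the AliBi integration noted in the summary above, here is a minimal sketch of the AliBi attention bias from Press et al.'s "Train Short, Test Long" paper. It is illustrative only, not the gpt-neox implementation; the `alibi_bias` function name and the head count and sequence length in the usage line are assumptions.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Return (num_heads, seq_len, seq_len) additive attention biases."""
    # Head slopes form a geometric sequence, as in the AliBi paper
    # (exact when num_heads is a power of two).
    start = 2.0 ** (-8.0 / num_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # rel[i, j] = j - i, clamped to zero on and above the diagonal, so each
    # query pays a linear penalty proportional to how far back the key sits.
    rel = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    return slopes[:, None, None] * rel  # broadcast to (heads, query, key)

# Usage: add to raw attention scores before the causal mask and softmax.
bias = alibi_bias(num_heads=8, seq_len=1024)
```

Because the bias grows linearly with query-key distance and replaces learned positional embeddings, it lets a model trained on short sequences extrapolate to longer ones at inference time.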
dashstander/sn-grok

Feb 2023 - Sep 2024

Contributions: 16 PRs, 254 pushes, 11 branches in 1 year 7 months