Dashiell S is a Machine Learning Engineer in San Francisco with 10 years of experience building and scaling large‑language‑model training infrastructure while doing mechanistic interpretability research. At EleutherAI he contributed to the high‑profile GPT‑NeoX codebase and DeeperSpeed—porting FlashAttention for ~20% throughput gains, implementing automatic model re‑sharding, and adding SLURM training tooling and model changes used in production GPU training. He led interpretability work that yielded an ICML 2024 paper and, as an ML Alignment & Theory Scholars fellow, developed methods to quantize activations and decompose them in the Walsh–Hadamard basis to expose how Transformers build complex dependencies with depth. Equally at home in DevOps and research, he bridges infrastructure, model engineering, and theory with a UC Berkeley applied mathematics foundation. A self‑described “machine learning alchemist,” he pairs open‑source impact on projects like EleutherAI/gpt‑neox with practical systems intuition for running large models efficiently.
10 years of coding experience
7 years of employment as a software developer
Bachelor's Degree, Applied Mathematics at University of California, Berkeley
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Role in this project:
DevOps Engineer & ML Engineer
Contributions: 8 reviews, 112 commits, 20 PRs in 4 months
Contributions summary: Dashiell contributed significantly to the repository's infrastructure and training pipeline. He implemented a SLURM-based job submission script (`debug_srun.sh`) for GPU-accelerated training, setting up the environment and necessary dependencies. He also modified the core model code, including changes to text generation, integration of ALiBi positional embeddings, and evaluation metrics. In addition, he updated argument parsing and configuration files, demonstrating expertise in the project's build and training process.
Contributions: 16 PRs, 254 pushes, 11 branches in 1 year 7 months
Dashiell Stander - Machine Learning Engineer at Bild AI