Repo for external large-scale work
Role in this project: ML Engineer
Contributions: 10 reviews, 1 commit, 18 PRs in 1 day
Contributions summary: Nikolay primarily contributed to testing and integrating model-parallel training configurations within the MetaSeq framework. Their work involved creating and modifying tests for model-parallel training, specifically for the MP1 and MP2 configurations, ensuring correct training step counts and loss values. They also experimented with various configurations and parameters and addressed CUDA memory issues, contributing to the stability and functionality of distributed training.
transformers, docker, scale, large-scale, huggingface
A codebase implementing a simple GPT-like model from scratch, based on the "Attention Is All You Need" paper.
Contributions: 1 review, 19 commits, 1 PR in 1 month