Marc Sun is a Machine Learning Engineer based in Paris with six years of experience, currently helping to democratize ML at Hugging Face. He contributes to flagship open-source projects such as transformers, diffusers, and accelerate, focusing on device mapping, sharded checkpoint loading, and practical quantization improvements (4- and 8-bit), including GPTQ integration. His work balances performance and usability, adding MPS support, model serialization, and PEFT compatibility; earlier in his career he doubled Transformer inference speed on GPU and secured two patents for AI algorithms. Trained in applied mathematics and management science at CentraleSupélec and Columbia, he blends rigorous math with product-minded engineering to move cutting-edge models from research into production.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Role in this project:
ML Engineer
Contributions: 1 release, 478 reviews, 207 PRs in 1 year 10 months
Contributions summary: Marc primarily contributed to the `accelerate` repository by fixing issues related to PyTorch model quantization, specifically bugs in 4-bit and 8-bit model loading and dispatching. His work included adding support for MPS (Metal Performance Shaders) on macOS, which involved modifying big model inference, adding new tests, and improving overall model performance. He also implemented a `save_model` function for model serialization and integrated PEFT (Parameter-Efficient Fine-Tuning) compatibility, showing a focus on practical deployment and usability enhancements.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Role in this project:
ML Engineer
Contributions: 1044 reviews, 291 PRs, 427 pushes in 1 year 10 months
Contributions summary: Marc primarily contributed to the core functionality of the `huggingface/transformers` repository. His work involved modifying the behavior of device mapping when loading and training models. The commits show a focus on quantization: improving support for 4-bit and 8-bit models, adding the GPTQ integration, and updating code related to torchao. He also made corrections and improvements to the underlying code for better system performance.
python, bert, speech-recognition, state-of-the-art, flax