Jongsoo Park

Menlo Park, California, United States

Summary

Jongsoo Park is a Member of Technical Staff at OpenAI and a software/processor-architecture co-design specialist with 10 years of experience optimizing ML, HPC, and graph-analytics workloads. At Meta he led SW/HW/model co-design, serving as co-design lead for the Meta Training and Inference Accelerator and as a core contributor to Llama 3's pretraining scalability and performance decisions. An active open-source performance engineer, his high-impact contributions include SIMD-optimized GEMM work in Facebook's FBGEMM, bfloat16 and flash-attention fixes in PyTorch, and low-level AVX-512 and Winograd tuning in libxsmm. His earlier research produced an LLVM back end for a low-power microprocessor and code that powered a top HPCG benchmark result and earned a Supercomputing best-paper award for low-communication FFTs. Based in Palo Alto with a Stanford PhD in Electrical Engineering, he blends deep systems research with production-focused engineering at scale.
10 years of coding experience

GitHub Skills (29)

pytorch (10)
c-language (10)
caffe (10)
back-end-development (10)
matrix-multiplication (10)
machine-learning (10)
avx512 (10)
deep-learning (10)
gpu (10)
performance-optimization (10)
ai (10)
caffe2 (10)
convolution (10)
simd (10)
jit (10)

Programming languages (5)

Julia, Shell, C++, C, Python

GitHub contributions (5)

pytorch/FBGEMM

Nov 2018 - Jan 2023

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Role in this project:
Back-end Developer & Performance Engineer
Contributions: 8 reviews, 316 commits, 344 PRs in 4 years 2 months
Contributions summary: Jongsoo focused on implementing and optimizing core functionality in the fbgemm library, specifically matrix-matrix multiplication (GEMM) operations. His work added new methods, such as `equals` and `metaEquals`, to the `PackBMatrix` class and significantly refactored the transpose code, optimizing it with SIMD instructions. He also addressed rounding-consistency issues and adapted the code to support group convolutions, reflecting a sustained focus on performance improvements and feature enhancements.
matrix-multiplication, facebook, multiplication, matrix, ml-applications
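FBGEMM's focus is high-performance low-precision GEMM. As an illustration only, the quantize / integer-accumulate / dequantize flow that such a library implements can be sketched in NumPy; the function names and scale factors below are hypothetical, not FBGEMM's actual API:

```python
import numpy as np

def quantize(x, scale):
    """Symmetric int8 quantization with round-to-nearest (illustrative)."""
    return np.clip(np.rint(x / scale), -127, 127).astype(np.int8)

def quantized_gemm(a, b, scale_a, scale_b):
    """Sketch of a low-precision GEMM: int8 inputs, int32 accumulation."""
    qa = quantize(a, scale_a)
    qb = quantize(b, scale_b)
    # Accumulate in int32, as a SIMD int8 kernel would
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    # Dequantize the integer result back to fp32
    return acc.astype(np.float32) * (scale_a * scale_b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 4)).astype(np.float32)
approx = quantized_gemm(a, b, scale_a=0.05, scale_b=0.05)
exact = a @ b  # fp32 reference; approx should be close to this
```

The rounding step is where consistency issues like the ones mentioned above arise: different rounding modes in the quantizer produce slightly different integer operands and hence different results.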
pytorch/pytorch

Apr 2018 - Dec 2022

Tensors and Dynamic neural networks in Python with strong GPU acceleration
Role in this project:
ML Engineer
Contributions: 11 reviews, 217 commits, 299 PRs in 4 years 8 months
Contributions summary: Jongsoo primarily focused on improving and optimizing the performance of machine-learning models and related infrastructure in the PyTorch ecosystem. His contributions included fixing issues in the Inductor compiler, a component used for optimizing model performance, and addressing problems in the transformer benchmark, particularly with scaled dot-product attention. He also added bfloat16 support to erfinv and made changes to the flash-attention implementation.
python, gpu-acceleration, deep-learning, gpu, numpy
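The scaled dot-product attention mentioned above is the operation that flash-attention kernels compute efficiently: softmax(QKᵀ/√d)V. A minimal NumPy sketch of the math (shapes and names are illustrative, not PyTorch's API):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Reference (non-fused) scaled dot-product attention."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq_q, seq_k) logits
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # weighted sum of values

rng = np.random.default_rng(1)
q = rng.standard_normal((5, 16))
k = rng.standard_normal((7, 16))
v = rng.standard_normal((7, 16))
out = scaled_dot_product_attention(q, k, v)
```

Flash attention computes the same result without materializing the full (seq_q, seq_k) score matrix, tiling the computation to stay in fast on-chip memory.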