30meng Meng

Software Engineer at 华清远见

Shanghai, Shanghai, China

Summary

🤩
Rockstar
30meng Meng is a software engineer with seven years of experience specializing in high-performance ML infrastructure and backend development, currently based in Shanghai and working at Intel. He contributes to prominent open-source projects such as intel/neural-compressor, ggml, and llama.cpp, focusing on low-bit quantization, sparsity, and SYCL-enabled GPU acceleration to get more performance out of modern hardware. His work spans TensorFlow, PyTorch, ONNX Runtime, and C/C++ inference stacks, with practical contributions that include constant folding, test-framework fixes, and optimized GEMM and matrix-vector kernels. He ports and tunes compute kernels for heterogeneous devices (notably Intel GPUs) and brings quantized data types into efficient matrix operations. Beyond coding, he also works as a lecturer (讲师), combining hands-on engineering with knowledge-sharing, which suggests strong communication and mentoring skills. Colleagues can expect a pragmatic engineer who pairs low-level optimization skills with a systems view of ML model deployment.
7 years of coding experience

GitHub Skills (28)

c-language: 10
python: 10
back-end-development: 10
gpu-programming: 10
machine-learning: 10
llm: 10
model-compression: 10
sycl: 10
gm: 10
tensorflow: 10
quantization: 10
gml: 10
backend: 10
cprogramming-language: 10
intel: 9

Programming languages (5)

C++, C, Go, MLIR, Python

GitHub contributions (5)

ggml-org/llama.cpp

May 2023 - Mar 2025

LLM inference in C/C++
Role in this project:
Back-end Developer
Contributions: 130 reviews, 62 PRs, 97 pushes in 1 year 10 months
Contributions summary: 30meng primarily contributed to the SYCL backend of the llama.cpp repository (SYCL is the Khronos single-source programming model for heterogeneous computing in C++), integrating and optimizing the project for SYCL-enabled devices. The work involved modifying and extending existing SYCL code, including changes to device memory allocation and optimization of GEMM and other low-level kernels. 30meng also added support for the SYCL backend and tuned the code for Intel hardware.
Tags: ggml, llama
intel/neural-compressor

Aug 2020 - May 2022

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Role in this project:
Back-end Developer & ML Engineer
Contributions: 54 commits, 2 comments in 1 year 10 months
Contributions summary: 30meng's contributions focus on optimizing and extending the `intel/neural-compressor` repository, which centers on low-bit quantization and sparsity for large language models. The commits develop and improve the folding of constant sequences within TensorFlow graphs. 30meng also fixed bugs and enhanced the testing framework.
Tags: knowledge-distillation, auto-tuning, compressor, sparsity, intel