30meng Meng - Software Engineer at 华清远见

Shanghai, Shanghai, China

Summary

🤩

Rockstar

30meng Meng is a software engineer with seven years of experience in high-performance ML infrastructure and backend development, currently based in Shanghai and working at Intel. He contributes to prominent open-source projects such as intel/neural-compressor, ggml, and llama.cpp, focusing on low-bit quantization, sparsity, and SYCL-based GPU acceleration. His work spans TensorFlow, PyTorch, ONNX Runtime, and C/C++ inference stacks, with practical contributions that include constant folding in TensorFlow graphs, fixes to test frameworks, and optimized GEMM and matrix-vector kernels. He ports and tunes compute kernels for heterogeneous devices (notably Intel GPUs) and brings quantized data types into efficient matrix operations. Beyond coding, he also teaches as a lecturer, combining hands-on engineering with knowledge-sharing, which suggests strong communication and mentoring skills. Colleagues can expect a pragmatic engineer who pairs low-level optimization skills with a systems view of ML model deployment.

7 years of coding experience

GitHub Skills (28)

c-language 10

python 10

back-end-development 10

gpu-programming 10

machine-learning 10

llm 10

model-compression 10

sycl 10

gm 10

tensorflow 10

quantization 10

gml 10

backend 10

cprogramming-language 10

intel 9

Programming languages (5)

C++, C, Go, MLIR, Python

GitHub contributions (5)

ggml-org/llama.cpp

May 2023 - Mar 2025

LLM inference in C/C++

Role in this project:

Back-end Developer

Contributions: 130 reviews, 62 PRs, 97 pushes in 1 year 10 months

Contributions summary: 30meng primarily contributed to the SYCL backend of the llama.cpp repository (SYCL is the Khronos single-source C++ programming model for heterogeneous devices), focusing on integrating and optimizing the project for SYCL-enabled devices. Their work involved modifying and extending existing SYCL code, including changes to device memory allocation and optimization of GEMM and other low-level kernels. They also added support for the SYCL backend and tuned the code for Intel hardware.

ggml, llama

intel/neural-compressor

Aug 2020 - May 2022

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Role in this project:

Back-end Developer & ML Engineer

Contributions: 54 commits, 2 comments in 1 year 10 months

Contributions summary: 30meng's contributions focus on optimizing and extending the `intel/neural-compressor` repository, which centers on low-bit quantization and sparsity for large language models. The commits develop and improve the folding of constant sequences within TensorFlow graphs. They also fix bugs and enhance the testing framework.

knowledge-distillation, auto-tuning, compressor, sparsity, intel
