Shaoting Feng is an advisor and engineer focused on drastically improving AI inference efficiency, with five years of experience building production-grade systems used by enterprises like NVIDIA and IBM Cloud. At Tensormesh he shipped high-impact optimizations (20× faster KV cache transmission, a 2.29× improvement in time to first token (TTFT) via dynamic CPU offloading, and 5.49× TTFT gains for multimodal workloads), helping operate more than 300 TB of KV cache and serve over a billion weekly hit tokens. A pre-doc CS student at the University of Chicago and a former exchange student at EPFL, he blends rigorous research, such as practical fairness metrics for network allocation, with hands-on systems engineering. Active on the LMCache team, he is comfortable taking ideas from prototype to large-scale deployment and balancing GPU/CPU trade-offs in latency-sensitive inference.
5 years of coding experience
Semester Exchange, Computer Science at EPFL
Pre-Doc Master, Computer Science at University of Chicago
Bachelor of Engineering (BE), Information Engineering at Shanghai Jiao Tong University
Making Long-Context LLM Inference 10x Faster and 10x Cheaper
Contributions: 69 PRs, 112 pushes, 25 branches in 3 months