Balaji Subramaniam is an Engineering Manager at Google based in the Greater Seattle Area with nine years of industry experience building cloud-native and ML infrastructure. He blends hands-on DevOps and backend engineering with leadership, focusing on Kubernetes orchestration, node-level resource discovery, and production-ready ML workflows. His open-source work includes enhancing kubernetes-sigs/node-feature-discovery to detect and label RDT (Resource Director Technology) capabilities, and contributing NFS and S3 data store backends and Kubernetes orchestration to Intel Labs' Coach for distributed reinforcement learning. With a PhD in computer science and a track record of turning research-grade systems into scalable production services, he bridges academic depth and pragmatic engineering execution.
9 years of coding experience
PhD, Computer Science at Virginia Tech
Contributions: 22 commits, 49 PRs, 19 pushes in 1 year 9 months
Contributions summary: Balaji primarily focused on enhancing node feature discovery in Kubernetes environments. His contributions enabled RDT (Resource Director Technology) discovery and integrated it into the node labeling system. This included modifying the main application and creating scripts to detect and label nodes based on CPU features and RDT capabilities. He also added demo scripts and templates to streamline testing and demonstration of these features.
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state-of-the-art reinforcement learning algorithms.
Role in this project:
Back-end & DevOps Engineer
Contributions: 11 commits, 4 PRs, 4 pushes in 1 month
Contributions summary: Balaji contributed significantly to the development of data store backends, adding NFS and S3 implementations for data storage within the reinforcement learning environment, and integrated Kubernetes orchestration for these data stores. His commits also cover the setup and integration of distributed Coach functionality, including configuring and deploying trainers and rollout workers with Kubernetes. In addition, he improved the handling of the save checkpoint secs argument in distributed Coach and updated how the subscriber side handles both environment steps and episodes.