Siddharth Dalmia

Member Of Technical Staff at WaveForms AI

New York, New York, United States

Summary

🤩 Rockstar
🎓 Top School
Siddharth Dalmia is a Member of Technical Staff at WaveForms AI with eight years of experience building audio LLMs and multimodal long-context systems. He was previously a Research Scientist at Google DeepMind, where he worked on multimodal audio and long-context capabilities for Gemini. Siddharth holds a Ph.D. from Carnegie Mellon University’s Language Technologies Institute, where his research made sequence models more practical in resource-constrained settings by applying compositional principles such as task simplification, reusability, transferability, and data pooling. He pairs research depth with production engineering: he has contributed to open-source speech tooling (notably backend and DevOps fixes for ESPnet) and automated tasks such as audio resampling and model parameter logging. Based in New York, he specializes in translating advanced speech and language research into robust, deployable systems.
8 years of coding experience
9 years of employment as a software developer
Bachelor’s Degree, Computer Science at Birla Institute of Technology and Science
Doctor of Philosophy (PhD), Language Technologies, Computer Science at Carnegie Mellon University

Github Skills (17)

pytorch: 10
python: 10
scripting: 9
machine-learning: 9
shell: 9
script: 9
sh: 9
speech-recognition: 8
deep-learning: 8
text-to-speech: 8
speech-synthesis: 8
machine-translation: 8
dockerce: 4
kubernetes: 4
kubernetes-pod: 4

Programming languages (7)

CSS, Shell, C++, SCSS, JavaScript, Python, CUDA

Github contributions (5)

espnet/espnet

Oct 2020 - Mar 2022

End-to-End Speech Processing Toolkit
Role in this project: Backend & DevOps Engineer
Contributions: 26 reviews, 53 commits, 30 PRs in 1 year 4 months
Contributions summary: Siddharth primarily focused on improving the code's compatibility, debugging, and expanding the functionality of the core components. He addressed issues related to PyTorch versions, fixed run script errors, and automated the resampling of audio files. He also improved model parameter logging and made several modifications to the language model's configuration options.
speech-recognition, speech-separation, chainer, spoken-language-understanding, speech-processing
siddalmia/espnet

Feb 2021 - Aug 2022

End-to-End Speech Processing Toolkit
Contributions: 5 PRs, 125 pushes, 24 branches in 1 year 6 months
end-to-end, speech-to-text, speech-recognition, speech-synthesis, speech