Naoyuki Kanda

AI Research Scientist at Meta

Seattle, Washington, United States

Summary

Badges: 🤩 Rockstar, 🎓 Top School
Naoyuki Kanda is an AI Research Scientist based in Seattle with over 15 years of R&D experience in spoken language technologies and more than a decade focused on deep-learning speech systems. He joined Meta in 2024 after leading research teams at Microsoft, and he holds a PhD in Informatics from Kyoto University. His work spans ASR, TTS, speaker diarization, speech separation, dialogue systems and spoken document retrieval, and includes top-ranked papers as well as editorial and review roles for IEEE Trans., ICASSP and Interspeech. His systems have won major challenges (IWSLT 2014 ASR, CHiME separation/recognition, VoxSRC-20 diarization), reflecting strong empirical results on real-world benchmarks. An active open-source contributor to ESPnet, he has enhanced its ASR pipelines with BPE and character tokenization, language model integration and speaker-aware scoring, helping move research into production.
9 years of coding experience
19 years of employment as a software developer
Doctor of Philosophy (Ph.D.), Informatics, Kyoto University
Languages: Japanese, English

GitHub Skills (7)

sh: 10
script: 10
bash: 10
shell: 10
asr: 10
scripting: 10
language-modeling: 8

Programming languages (6)

C++, Shell, C, Cython, Python, CUDA

GitHub contributions (5)

espnet/espnet

Oct 2018 - Oct 2022

End-to-End Speech Processing Toolkit
Role in this project:
Back-end Developer
Contributions: 230 reviews, 1,679 commits, 759 PRs in 4 years
Contributions summary: Naoyuki's commits primarily modified the asr.sh script and related utilities, enhancing the data preparation and decoding stages. The changes add text cleaning and tokenization, including options for BPE and character-level tokenization, and integrate language model paths. He also changed how datasets are generated and used so that speaker IDs and clean-speech file paths are passed to the audio-related scoring processes, and he wrote the corresponding test scripts. Together, these changes point to a focus on improving the ASR pipeline.
Topics: speech-recognition, speech-separation, chainer, spoken-language-understanding, speech-processing
CTC segmentation Python package
Contributions: 4 PRs, 30 pushes, 5 branches in 2 years
Topics: ctc, python-package, python, segmentation