Naoyuki Kanda - AI Research Scientist at Meta

Seattle, Washington, United States

Summary

🤩 Rockstar

🎓 Top School

Naoyuki Kanda is an AI Research Scientist based in Seattle with over 15 years of R&D experience in spoken language technologies and more than a decade focused on deep-learning speech systems. He joined Meta in 2024 after leading research teams at Microsoft and holds a PhD in Informatics from Kyoto University. Naoyuki's work spans ASR, TTS, speaker diarization, speech separation, dialogue systems and spoken document retrieval, and includes top-ranked papers and editorial/review roles at IEEE Trans., ICASSP and Interspeech. His systems have won major challenges (IWSLT 2014 ASR, CHiME separation/recognition, VoxSRC-20 diarization), reflecting strong empirical results on real-world benchmarks. An active open-source contributor to espnet, he has enhanced ASR pipelines with BPE/character tokenization, LM integration and speaker-aware scoring to help move research into production.

9 years of coding experience

19 years of employment as a software developer

Doctor of Philosophy (Ph.D.), Informatics, Kyoto University

Japanese, English

Github Skills (7)

sh: 10

script: 10

bash: 10

shell: 10

asr: 10

scripting: 10

language-modeling: 8

Programming languages (6)

C++, Shell, C, Cython, Python, CUDA

Github contributions (5)

espnet/espnet

Oct 2018 - Oct 2022

End-to-End Speech Processing Toolkit

Role in this project:

Back-end Developer

Contributions: 230 reviews, 1679 commits, 759 PRs in 4 years

Contributions summary: Naoyuki's commits primarily modify the asr.sh script and related utilities, with a focus on the data preparation and decoding stages. The changes add text cleaning and tokenization, with options for BPE and character-level tokenization, and integrate language model paths. He also changed how datasets are generated and used so that the speaker ID and the clean speech file path are passed to the audio-related scoring processes, and he wrote the corresponding test scripts. Together, these modifications suggest a focus on enhancing the ASR pipeline.

speech-recognition, speech-separation, chainer, spoken-language-understanding, speech-processing

kamo-naoyuki/ctc-segmentation

Aug 2020 - Aug 2022

CTC segmentation python package

Contributions: 4 PRs, 30 pushes, 5 branches in 2 years

ctc, python-package, python, segmentation
