IEEE International Conference on Acoustics, Speech and Signal Processing

Training Keyword Spotters with Limited and Synthesized Speech Data



Abstract

With the rise of low power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts in the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term detection models of around 400k parameters. Instead of training such models directly on the audio or low level features such as MFCCs, we use a pre-trained speech embedding model trained to extract useful features for keyword spotting models. Using this speech embedding, we show that a model which detects 10 keywords when trained on only synthetic speech is equivalent to a model trained on over 500 real examples. We also show that a model without our speech embeddings would need to be trained on over 4000 real examples to reach the same accuracy.
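The approach in the abstract pairs a frozen, pre-trained speech embedding with a small trainable classifier head. The sketch below illustrates that head in NumPy; the 96-dimensional per-frame embeddings, the 16-frame decision window, and the 256-unit hidden layer are illustrative assumptions (not taken from the paper), chosen so the head lands near the ~400k-parameter scale the abstract mentions.

```python
import numpy as np

# Assumed dimensions (illustrative, not from the paper):
EMBED_DIM = 96      # size of each embedding frame from the frozen model
NUM_FRAMES = 16     # embedding frames pooled into one detection window
HIDDEN = 256        # hidden units in the trainable head
NUM_CLASSES = 11    # 10 keywords + 1 background/"none" class

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.05, (NUM_FRAMES * EMBED_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.05, (HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)

def head_forward(frames: np.ndarray) -> np.ndarray:
    """Map a (NUM_FRAMES, EMBED_DIM) window of embedding frames
    to a probability distribution over keyword classes."""
    x = frames.reshape(-1)                  # flatten the window
    h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()

def num_params() -> int:
    """Total trainable parameters in the head."""
    return sum(a.size for a in (W1, b1, W2, b2))
```

With these assumed sizes the head has 396,299 parameters (1536×256 + 256 + 256×11 + 11), i.e. roughly the 400k scale quoted; only this head would be trained on the synthetic-speech examples, while the embedding model stays fixed.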
