IEEE International Conference on Acoustics, Speech and Signal Processing

Training Keyword Spotters with Limited and Synthesized Speech Data



Abstract

With the rise of low power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts in the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term detection models of around 400k parameters. Instead of training such models directly on the audio or low level features such as MFCCs, we use a pre-trained speech embedding model trained to extract useful features for keyword spotting models. Using this speech embedding, we show that a model which detects 10 keywords when trained on only synthetic speech is equivalent to a model trained on over 500 real examples. We also show that a model without our speech embeddings would need to be trained on over 4000 real examples to reach the same accuracy.
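The approach in the abstract pairs a frozen, pre-trained speech embedding with a small trainable classifier head. The sketch below illustrates that head in NumPy; the 96-dimensional per-frame embeddings, the 16-frame decision window, and the 256-unit hidden layer are illustrative assumptions (not taken from the paper), chosen so the head lands near the ~400k-parameter scale the abstract mentions.

```python
import numpy as np

# Assumed dimensions (illustrative, not from the paper):
EMBED_DIM = 96      # size of each embedding frame from the frozen model
NUM_FRAMES = 16     # embedding frames pooled into one detection window
HIDDEN = 256        # hidden units in the trainable head
NUM_CLASSES = 11    # 10 keywords + 1 background/"none" class

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.05, (NUM_FRAMES * EMBED_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.05, (HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)

def head_forward(frames: np.ndarray) -> np.ndarray:
    """Map a (NUM_FRAMES, EMBED_DIM) window of embedding frames
    to a probability distribution over keyword classes."""
    x = frames.reshape(-1)                  # flatten the window
    h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()

def num_params() -> int:
    """Total trainable parameters in the head."""
    return sum(a.size for a in (W1, b1, W2, b2))
```

With these assumed sizes the head has 396,299 parameters (1536×256 + 256 + 256×11 + 11), i.e. roughly the 400k scale quoted; only this head would be trained on the synthetic-speech examples, while the embedding model stays fixed.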
