Conference: IEEE International Conference on Acoustics, Speech and Signal Processing
Training Keyword Spotters with Limited and Synthesized Speech Data

Abstract

With the rise of low-power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts of the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small spoken term detection models of around 400k parameters. Instead of training such models directly on the audio or on low-level features such as MFCCs, we use a pre-trained speech embedding model trained to extract useful features for keyword spotting models. Using this speech embedding, we show that a model which detects 10 keywords, when trained on only synthetic speech, is equivalent to a model trained on over 500 real examples. We also show that a model without our speech embeddings would need to be trained on over 4000 real examples to reach the same accuracy.
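
The abstract gives no architecture details, so the following is only a minimal sketch of the general idea it describes: keep a pre-trained embedding fixed and train a small head on top of it. Everything here is an assumption for illustration. The embedding is a toy stand-in (a fixed random projection), the data are noisy Gaussian clusters rather than synthesized speech, and the names and sizes (`embed`, `W_EMBED`, `FEATURE_DIM`, `EMBED_DIM`, `train_head`) are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_KEYWORDS = 10   # matches the 10-keyword setting in the abstract
FEATURE_DIM = 40    # hypothetical input feature size
EMBED_DIM = 96      # hypothetical embedding size

# Hypothetical stand-in for a pre-trained speech embedding model:
# a fixed random projection followed by tanh. It is never updated.
W_EMBED = rng.normal(size=(FEATURE_DIM, EMBED_DIM)) / np.sqrt(FEATURE_DIM)


def embed(features):
    """Map raw features to embeddings using the frozen projection."""
    return np.tanh(features @ W_EMBED)


def make_toy_data(n_per_class, centers, noise=0.5):
    """Toy stand-in for keyword examples: noisy copies of class centers."""
    labels = np.repeat(np.arange(len(centers)), n_per_class)
    feats = centers[labels] + noise * rng.normal(
        size=(labels.size, centers.shape[1])
    )
    return feats, labels


def train_head(emb, labels, num_classes, lr=0.1, steps=500):
    """Train a softmax-regression head on fixed embeddings."""
    w = np.zeros((emb.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = emb @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(labels)  # cross-entropy gradient
        w -= lr * (emb.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(emb, w, b):
    """Return the most likely keyword index for each embedding."""
    return np.argmax(emb @ w + b, axis=1)


centers = rng.normal(size=(NUM_KEYWORDS, FEATURE_DIM))
train_x, train_y = make_toy_data(50, centers)
test_x, test_y = make_toy_data(20, centers)

# Only the head's weights (w, b) are trained; W_EMBED stays frozen.
w, b = train_head(embed(train_x), train_y, NUM_KEYWORDS)
accuracy = float(np.mean(predict(embed(test_x), w, b) == test_y))
print(f"toy test accuracy: {accuracy:.2f}")
```

In this sketch only `w` and `b` are learned, so the number of trainable parameters depends on the head alone. The accuracy it prints describes the toy data and says nothing about the results reported in the paper.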
