Venue: Spoken Language Technology Workshop

End-to-End Whispered Speech Recognition with Frequency-Weighted Approaches and Pseudo Whisper Pre-training

Abstract

Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported so far, probably because available whispered speech data are scarce. In this paper, we present several approaches to end-to-end (E2E) recognition of whispered speech that account for both its special characteristics and the scarcity of data. These include a frequency-weighted SpecAugment policy and a frequency-divided CNN feature extractor, both aimed at better capturing the high-frequency structures of whispered speech, and a layer-wise transfer learning approach that pre-trains a model on normal speech or on speech converted from normal to whispered, then fine-tunes it on whispered speech to bridge the gap between the two. We achieve an overall relative reduction of 19.8% in phone error rate (PER) and 44.4% in character error rate (CER) on a relatively small whispered TIMIT corpus. The results indicate that, as long as a good E2E model pre-trained on normal or pseudo-whispered speech is available, a relatively small set of whispered speech may suffice to obtain a reasonably good E2E whispered speech recognizer.
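The abstract does not say how the frequency-weighted SpecAugment policy distributes its masks, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that frequency masks are placed by sampling the start bin from a non-uniform distribution that favors low-frequency bins, so that high-frequency structure is left intact more often. The function name `frequency_weighted_mask`, the exponential weighting, and the `decay`, `max_width`, and `num_masks` parameters are all hypothetical.

```python
import numpy as np


def frequency_weighted_mask(spec, max_width=8, num_masks=2, decay=3.0, rng=None):
    """Zero out frequency bands, choosing band positions non-uniformly.

    spec: array of shape (num_freq_bins, num_frames), bin 0 = lowest frequency.
    decay: hypothetical knob; larger values push masks toward low bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_bins = spec.shape[0]
    out = spec.copy()  # leave the caller's spectrogram untouched

    # Assumed weighting: probability of a mask starting at a bin decays
    # exponentially with the bin index, so high bins are masked less often.
    weights = np.exp(-decay * np.arange(num_bins) / num_bins)
    probs = weights / weights.sum()

    for _ in range(num_masks):
        width = int(rng.integers(0, max_width + 1))  # band width in bins
        start = int(rng.choice(num_bins, p=probs))   # weighted start bin
        out[start:start + width, :] = 0.0            # mask across all frames
    return out


if __name__ == "__main__":
    spectrogram = np.ones((40, 100))
    masked = frequency_weighted_mask(spectrogram, rng=np.random.default_rng(0))
    print("masked bins:", int((masked[:, 0] == 0).sum()))
```

With uniform `probs` this reduces to ordinary SpecAugment-style frequency masking; the only change sketched here is the sampling distribution for the mask position.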