End-to-End Whispered Speech Recognition with Frequency-Weighted Approaches and Pseudo Whisper Pre-training

Abstract

Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data. In this paper, we present several approaches for end-to-end (E2E) recognition of whispered speech that take into account its special characteristics and the scarcity of data. These include a frequency-weighted SpecAugment policy and a frequency-divided CNN feature extractor for better capturing the high-frequency structures of whispered speech, and a layer-wise transfer learning approach that pre-trains a model with normal or normal-to-whispered converted speech and then fine-tunes it with whispered speech to bridge the gap between whispered and normal speech. We achieve an overall relative reduction of 19.8% in PER and 44.4% in CER on a relatively small whispered TIMIT corpus. The results indicate that, as long as we have a good E2E model pre-trained on normal or pseudo-whispered speech, a relatively small set of whispered speech may suffice to obtain a reasonably good E2E whispered speech recognizer.
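
The abstract names a frequency-weighted SpecAugment policy but does not spell it out. The following is only a minimal sketch of one way such a policy could look, assuming that frequency masks are sampled more often in low-frequency bins so that the high-frequency structure of whispered speech is masked less often. The function names, the linear weighting, and the parameters `low_freq_bias`, `max_width`, and `num_masks` are all hypothetical and are not taken from the paper.

```python
import random


def sample_mask_start(num_bins, mask_width, rng, low_freq_bias=2.0):
    """Sample the first bin of a frequency mask, favoring low-frequency bins.

    Bin 0 is the lowest frequency. The weight of each start position decays
    linearly from `low_freq_bias` (at bin 0) to 1.0 (at the last valid start).
    This weighting is an illustrative assumption, not the paper's policy.
    """
    last_start = num_bins - mask_width
    if last_start <= 0:
        return 0
    starts = list(range(last_start + 1))
    weights = [
        low_freq_bias - (low_freq_bias - 1.0) * f / last_start for f in starts
    ]
    return rng.choices(starts, weights=weights, k=1)[0]


def frequency_weighted_mask(spec, max_width=8, num_masks=2,
                            low_freq_bias=2.0, seed=None):
    """Return a copy of `spec` with randomly chosen frequency bands zeroed.

    `spec` is a list of frames, each frame a list of frequency-bin values.
    """
    rng = random.Random(seed)
    num_bins = len(spec[0])
    masked = [list(frame) for frame in spec]  # leave the input untouched
    for _ in range(num_masks):
        width = rng.randint(0, min(max_width, num_bins))
        start = sample_mask_start(num_bins, width, rng, low_freq_bias)
        for frame in masked:
            for f in range(start, start + width):
                frame[f] = 0.0
    return masked


# Usage: a dummy 100-frame, 40-bin spectrogram filled with ones.
spec = [[1.0] * 40 for _ in range(100)]
augmented = frequency_weighted_mask(spec, seed=0)
```

With `low_freq_bias=1.0` the sketch reduces to uniformly placed frequency masks, as in ordinary SpecAugment frequency masking; larger values push the masks toward the low end of the spectrum.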

Bibliographic details

Source
Spoken Language Technology Workshop | 2021 | pp. 186-193 | 8 pages
Authors
Heng-Jui Chang; Alexander H. Liu; Hung-yi Lee; Lin-shan Lee
Original format
PDF
Keywords
Conferences; Transfer learning; Speech recognition; Frequency conversion; Feature extraction; Decoding; Character recognition
