Estimation of Speech Intelligibility Using Speech Recognition Systems

Yusuke TAKANO; Kazuhiro KONDO

Abstract

We attempted to estimate subjective scores of the Japanese Diagnostic Rhyme Test (DRT), a two-to-one forced-selection speech intelligibility test. We used automatic speech recognizers with language models that force the output to be one of the two words in the word pair, mimicking the human recognition process in the DRT. Initial tests with speaker-independent models yielded scores significantly lower than the subjective scores. The acoustic models were then adapted to each speaker in the corpus, and then to noise at a specified SNR. Three types of noise were tested: white noise, multi-talker (babble) noise, and pseudo-speech noise. The match between subjective and estimated scores improved significantly with noise-adapted models compared to both the speaker-independent and the speaker-adapted models, provided that the adapted noise level and the tested noise level matched. When the SNR conditions did not match, however, the recognition scores degraded, especially when the tested SNR was higher than the adapted noise level. Accordingly, we adapted the models to mixed levels of noise, i.e., multi-condition training. The resulting models showed relatively high intelligibility scores that matched subjective intelligibility over all noise levels. The correlation between subjective and estimated intelligibility scores increased to 0.94 with multi-talker noise, 0.93 with white noise, and 0.89 with pseudo-speech noise, while the root mean square error (RMSE) decreased from more than 40 to 13.10, 13.05, and 16.06, respectively.
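The abstract reports agreement between subjective and estimated scores as a correlation and an RMSE. The sketch below shows one common way to compute the two figures, assuming the correlation is a Pearson coefficient taken over paired per-condition scores; the abstract does not state this, and the function names and sample values are hypothetical illustrations, not data from the paper.

```python
import math


def pearson_r(subjective, estimated):
    """Pearson correlation between two equal-length score lists."""
    n = len(subjective)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_s = sum(subjective) / n
    mean_e = sum(estimated) / n
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(subjective, estimated))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_s == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_s * var_e)


def rmse(subjective, estimated):
    """Root mean square error between two equal-length score lists."""
    if len(subjective) != len(estimated) or not subjective:
        raise ValueError("need two non-empty, equal-length lists")
    squared = [(s - e) ** 2 for s, e in zip(subjective, estimated)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical intelligibility scores (percent), one pair per test condition.
    subjective_scores = [95.0, 88.0, 70.0, 52.0, 30.0]
    estimated_scores = [90.0, 80.0, 75.0, 40.0, 35.0]
    print("r    =", round(pearson_r(subjective_scores, estimated_scores), 3))
    print("RMSE =", round(rmse(subjective_scores, estimated_scores), 2))
```

A high correlation alone does not imply a close match, since a constant offset between the two score sets leaves the correlation unchanged but raises the RMSE; this is presumably why both figures are reported.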

Bibliographic details

Source
IEICE Transactions on Information and Systems | 2010, No. 12 | pp. 3368-3376 | 9 pages
Authors
Yusuke TAKANO; Kazuhiro KONDO
Affiliation

Graduate School of Science and Engineering, Yamagata University, Yonezawa-shi, 992-8510 Japan (both authors)

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
objective estimation; speech intelligibility; speech recognition; Japanese diagnostic rhyme test; noise adaptation
Added to database: 2022-08-18 00:27:01

Similar literature

1. Estimation of Speech Intelligibility Using Speech Recognition Systems [J]. Yusuke TAKANO, Kazuhiro KONDO. IEICE Transactions on Information and Systems. 2010, No. 12
2. Predicting Speech Recognition Using the Speech Intelligibility Index and Other Variables for Cochlear Implant Users [J]. Lee Sungmin, Mendel Lisa Lucks, Bidelman Gavin M. Journal of Speech, Language, and Hearing Research: JSLHR. 2019, No. 5
3. Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners With Simulated Age-Related Hearing Loss [J]. Fontan Lionel, Ferrane Isabelle, Farinas Jerome. Journal of Speech, Language, and Hearing Research: JSLHR. 2017, No. 9
4. Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems [C]. Tadej Justin, France Mihelic, Simon Dobrisek. International Conference on Text, Speech and Dialogue. 2014
5. Objective speech intelligibility assessment using speech recognition and bigram statistics with application to low bit-rate codec evaluation [D]. Teng, Yan. 2006
6. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility [O]. Thomas Bentsen, Tobias May, Abigail A. Kressner. 2012
7. Interactions among talker sex, masker number, and masker intelligibility in speech-on-speech recognition [O]. Mathew Thomas, John J. Galvin, Qian-Jie Fu. 2021
8. SPEECH-INTELLIGIBILITY AND TALKER-RECOGNITION TESTS OF AIR FORCE VOICE COMMUNICATION SYSTEMS [R]. Stephen E. Stunts. 1963
