Trends in Hearing

Predicting Speech Perception in Older Listeners with Sensorineural Hearing Loss Using Automatic Speech Recognition



Abstract

The objective of this study was to provide proof of concept that the speech intelligibility in quiet of unaided older hearing-impaired (OHI) listeners can be predicted by automatic speech recognition (ASR). Twenty-four OHI listeners completed three speech-identification tasks using speech materials of varying linguistic complexity and predictability (i.e., logatoms, words, and sentences). An ASR system was first trained on different speech materials and then used to recognize the same speech stimuli presented to the listeners, but processed to mimic some of the perceptual consequences of the age-related hearing loss experienced by each listener: the elevation of hearing thresholds (by linear filtering), the loss of frequency selectivity (by spectral smearing), and loudness recruitment (by raising the amplitude envelope to a power). Independently of the size of the lexicon used in the ASR system, strong to very strong correlations were observed between human and machine intelligibility scores. However, large root-mean-square errors (RMSEs) were observed for all conditions. The simulation of frequency-selectivity loss had a negative impact on the strength of the correlation and on the RMSE. The highest correlations and smallest RMSEs were found for logatoms, suggesting that the prediction system mostly reflects the functioning of the peripheral part of the auditory system. In the case of sentences, the prediction of human intelligibility was significantly improved by taking cognitive performance into account. This study demonstrates for the first time that ASR, even when trained on intact, independent speech material, can be used to estimate trends in the speech intelligibility of OHI listeners.
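The loudness-recruitment degradation mentioned in the abstract (raising the amplitude envelope to a power) can be illustrated with a short sketch. This is a minimal, hypothetical implementation assuming a Hilbert-envelope decomposition and an illustrative exponent of 2.0; the authors' exact processing chain (filter banks, calibration levels) is not specified here, and the function name `simulate_recruitment` is introduced purely for illustration.

```python
import numpy as np
from scipy.signal import hilbert

def simulate_recruitment(signal: np.ndarray, power: float = 2.0) -> np.ndarray:
    """Raise the amplitude envelope of `signal` to `power` while keeping
    its temporal fine structure, expanding level differences the way
    loudness recruitment does. `power` is an illustrative choice."""
    analytic = hilbert(signal)
    envelope = np.abs(analytic)            # instantaneous amplitude
    fine_structure = np.cos(np.angle(analytic))
    peak = envelope.max()
    if peak == 0.0:
        return signal.copy()               # silence stays silence
    # Normalise before expansion so the overall peak level is preserved
    expanded = (envelope / peak) ** power * peak
    return expanded * fine_structure
```

Because the envelope is normalised to its peak before being raised to the power, quiet portions of the signal are attenuated relative to loud ones, mimicking the abnormally rapid growth of loudness experienced with recruitment.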
