Journal: Computer Speech and Language

Acoustic model adaptation using in-domain background models for dysarthric speech recognition

Abstract

Speech production errors characteristic of dysarthria are chiefly responsible for the low accuracy of automatic speech recognition (ASR) when used by people diagnosed with it. A person with dysarthria produces speech in a rather reduced acoustic working space, causing typical measures of speech acoustics to have values in ranges very different from those characterizing unimpaired speech. It is unlikely then that models trained on unimpaired speech will be able to adjust to this mismatch when acted on by one of the currently well-studied adaptation algorithms (which make no attempt to address this extent of mismatch in population characteristics). In this work, we propose an interpolation-based technique for obtaining a prior acoustic model from one trained on unimpaired speech, before adapting it to the dysarthric talker. The method computes a 'background' model of the dysarthric talker's general speech characteristics and uses it to obtain a more suitable prior model for adaptation (compared to the speaker-independent model trained on unimpaired speech). The approach is tested with a corpus of dysarthric speech acquired by our research group, on speech of sixteen talkers with varying levels of dysarthria severity (as quantified by their intelligibility). This interpolation technique is tested in conjunction with the well-known maximum a posteriori (MAP) adaptation algorithm, and yields improvements of up to 8% absolute and up to 40% relative, over the standard MAP adapted baseline.
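
The abstract does not give the interpolation formula, so the following is only a minimal sketch of the general idea for a single Gaussian mean. It assumes a plain linear interpolation between the speaker-independent mean and the background-model mean; that form, the weight `weight`, and the function names are illustrative assumptions, not the authors' method. The second function is the standard MAP update for a Gaussian mean, with `tau` as the usual prior weight.

```python
from typing import List, Sequence


def interpolate_means(si_mean: Sequence[float],
                      background_mean: Sequence[float],
                      weight: float) -> List[float]:
    """Build a prior mean between the speaker-independent (SI) mean and the
    background-model mean. ASSUMPTION: simple linear interpolation; the
    paper's actual interpolation scheme is not stated in the abstract."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [(1.0 - weight) * s + weight * b
            for s, b in zip(si_mean, background_mean)]


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   posteriors: Sequence[float],
                   tau: float) -> List[float]:
    """Standard MAP update of one Gaussian mean:
    (tau * prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)."""
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    dim = len(prior_mean)
    occupancy = sum(posteriors)          # total soft count for this Gaussian
    weighted_sum = [0.0] * dim           # posterior-weighted sum of frames
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


# Toy usage with made-up two-dimensional values (not data from the paper).
prior = interpolate_means([0.0, 2.0], [4.0, 6.0], weight=0.5)
adapted = map_adapt_mean(prior, [[4.0, 6.0], [4.0, 6.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the MAP estimate stays close to the prior, which is why a prior nearer to the talker's speech could matter: in this sketch, moving `weight` toward 1 starts adaptation from the background model rather than from the unimpaired-speech model.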