Journal: Computers & Mathematics with Applications

Speech confusion index (Φ): A confusion-based speech quality indicator and recognition rate prediction for dysarthria

Abstract

This paper presents an automated method for assessing the speech quality of a dysarthric speaker, replacing laborious and subjective manual methods. The assessment result can serve as an indicator for predicting speech recognition accuracy. The proposed speech confusion index (Φ) measures the severity of a speaker's speech disorder in terms of how easily his or her speech signal may be misrecognized as other, unintended words. The method relies on signal processing alone, without any high-level information: a dynamic time warping technique, combined with an adaptive slope constraint and an accumulative mismatch score, measures the distance between any two speech signals, whether of the same word or of two different words. Compared with articulatory and intelligibility tests, the proposed indicator was shown to better predict the recognition rates obtained from Hidden Markov Models (HMM) and Artificial Neural Networks (ANN). Under three evaluation criteria (root-mean-square difference, correlation coefficient, and rank-order inconsistency), experimental results on a phoneme-balanced set showed that Φ achieved better prediction than both the articulatory and the intelligibility tests. A further experiment on a reduced training set investigates the robustness of the proposed indicator. Finally, speech confusion is analyzed in detail at the phoneme level.
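
The distance measurement mentioned in the abstract builds on dynamic time warping. The sketch below shows only the classic, unconstrained form of that technique, to illustrate how two utterances of different lengths can be compared frame by frame. It does not reproduce the paper's adaptive slope constraint or accumulative mismatch score, which the abstract does not specify. The function name `dtw_distance`, the Euclidean local cost, and the toy one-dimensional "feature frames" are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature vectors.

    `a` and `b` are lists of equal-dimension vectors (lists of floats),
    e.g. per-frame acoustic features. This is the textbook recurrence,
    NOT the paper's variant with adaptive slope constraint.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] holds the minimal accumulated cost of aligning
    # the first i frames of `a` with the first j frames of `b`.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between the two frames (assumed).
            local = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of `a` repeated against b[j-1]
                cost[i][j - 1],      # frame of `b` repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Toy example: the second sequence is a time-stretched copy of the first.
same_word = dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
other_word = dtw_distance([[0.0], [1.0], [2.0]], [[5.0], [5.0], [5.0]])
print(same_word, other_word)
```

In this toy case the time-stretched copy aligns at zero cost while the unrelated sequence does not, which is the kind of within-word versus between-word contrast a confusion-based measure would draw on. How the paper combines such distances into Φ is not stated in the abstract and is not attempted here.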
