Journal: Signal, Image and Video Processing

New single-ended objective measure for non-intrusive speech quality evaluation

Abstract

This article proposes a new output-based method for non-intrusive assessment of the speech quality of voice communication systems and evaluates its performance. The method requires access only to the processed (degraded) speech. It measures perceptually motivated objective auditory distances between the voiced parts of the output speech and appropriately matched references drawn from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech recordings. The auditory distances are then mapped to objective listening-quality Mean Opinion Scores (MOS). An efficient data-mining tool known as the self-organizing map (SOM) performs the required clustering and mapping/reference-matching steps. To obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques are investigated: the first is based on a perceptual linear prediction (PLP) model, the second uses Bark spectrum (BS) analysis, and the third uses mel-frequency cepstral coefficients (MFCC). The reported evaluation results show that the proposed method correlates highly with subjective listening-quality scores, with accuracy similar to that of ITU-T P.563 at relatively low computational complexity. The results also show that the method outperforms PESQ under a number of distortion conditions, such as speech degraded by channel impairments.
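The pipeline outlined in the abstract (a clean-speech codebook built with a SOM, best-match distances for degraded frames, then a mapping to a quality score) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature vectors are synthetic stand-ins for PLP, Bark-spectrum, or MFCC frames, the grid size and learning schedule are arbitrary, voiced-frame selection is omitted, and `distance_to_mos` is a hypothetical linear mapping with placeholder constants rather than the authors' trained mapping.

```python
import numpy as np


def train_som(data, grid=(6, 6), epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small self-organizing map; returns a codebook (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood width
            # Best-matching unit: the codebook entry closest to the input.
            bmu = np.argmin(np.sum((codebook - x) ** 2, axis=1))
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards the input.
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook


def mean_bmu_distance(frames, codebook):
    """Average Euclidean distance from each frame to its closest codebook entry."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


def distance_to_mos(dist, a=4.5, b=0.5):
    """Hypothetical monotone mapping to a 1 to 4.5 scale (a, b are placeholders)."""
    return float(np.clip(a - b * dist, 1.0, 4.5))


# Synthetic stand-ins for perceptual feature frames (12 coefficients each).
rng = np.random.default_rng(1)
clean_train = rng.normal(size=(500, 12))
clean_test = rng.normal(size=(200, 12))
degraded_test = clean_test + rng.normal(scale=1.5, size=clean_test.shape)

codebook = train_som(clean_train)
d_clean = mean_bmu_distance(clean_test, codebook)
d_degraded = mean_bmu_distance(degraded_test, codebook)
print("clean:", d_clean, distance_to_mos(d_clean))
print("degraded:", d_degraded, distance_to_mos(d_degraded))
```

In this toy setup the degraded frames lie farther from the clean-speech codebook than unseen clean frames do, so the placeholder mapping assigns them a lower score. In practice the mapping from distance to MOS would be fitted against subjective scores rather than fixed by hand.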
