Journal: Journal of Enterprise Information Management

Perceptual non-intrusive speech quality assessment using a self-organizing map

Abstract

Purpose - This paper proposes a new non-intrusive method for assessing the speech quality of voice communication systems and evaluates its performance.

Design/methodology/approach - The method measures perception-based objective auditory distances between the voiced parts of the output speech and appropriately matching references drawn from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech recordings. The auditory distances are then mapped into equivalent subjective mean opinion scores (MOS). The required clustering and matching are carried out by an efficient data-mining tool, the self-organizing map (SOM). The method was tested on a wide range of distortions, including speech compression, wireless channel impairments, VoIP channel impairments, and signal modifications introduced by features such as automatic gain control (AGC).

Findings - The reported experimental results indicate that the proposed method predicts the actual subjective quality of speech with a high level of accuracy. Specifically, the second version of the method, which is based on Bark spectrum (BS) analysis, predicts MOS more accurately than its first and third versions (which are based on BS analysis and mel frequency cepstrum coefficients (MFCC), respectively), and outperforms ITU-T PESQ in a large number of test cases, particularly those involving distortion caused by channel impairments and signal level modifications.

Research limitations/implications - The prototype of the proposed objective speech quality measure is believed to be sufficiently accurate and robust against variations in speaker, utterance and distortion type. Nevertheless, further improvement is possible in three areas: widening the codebook's coverage of speaker variation; formulating and using a perceptual speech model that provides a truly speaker-independent representation of speech; and implementing the proposed measure as a stand-alone system, preferably for real-time applications.

Practical implications - Being output-based, the proposed method can be used to monitor and assess telecommunications networks both under live traffic conditions and in off-line evaluation.

Originality/value - The main contribution of this paper is a new output-based, non-intrusive method for speech quality assessment that is sufficiently accurate and robust. To the best of the author's knowledge, no reliable output-based objective speech quality assessment method has to date been reported or formally recognised.
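
The pipeline outlined in the abstract (cluster clean-speech vectors into a codebook with a SOM, measure the distance from each output-speech vector to its best-matching reference, then map the distance to a MOS-like score) can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not give the SOM topology, the feature extraction, or the distance-to-MOS mapping, so the one-dimensional map, the Euclidean distance, the exponential mapping in `distance_to_mos`, and all function and parameter names below are assumptions chosen for illustration. Plain synthetic vectors stand in for perceptual features of voiced frames.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(codebook, vec):
    """Index of the codebook vector closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], vec))


def train_som(vectors, n_units=8, epochs=30, lr0=0.5, seed=0):
    """Train a 1-D SOM (assumed topology); its unit weights form the codebook."""
    rng = random.Random(seed)
    # Initialise each unit on a randomly chosen training vector.
    codebook = [list(v) for v in rng.sample(vectors, n_units)]
    radius0 = n_units / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # linearly decaying step size
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(codebook, vec)
            for i, unit in enumerate(codebook):
                # Gaussian neighbourhood around the winner on the 1-D map.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(unit)):
                    unit[d] += lr * h * (vec[d] - unit[d])
    return codebook


def mean_distance(codebook, vectors):
    """Average distance from each vector to its best-matching reference."""
    total = 0.0
    for vec in vectors:
        ref = codebook[best_matching_unit(codebook, vec)]
        total += math.sqrt(sq_dist(ref, vec))
    return total / len(vectors)


def distance_to_mos(distance, scale=1.0):
    """Hypothetical monotone map from a distance to a score in (1, 5]."""
    return 1.0 + 4.0 * math.exp(-distance / scale)


if __name__ == "__main__":
    rng = random.Random(1)
    centers = [[0.0] * 4, [3.0] * 4, [0.0, 3.0, 0.0, 3.0]]
    # Synthetic stand-ins for clean and degraded feature vectors.
    clean = [[c + rng.gauss(0, 0.1) for c in centers[i % 3]] for i in range(150)]
    degraded = [[x + rng.gauss(0, 1.0) for x in v] for v in clean]
    codebook = train_som(clean)
    for name, data in (("clean", clean), ("degraded", degraded)):
        d = mean_distance(codebook, data)
        print(name, round(d, 3), round(distance_to_mos(d), 2))
```

In a real system the mapping from distance to MOS would be fitted against subjective listening-test scores rather than fixed in advance; the exponential form here only ensures that a larger distance yields a lower score.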
