Journal: IETE Journal of Research

Investigation of Automatic Speech Recognition Performance and Mean Opinion Scores for Different Standard Speech and Audio Codecs

Abstract

Automatic Speech Recognition (ASR) systems are increasingly used for voice-centric applications on mobile handheld and Voice over Internet Protocol (VoIP) devices, and the need to characterize ASR performance under different network impediments is growing with them. Speech and audio coding standards are one such impediment: they strongly affect ASR performance when used at different sampling rates and bit rates in practical systems. Other common impediments that influence ASR accuracy are bit errors in wireless networks and packet drops in VoIP networks. ASR performance for some speech coding standards under noise conditions in wireless networks has been reported in the literature. However, each study reports ASR performance for only a few narrowband codecs, using different speech databases and different ASR toolkits such as RAPHEL, HTK, and SPHINX. This paper presents an analysis of ASR performance for both narrowband and wideband speech and audio coding standards currently accepted for GSM mobile and VoIP networks, using a common speech database (TIMIT) and a single ASR toolkit (SPHINX). The Mean Opinion Score (MOS), the generally accepted measure of speech quality, is also analyzed for all the speech and audio coding standards using the same speech database. ASR word accuracies and MOS values are presented for the different narrowband and wideband speech and audio codecs under no-loss conditions. Results are also presented for different packet drop rates, packet drop being the common noise scenario in wired networks such as VoIP (which is also merging with wireless networks). The main observation is that although some codecs show poor MOS at lower bit rates, their ASR performance is comparable to that of other codecs at higher bit rates.
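The abstract does not say how word accuracy was scored or how packet drops were simulated. As a rough illustration only, the sketch below uses the conventional definition of word accuracy, 100 * (N - S - D - I) / N, computed from a word-level edit distance, and mimics packet drop by zeroing whole frames with no loss concealment. The function names, the frame length, and the sample sentences are assumptions made for this example and are not taken from the paper or from the SPHINX toolkit.

```python
import random


def word_accuracy(reference, hypothesis):
    """Return word accuracy in percent: 100 * (N - S - D - I) / N.

    S + D + I is taken as the minimum word-level edit distance between
    the reference transcript and the recognizer output.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    errors = prev[-1]
    return 100.0 * (len(ref) - errors) / len(ref)


def drop_packets(samples, frame_len, drop_rate, seed=0):
    """Zero out whole frames at random to mimic packet drop.

    Assumption: one packet carries one frame of `frame_len` samples and a
    lost packet is replaced by silence (no concealment).
    """
    rng = random.Random(seed)
    out = list(samples)
    for start in range(0, len(out), frame_len):
        if rng.random() < drop_rate:
            end = min(start + frame_len, len(out))
            out[start:end] = [0] * (end - start)
    return out


if __name__ == "__main__":
    # Illustrative transcripts, not results from the paper.
    print(word_accuracy("she had your dark suit", "she had dark suit"))
    print(drop_packets([1] * 8, frame_len=4, drop_rate=1.0))
```

Note that with this definition word accuracy can become negative when the recognizer inserts many extra words, which is one reason some evaluations report percent correct alongside it.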
