Investigation of Automatic Speech Recognition Performance and Mean Opinion Scores for Different Standard Speech and Audio Codecs

A. V. Ramana; Laxminarayana Parayitam; Mythili Sharan Pala


Abstract

Automatic Speech Recognition (ASR) systems are increasingly used for voice-centric applications in mobile handheld and Voice over Internet Protocol (VoIP) devices, and with this comes a growing need to characterize ASR performance under different network impairments. Speech and audio coding standards are one such impairment: used at different sampling rates and bit rates in practical systems, they affect ASR performance considerably. Another common impairment is bit errors in wireless networks and packet drops in VoIP networks. ASR performance for some speech coding standards under noisy wireless conditions has been reported in the literature. However, each study reports results for only a few narrowband codecs, using different speech databases and different ASR toolkits such as RAPHEL, HTK, and SPHINX. This paper presents an analysis of ASR performance for both narrowband and wideband speech and audio coding standards currently accepted for GSM mobile and VoIP networks, using a common speech database (TIMIT) and a single ASR toolkit (SPHINX). The Mean Opinion Score (MOS), a generally accepted measure of speech quality, is also analyzed for all of these coding standards using the same speech database. ASR word accuracies and MOS values are presented for the different narrowband and wideband speech and audio codecs under no-loss conditions. Results are also presented for different packet drop rates, the common noise scenario in wired networks such as VoIP (which is also merging with wireless networks). The observation is that, although some codecs show poor MOS at lower bit rates, their ASR performance is comparable to that of other codecs at higher bit rates.
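The abstract reports ASR word accuracy without defining it. A minimal sketch of the conventional definition follows, assuming accuracy is computed as (N - S - D - I) / N from a word-level edit-distance alignment, where N is the number of reference words and S, D, I are substitutions, deletions, and insertions. The function names `word_errors` and `word_accuracy` and the example sentence are hypothetical illustrations, not taken from the paper or from SPHINX.

```python
def word_errors(reference, hypothesis):
    """Return (minimum word-level edit errors, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent; assumed definition, can go negative."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (n_ref - errors) / n_ref


if __name__ == "__main__":
    # Illustrative transcript pair: one reference word is dropped.
    print(word_accuracy("she had your dark suit", "she had dark suit"))
```

Under this definition the word error rate listed among the keywords is simply 100 minus the word accuracy, so comparing codecs by either figure gives the same ranking.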


Bibliographic details

Source
IETE Journal of Research, 2012, No. 2, pp. 121-129 (9 pages)
Authors
A. V. Ramana; Laxminarayana Parayitam; Mythili Sharan Pala
Author affiliations

Department of ECE Research and Training Unit for Navigational Electronics, Osmania University, Hyderabad, Andhra Pradesh, India (all three authors)

Original format: PDF
Language: English
Keywords
automatic speech recognition performance; GSM wireless networks; mean opinion score; voice over internet protocol; word error rate;


Similar literature


1. Performance Evaluation of Automatic Speech Recognition with Wideband Speech Codecs [J]. D. Nagajyothi, P. Siddaiah. Journal of Engineering & Applied Sciences. 2017, No. 23
2. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems, Version Ⅱ [J]. NASA Tech Briefs. 2014, No. 6
3. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems [J]. John H. Glenn NASA Tech Briefs. 2010, No. 11
4. Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation [C]. Nirayo Hailu, Ingo Siegert, Andreas Nürnberger. International Workshop on Multimedia Signal Processing. 2020
5. Advances in Audiovisual Speech Processing for Robust Voice Activity Detection and Automatic Speech Recognition [D]. Tao, Fei. 2018
6. The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance [O]. Ming Tu, Alan Wisler, Visar Berisha
7. The effects of speakers' gender, age, and region on overall performance of Arabic automatic speech recognition systems using the phonetically rich and balanced Modern Standard Arabic speech corpus [O]. Sawalha M, Abu Shariah M. 2013
