Journal: International Journal of Speech Technology

Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems


Abstract

The development of robust speech recognition systems that maintain high recognition accuracy in difficult and dynamically varying acoustic environments is becoming increasingly important as speech recognition technology becomes a more integral part of mobile applications. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs the feature parameter extraction, that is, the front-end of the speech recognition system, and the extracted features are transmitted over a data channel to the remote back-end recogniser. DSR provides particular benefits for mobile applications, such as improved recognition performance compared with using the voice channel, and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into a DSR system must operate in real time and at the lowest possible computational cost. In this paper, two innovative front-end processing techniques for noise-robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques include different forms of frame-attenuation, an improvement of spectral subtraction based on minimum statistics, and a mel-cepstrum feature extraction procedure. Tests are performed using the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database, together with the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited memory and processing power.
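The abstract names spectral subtraction based on minimum statistics and frame-attenuation but gives no algorithmic detail. The following is only an illustrative sketch of those general ideas, not the paper's TD-FrAtt or FD-FrAtt procedures. The function names, the window length, the smoothing constant, the spectral floor, the threshold and the attenuation gain are all assumptions chosen for this toy example, and the input is assumed to be a list of per-frame power spectra that have already been computed.

```python
from collections import deque


def min_statistics_noise(power_frames, window=8, smooth=0.8):
    """Estimate noise power per bin as the minimum of the recursively
    smoothed power over the last `window` frames (simplified tracker;
    `window` and `smooth` are assumed values)."""
    n_bins = len(power_frames[0])
    smoothed = list(power_frames[0])
    history = deque(maxlen=window)
    estimates = []
    for frame in power_frames:
        smoothed = [smooth * s + (1.0 - smooth) * p
                    for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        estimates.append([min(h[k] for h in history) for k in range(n_bins)])
    return estimates


def spectral_subtract(power_frames, noise_frames, over=1.0, floor=0.01):
    """Subtract the noise estimate from every bin, keeping a small
    fraction of the original power as a spectral floor."""
    return [
        [max(p - over * n, floor * p) for p, n in zip(frame, noise)]
        for frame, noise in zip(power_frames, noise_frames)
    ]


def attenuate_frames(cleaned, power_frames, noise_frames,
                     threshold=2.0, gain=0.1):
    """Scale down whole frames whose total input power is close to the
    noise estimate (hypothetical threshold and gain)."""
    out = []
    for clean, frame, noise in zip(cleaned, power_frames, noise_frames):
        ratio = sum(frame) / max(sum(noise), 1e-12)
        g = 1.0 if ratio >= threshold else gain
        out.append([g * c for c in clean])
    return out


def denoise(power_frames):
    """Chain the three steps: noise tracking, subtraction, attenuation."""
    noise = min_statistics_noise(power_frames)
    cleaned = spectral_subtract(power_frames, noise)
    return attenuate_frames(cleaned, power_frames, noise)


# Toy input: two-bin power spectra with ten noise-only frames,
# one loud frame, then five more noise-only frames.
frames = [[1.0, 1.0]] * 10 + [[10.0, 10.0]] + [[1.0, 1.0]] * 5
enhanced = denoise(frames)
```

In this toy run the loud frame keeps most of its power, while the noise-only frames are pushed close to zero. A real front-end would then pass the enhanced spectra through a mel filter bank and a cepstral transform, which is omitted here.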
