Journal: Journal of signal processing systems for signal, image, and video technology
Improvements in IITG Assamese Spoken Query System: Background Noise Suppression and Alternate Acoustic Modeling

Abstract

In this work, we present recent improvements to the previously developed Assamese spoken query (SQ) system for accessing the prices of agricultural commodities. The SQ system consists of interactive voice response (IVR) and automatic speech recognition (ASR) modules developed using open-source resources. The speech data used for training the ASR system has a high level of background noise because it was collected in field conditions. In the earlier version of the SQ system, this background noise had an adverse effect on recognition performance. In the improved version, a background noise suppression module based on zero frequency filtering is added before feature extraction. In addition, we have explored the recently reported subspace Gaussian mixture model (SGMM) and deep neural network (DNN) based acoustic modeling approaches. These techniques have been reported to be more powerful than the GMM-HMM approach employed in the previous version. Further, the foreground-separated speech data is used while learning the acoustic models for all systems. The combination of noise removal and SGMM/DNN-based acoustic modeling is found to yield a relative improvement of 39% in word error rate over the earlier reported GMM-HMM-based ASR system. The online testing of the developed SQ system, carried out with the help of real farmers, is also presented in this work. Some effort is made to quantify the usability of the developed SQ system, and the explored enhancements are noted to be helpful on that front too.
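The abstract reports its headline result as a relative improvement in word error rate (WER). As a minimal sketch of how such a figure is conventionally computed, the Python below derives WER from a word-level edit distance and then the relative improvement between two systems. The function names, the sample sentences, and the WER values 0.20 and 0.122 are hypothetical illustrations, not data or code from the paper; only the 39% figure comes from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the WER drops relative to the baseline system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of four.
    print(word_error_rate("price of rice today", "price of wheat today"))
    # Hypothetical WER values chosen so the relative gain is about 39%.
    print(round(relative_wer_improvement(0.20, 0.122), 2))
```

A relative figure depends on the baseline: under these assumed numbers, a 39% relative improvement corresponds to an absolute WER drop of only 7.8 percentage points.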