Conference: National Conference on Communications

Enhancements in Assamese spoken query system: Enabling background noise suppression and flexible queries

Abstract

This paper discusses recent improvements to a previously developed Assamese spoken query (SQ) system for accessing the prices of agricultural commodities. The SQ system consists of interactive voice response (IVR) and automatic speech recognition (ASR) modules, both developed using open-source resources. The speech data used to develop the ASR system was collected under field conditions and therefore contains a high level of background noise, which severely degraded the recognition performance of the earlier version of the SQ system. To address this, a front-end noise suppression module based on zero frequency filtering has been added in the current version. We have also incorporated subspace Gaussian mixture model (SGMM) and deep neural network (DNN)-based acoustic modeling approaches, which are found to be more effective than the Gaussian mixture model (GMM)-based approach employed in the previous version. The combination of noise removal and DNN-based acoustic modeling yields a relative improvement of almost 32% in word error rate over the earlier reported GMM-HMM-based ASR system. The earlier SQ system was designed on the expectation that users' queries would consist of isolated words only, so recognition performance degraded sharply whenever queries took the form of continuous sentences. To overcome this, we present a simple technique that exploits the inherent patterns in user queries and incorporates them into the language model. The modified language model is observed to significantly improve recognition performance for continuous queries.
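To make the "relative improvement in word error rate" figure concrete, the following is a minimal Python sketch of how word error rate (WER) and a relative WER reduction are commonly computed. The function names and the example WER values (25% and 17%) are hypothetical and chosen only for illustration; the abstract does not report the absolute error rates, and this is not the authors' scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the new system lowers the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, NOT taken from the paper: a baseline at 25% WER
# and an improved system at 17% WER give a 32% relative reduction.
example_reduction = relative_wer_reduction(25.0, 17.0)
```

Note that a relative reduction is measured against the baseline error rate, so a 32% relative improvement is not the same as lowering the WER by 32 percentage points.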
