
Voice Activity Detection and Garbage Modelling for a Mobile Automatic Speech Recognition Application


Abstract

State-of-the-art automatic speech recognition (ASR) systems are now used in many industries around the world, and most deployments run a customized version of a speech recognition system. Different versions are needed because speech commands, lexicons, languages and working environments differ. A speech recognizer must produce accurate output in every working environment. However, its performance degrades quickly when the working environment is noisy and when out-of-vocabulary (OOV) words are spoken to it.

This thesis comprises three tasks that improve an automatic speech recognition application for mobile devices: building a new acoustic model, improving the existing voice activity detection, and garbage modelling of OOV words.

First, a Finnish acoustic model is trained for a company called Devoca Oy. The training data was recorded in different warehouse environments to improve real-world recognition accuracy. Second, Gammatone and Gabor features are extracted from each input speech frame to improve voice activity detection (VAD). These features are fed to the VAD decision module of Pocketsphinx and to a new neural-network classifier, which label each frame as speech or non-speech. Finally, a garbage model is developed for OOV words: it recognizes words that fall outside the grammar and marks them as unknown in the application interface.

This thesis evaluates the three tasks on a Finnish audio database and reports the overall improvement in word error rate.
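Gammatone features are usually computed from a bank of gammatone filters whose bandwidths follow the equivalent rectangular bandwidth (ERB) scale. The abstract does not give the thesis's feature configuration, so the following is only a minimal sketch under assumed settings: 16 kHz audio, 25 ms frames, 16 log-spaced centre frequencies between 100 Hz and 4 kHz, and the common 4th-order filter with bandwidth factor 1.019 and Glasberg and Moore's ERB formula. The function names are hypothetical; this is neither the author's implementation nor a Pocketsphinx API, and Gabor features are not covered.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Sampled gammatone impulse response centred at fc Hz, normalised to unit energy."""
    t = np.arange(int(duration * fs)) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc) * t)
    g = envelope * np.cos(2.0 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def gammatone_log_energies(frame, fs, centre_freqs):
    """Hypothetical helper: one log-energy feature per gammatone band for one frame."""
    feats = np.empty(len(centre_freqs))
    for k, fc in enumerate(centre_freqs):
        # Causal FIR filtering of the frame, truncated to the frame length.
        y = np.convolve(frame, gammatone_ir(fc, fs))[: len(frame)]
        feats[k] = np.log(np.mean(y ** 2) + 1e-10)  # small floor avoids log(0)
    return feats

# Usage with the assumed settings: 16 kHz audio, one 25 ms frame, 16 bands.
fs = 16000
centre_freqs = np.geomspace(100.0, 4000.0, 16)
t = np.arange(int(0.025 * fs)) / fs
tone_frame = np.sin(2.0 * np.pi * 1000.0 * t)  # stand-in for a voiced frame
feats = gammatone_log_energies(tone_frame, fs, centre_freqs)
print(feats.shape)  # (16,)
print(centre_freqs[np.argmax(feats)])  # centre frequency of the strongest band
```

In a VAD pipeline, per-frame vectors like `feats` (possibly concatenated with Gabor features) would be the input to the speech/non-speech decision; a 1 kHz tone concentrates its energy in the band centred nearest 1 kHz, while a silent frame yields the floor value in every band.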
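The abstract reports results as word error rate (WER), the standard ASR metric: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the recognizer output against a reference transcript of N words. A minimal sketch using a word-level Levenshtein distance follows; the function name and the Finnish example phrases are hypothetical, since the abstract does not describe the thesis's evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical command "kolme neljä viisi" ("three four five") misrecognized:
# 2 edits over 3 reference words.
print(word_error_rate("kolme neljä viisi", "kolme viisi kuusi"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the recognizer outputs many spurious words.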

Bibliographic details

  • Author: Ishaq Muhammad
  • Year: 2017
  • Original format: PDF
  • Language: English

