Using Blind Source Separation and a Compact Microphone Array to Improve the Error Rate of Speech Recognition

Abstract

Automatic speech recognition has become a standard feature on many consumer electronics and automotive products, and the accuracy of the decoded speech has improved dramatically over time. Designers of these products often achieve this accuracy by employing microphone arrays and beamforming algorithms to reduce interference. However, beamforming microphone arrays are too large for small form factor products such as smart watches. Yet these products, which have precious little space for tactile user input (e.g. knobs, buttons and touch screens), would benefit immensely from a user interface based on reliably accurate automatic speech recognition.

This thesis proposes a solution for interference mitigation that employs blind source separation with a compact array of commercially available unidirectional microphone elements. Such an array provides adequate spatial diversity to enable blind source separation and would easily fit in a smart watch or similar small form factor product. The solution is characterized using publicly available speech audio clips recorded for the purpose of testing automatic speech recognition algorithms. The proposal is modelled in different interference environments and the efficacy of the solution is evaluated. Factors affecting the performance of the solution are identified and their influence is quantified. An expectation is presented for the quality of separation, as well as for the resulting improvement in word error rate achieved by decoding the separated speech estimate rather than the mixture obtained from a single unidirectional microphone element. Finally, directions for future work are proposed that have the potential to improve the performance of the solution, thereby making it a commercially viable product.

Bibliographic record

  • Author: Hoffman, Jeffrey Dean
  • Year: 2016
  • Original format: PDF
