...
Journal of VLSI signal processing

Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers Based on the 3-D N-Best Search Method

Abstract

This paper describes a novel method for hands-free speech recognition. In particular, it addresses the simultaneous recognition of distant-talking speech from multiple sound sources (talkers or noise sources). The method extends the 3-D Viterbi search to a 3-D N-best search so that the speech of multiple talkers can be recognized simultaneously. The baseline system combines two existing technologies, 3-D Viterbi search and conventional N-best search, into a complete system. However, an initial evaluation of this 3-D N-best search-based system showed that new ideas were needed to build a system that recognizes multiple sound sources simultaneously.

Two factors were found to play an important role in system performance: the different likelihood ranges of the talkers and the direction-based separation of the hypotheses. First, hypotheses originating from different talkers must be compared with one another. Because the talkers' likelihoods span different dynamic ranges, these hypotheses cannot be compared accurately as they stand. Second, hypotheses originating from different talkers are located in different directions, so separating them by direction provides an efficient route to accurate recognition. To address these problems, we implemented a likelihood normalization technique and a path distance-based clustering technique in the baseline 3-D N-best search-based system.

The system was evaluated in experiments on recognizing the distant-talking speech of two talkers. These experiments used simulated data (with time delay only) and reverberant data (both simulated and real). In this paper, we evaluate the proposed method in reverberant environments and report results at several reverberation times, as well as results obtained in a real environment. The experiments showed that implementing the two techniques described above produced significant improvements.
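The abstract does not give formulas for the likelihood normalization or the path distance-based clustering. The following is a minimal sketch of how the two ideas could fit together. It assumes each N-best hypothesis carries three things: a word, a log-likelihood, and a direction path (one direction index per frame).

Every concrete choice below is hypothetical and not taken from the paper:

* path distance is the mean absolute per-frame direction difference;
* clustering is greedy leader clustering with a fixed threshold;
* normalization subtracts each cluster's best log-likelihood.

The names `Hypothesis`, `path_distance`, `cluster_by_path`, and `normalized_nbest` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    word: str
    log_likelihood: float
    direction_path: list[int]  # assumed: one direction index per frame


def path_distance(a: list[int], b: list[int]) -> float:
    """Hypothetical path distance: mean absolute per-frame direction difference."""
    if len(a) != len(b):
        raise ValueError("direction paths must span the same frames")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cluster_by_path(hyps: list[Hypothesis], threshold: float) -> list[list[Hypothesis]]:
    """Greedy leader clustering; each cluster's first member (its best) is the leader."""
    clusters: list[list[Hypothesis]] = []
    for h in sorted(hyps, key=lambda h: h.log_likelihood, reverse=True):
        for cluster in clusters:
            if path_distance(h.direction_path, cluster[0].direction_path) <= threshold:
                cluster.append(h)
                break
        else:  # no existing cluster is close enough: start a new one
            clusters.append([h])
    return clusters


def normalized_nbest(hyps: list[Hypothesis], threshold: float = 1.0):
    """Merge clusters into one N-best list of (cluster_id, word, normalized score)."""
    merged = []
    for cid, cluster in enumerate(cluster_by_path(hyps, threshold)):
        best = cluster[0].log_likelihood  # leader has the highest score in its cluster
        for h in cluster:
            merged.append((cid, h.word, h.log_likelihood - best))
    # Stable sort: ties (e.g. every cluster leader at 0.0) keep cluster order.
    merged.sort(key=lambda t: t[2], reverse=True)
    return merged


if __name__ == "__main__":
    hyps = [
        Hypothesis("hello", -1000.0, [2, 2, 2, 2]),  # louder talker near direction 2
        Hypothesis("yellow", -1010.0, [2, 2, 3, 2]),
        Hypothesis("world", -1500.0, [9, 9, 9, 9]),  # quieter talker near direction 9
        Hypothesis("word", -1504.0, [9, 9, 8, 9]),
    ]
    for cid, word, score in normalized_nbest(hyps)[:2]:
        print(cid, word, score)  # prints "0 hello 0.0" then "1 world 0.0"
```

Without normalization, every hypothesis of the quieter talker (around -1500) ranks below every hypothesis of the louder talker (around -1000). A raw top-2 list would therefore contain only "hello" and "yellow". After per-cluster normalization, each cluster's best hypothesis scores 0.0, so the top-2 list holds one word per talker.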
The best results on simulated data were obtained by implementing both techniques and using a 32-channel microphone array. In this configuration, the Simultaneous Word Accuracy was 72.49% in the 'top 1' hypothesis and 86.25% in the 'top 3' hypotheses. This measure counts only cases where both talkers are recognized correctly at the same time. These results are very promising.
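The abstract defines Simultaneous Word Accuracy as the rate at which both talkers are recognized correctly at the same time. Below is a minimal sketch of how 'top 1' and 'top N' figures could be computed. It makes two assumptions:

* each trial supplies a reference word pair and a ranked list of hypothesis pairs, already ordered by talker (for example, by direction);
* a trial counts as correct at top N when any of its first N pairs matches both references.

The function name `simultaneous_word_accuracy` and the data layout are hypothetical.

```python
def simultaneous_word_accuracy(trials, n=1):
    """Percentage of trials in which some top-n hypothesis pair gets BOTH talkers right.

    trials: list of (reference_pair, ranked_hypothesis_pairs). Pairs are assumed
    to be ordered by talker (e.g. by direction) before scoring.
    """
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(1 for ref, ranked in trials if any(hyp == ref for hyp in ranked[:n]))
    return 100.0 * hits / len(trials)


if __name__ == "__main__":
    trials = [
        (("hello", "world"), [("hello", "world"), ("yellow", "world")]),  # hit at rank 1
        (("hello", "world"), [("hello", "word"), ("hello", "world")]),    # hit at rank 2
        (("one", "two"), [("one", "too"), ("won", "two"), ("one", "tree")]),  # miss
    ]
    print(simultaneous_word_accuracy(trials, n=1))
    print(simultaneous_word_accuracy(trials, n=3))
```

A pair with only one talker correct, such as ("hello", "word"), counts as a miss. This is what makes the measure stricter than per-talker word accuracy.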
