IEEE International Conference on Multimodal Interfaces

3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers

Abstract

A microphone array is a promising solution for hands-free speech recognition in real environments. Accurate talker localization is essential for speech recognition with a microphone array. However, localizing a moving talker is difficult in noisy, reverberant environments, and talker localization errors degrade speech recognition performance. To address this problem, we previously proposed a speech recognition algorithm that considers multiple talker-direction hypotheses simultaneously [2]. This algorithm performs a Viterbi search in a three-dimensional trellis space composed of talker directions, input frames, and HMM states. In this paper, we describe a new algorithm for simultaneously recognizing the distant-talking speech of multiple talkers using an extended 3-D N-best search. The algorithm uses path-distance-based clustering and a likelihood normalization technique, both of which appeared necessary to build an efficient system for this purpose. We evaluated the proposed method on reverberant data, both simulated with the image method and recorded in a real room. The simulated data were used to determine the relationship between accuracy and reverberation time, and the real recordings were used to evaluate the algorithm's performance under real conditions. On image-method data with a reverberation time of 162 ms, the top-3 Simultaneous Word Accuracy was 73.02%.
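The idea of searching jointly over talker direction, input frame, and HMM state can be illustrated with a minimal 1-best Viterbi sketch. This is not the authors' implementation: the function name `viterbi_3d`, the emission table `emit[t][d][s]` (log-likelihood of frame t in state s with the array steered toward direction d), the direction-change penalty `log_dir_trans`, and the uniform direction prior are all hypothetical assumptions made for illustration. The N-best extension, path-distance-based clustering, and likelihood normalization mentioned in the abstract are not modeled.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def viterbi_3d(
    emit: List[List[List[float]]],
    log_trans: List[List[float]],
    log_dir_trans: List[List[float]],
    log_init: List[float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """1-best Viterbi search over a (frame, direction, HMM state) trellis.

    emit[t][d][s]        log-likelihood of frame t in state s, beam steered to direction d
    log_trans[i][j]      HMM state transition log-probability i -> j
    log_dir_trans[a][b]  hypothetical log-penalty for the direction moving a -> b
    log_init[s]          initial state log-probability (uniform direction prior assumed)
    Returns the best path score and its (direction, state) pair for every frame.
    """
    T, D, S = len(emit), len(emit[0]), len(emit[0][0])
    # delta[d][s]: best log score of any partial path ending in cell (t, d, s)
    delta = [[log_init[s] + emit[0][d][s] for s in range(S)] for d in range(D)]
    back = []  # back[t-1][d][s] = predecessor cell (d_prev, s_prev) for frame t
    for t in range(1, T):
        new_delta = [[NEG_INF] * S for _ in range(D)]
        ptr = [[(0, 0)] * S for _ in range(D)]
        for d in range(D):
            for s in range(S):
                # Maximize jointly over the previous direction and previous state.
                best, arg = NEG_INF, (0, 0)
                for dp in range(D):
                    for sp in range(S):
                        score = delta[dp][sp] + log_dir_trans[dp][d] + log_trans[sp][s]
                        if score > best:
                            best, arg = score, (dp, sp)
                new_delta[d][s] = best + emit[t][d][s]
                ptr[d][s] = arg
        delta = new_delta
        back.append(ptr)
    # Termination: pick the best final cell over every direction and state.
    best_score, d, s = max((delta[i][j], i, j) for i in range(D) for j in range(S))
    # Backtrack from the last frame to the first.
    path = [(d, s)]
    for ptr in reversed(back):
        d, s = ptr[d][s]
        path.append((d, s))
    path.reverse()
    return best_score, path


if __name__ == "__main__":
    lg = math.log
    # Toy example: 2 directions, 2-state left-to-right HMM, 3 frames.
    log_trans = [[lg(0.5), lg(0.5)], [NEG_INF, 0.0]]
    log_dir_trans = [[lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)]]
    log_init = [0.0, NEG_INF]
    # Direction 1 matches the speech well; direction 0 (a mislocalized beam) does not.
    emit = [
        [[lg(0.1), lg(0.1)], [lg(0.9), lg(0.1)]],
        [[lg(0.1), lg(0.1)], [lg(0.1), lg(0.9)]],
        [[lg(0.1), lg(0.1)], [lg(0.1), lg(0.9)]],
    ]
    score, path = viterbi_3d(emit, log_trans, log_dir_trans, log_init)
    print(path)  # [(1, 0), (1, 1), (1, 1)]
```

Because the direction index is part of every trellis cell in this sketch, the best path is free to change direction between frames at the cost of `log_dir_trans`, rather than committing to a single localization estimate before recognition. The naive loops cost O(T·D²·S²) per utterance; a practical decoder would prune the trellis.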