3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers


Abstract

A microphone array is a promising solution for realizing hands-free speech recognition in real environments. Accurate talker localization is very important for speech recognition using a microphone array. However, localizing a moving talker is difficult in noisy, reverberant environments, and talker localization errors degrade speech recognition performance. To solve this problem, we proposed a speech recognition algorithm that considers multiple talker direction hypotheses simultaneously [2]. That algorithm performs a Viterbi search in a 3-dimensional trellis space composed of talker directions, input frames, and HMM states. In this paper, we describe a new algorithm for simultaneous recognition of distant-talking speech of multiple talkers, based on an extended 3-D N-best search. The algorithm uses path distance-based clustering and a likelihood normalization technique, which appeared to be necessary to build an efficient system for our purpose. We evaluated the proposed method on reverberated data, both simulated by the image method and recorded in a real room. The image method was used to determine the relationship between accuracy and reverberation time, and the real data was used to evaluate the actual performance of our algorithm. With the image method and a reverberation time of 162 ms, the Top 3 Simultaneous Word Accuracy was 73.02%.
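The 3-D trellis idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no transition details, so the direction-transition scores (`log_dir_trans`), the fixed start and end states, and all function and parameter names are assumptions made for illustration. The sketch finds only the single best path and omits the N-best extension, the path distance-based clustering, and the likelihood normalization.

```python
import numpy as np

def viterbi_3d(log_emis, log_trans, log_dir_trans):
    """Best path through a (direction, frame, state) trellis.

    log_emis:      shape (D, T, S); log-likelihood of frame t in HMM state s
                   when the array is steered toward direction d.
    log_trans:     shape (S, S); HMM state log-transition scores.
    log_dir_trans: shape (D, D); log-score for changing direction between
                   consecutive frames (an assumption of this sketch).
    Returns (best_log_score, path), where path is a list of (direction, state).
    """
    D, T, S = log_emis.shape
    delta = np.full((T, D, S), -np.inf)
    back = np.zeros((T, D, S, 2), dtype=int)

    # Assumption: every path starts in state 0, in any direction.
    delta[0, :, 0] = log_emis[:, 0, 0]

    for t in range(1, T):
        # cand[d_prev, s_prev, d, s]: score of reaching (d, s) from (d_prev, s_prev)
        cand = (delta[t - 1][:, :, None, None]
                + log_dir_trans[:, None, :, None]
                + log_trans[None, :, None, :])
        flat = cand.reshape(D * S, D, S)
        best_prev = flat.argmax(axis=0)  # flat index d_prev * S + s_prev
        delta[t] = flat.max(axis=0) + log_emis[:, t, :]
        back[t, :, :, 0], back[t, :, :, 1] = np.divmod(best_prev, S)

    # Assumption: a valid path ends in the final state S - 1.
    s = S - 1
    d = int(delta[T - 1, :, s].argmax())
    best = float(delta[T - 1, d, s])
    path = [(d, s)]
    for t in range(T - 1, 0, -1):
        d, s = (int(v) for v in back[t, d, s])
        path.append((d, s))
    path.reverse()
    return best, path
```

Each returned pair gives the hypothesized talker direction and HMM state for one frame, so the direction sequence is decoded jointly with the state sequence rather than fixed in advance by a separate localization step.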


Bibliographic details

Source: IEEE International Conference on Multimodal Interfaces, 2002, 5 pages
Authors: Satoshi Nakamura; Panikos Heracleous
Original format: PDF
Chinese Library Classification: TN911-53
Similar literature


1. Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers Based on the 3-D N-Best Search Method [J]. Panikos Heracleous, Satoshi Nakamura, Kiyohiro Shikano. Journal of VLSI Signal Processing. 2004, No. 2-3

2. Large vocabulary continuous speech recognition using N-best linear lexicon search and tree lexicon search with 1-best approximation [J]. Norihide Kitaoka, Nobutoshi Takahashi, Seiichi Nakagawa. 電子情報通信学会技術研究報告. 音声. Speech. 2003, No. 94

3. 3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers [C]. Satoshi Nakamura, Panikos Heracleous. IEEE International Conference on Multimodal Interfaces. 2002

4. Explicit N-best formant features for segment-based speech recognition [D]. Schmid, Philipp Heinz. 1996

5. Comparison of bimodal and bilateral cochlear implant users on speech recognition with competing talker, music perception, affective prosody discrimination, and talker identification [O]. Helen E Cullington, Fan-Gang Zeng

6. Simultaneous Recognition of Multiple Sound Sources based on 3-D N-best Search Using a Microphone Array [O]. Panikos Heracleous, Takeshi Yamada, Satoshi Nakamura. 1999

7. Improving State-of-the-Art Continuous Speech Recognition System Using the N-Best Paradigm with Neural Networks [R]. Austin, S., Zavaliagkos, G., Makhoul, J. 1992
