Simultaneous Recognition of Distant-Talking Speech of Multiple Talkers Based on the 3-D N-Best Search Method

PANIKOS HERACLEOUS; SATOSHI NAKAMURA; KIYOHIRO SHIKANO

Abstract

This paper describes a novel method for hands-free speech recognition, in particular for simultaneous recognition of distant-talking speech from multiple sound sources (talkers or noise sources). The method extends the 3-D Viterbi search to a 3-D N-best search so that multiple talkers can be recognized simultaneously. The baseline system integrates two existing technologies, the 3-D Viterbi search and the conventional N-best search, into a complete system. Initial evaluation of this baseline, however, showed that new ideas were needed to build a system that recognizes multiple sound sources simultaneously. Two factors were found to play an important role in system performance: the different likelihood ranges of the talkers and the direction-based separation of the hypotheses. First, hypotheses originating from different talkers have to be compared with one another, but the talkers' likelihoods have different dynamic ranges, so the comparison cannot be made accurately. Second, the hypotheses originate from talkers located in different directions, so separating them by direction provides an efficient means of accurate recognition. To address these two issues, we added a likelihood normalization technique and a path distance-based clustering technique to the baseline 3-D N-best search-based system.

The system was evaluated in experiments on recognizing the distant-talking speech of two talkers. The experiments used simulated data (time delay only) and reverberated data (simulated and real). We evaluated the proposed method in reverberant environments and report results at several reverberation times as well as results obtained in a real environment. The experiments showed that implementing the two techniques produced significant improvements. The best results for simulated data were obtained by implementing both techniques and using a 32-channel microphone array. In that case, the Simultaneous Word Accuracy (where both talkers are correctly recognized simultaneously) was 72.49% in the 'top 1' hypothesis and 86.25% in the 'top 3' hypotheses, which are very promising results.
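The Simultaneous Word Accuracy quoted in the abstract counts a test item as correct only when both talkers are recognized correctly at the same time. The sketch below shows one plausible way to compute such a score at 'top 1' and 'top 3'. It is not the authors' evaluation code. It assumes that each talker utters a single word, that the recognizer returns a ranked list of joint hypotheses (one word per talker), and that an item counts as a hit when the reference pair appears among the first n entries. The function name `simultaneous_word_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple

# Assumption: one joint hypothesis is a pair (word for talker A, word for talker B).
JointHypothesis = Tuple[str, str]


def simultaneous_word_accuracy(
    nbest_lists: Sequence[Sequence[JointHypothesis]],
    references: Sequence[JointHypothesis],
    top_n: int = 1,
) -> float:
    """Percentage of test items whose first top_n hypotheses contain a pair
    in which BOTH talkers' words match the reference."""
    if len(nbest_lists) != len(references):
        raise ValueError("need exactly one N-best list per reference pair")
    if not references:
        raise ValueError("empty test set")
    hits = sum(
        1
        for hypotheses, reference in zip(nbest_lists, references)
        if reference in list(hypotheses)[:top_n]
    )
    return 100.0 * hits / len(references)


# Toy data, invented for illustration only (not taken from the paper).
references = [
    ("tokyo", "osaka"),
    ("kyoto", "nara"),
    ("kobe", "nagoya"),
    ("sendai", "sapporo"),
]
nbest_lists = [
    [("tokyo", "osaka"), ("tokyo", "nara")],  # both talkers correct at rank 1
    [("kyoto", "kobe"), ("kyoto", "nara")],   # both talkers correct at rank 2
    [("kobe", "osaka"), ("nara", "nagoya")],  # never both correct in one pair
    [("sendai", "sapporo")],                  # both talkers correct at rank 1
]

print(simultaneous_word_accuracy(nbest_lists, references, top_n=1))  # 50.0
print(simultaneous_word_accuracy(nbest_lists, references, top_n=3))  # 75.0
```

The third toy item illustrates why this score is stricter than per-talker accuracy: each talker's word appears somewhere in the list, but never in the same joint hypothesis, so the item is not counted.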


Bibliographic record

Source
Journal of VLSI Signal Processing | 2004, No. 3 | pp. 105-116 | 12 pages
Authors
PANIKOS HERACLEOUS; SATOSHI NAKAMURA; KIYOHIRO SHIKANO
Affiliation
ATR Spoken Language Translation Research Labs, 2-2-2 Hikaridai Seika-Cho Soraku-gun, Kyoto 619-0288, Japan
Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Chinese Library Classification: large-scale integrated circuits, very-large-scale integrated circuits
Keywords
speech recognition; distant-talking speech; multiple sound sources; microphone array

Similar literature

1. Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array [J]. Yamada T., Nakamura S., Shikano K. IEEE Transactions on Speech and Audio Processing. 2002, No. 2
2. Hands-free Speech Recognition Based on 3-D Viterbi Search Using Adaptive Beamforming [J]. Takeshi Yamada, Satoshi Nakamura, Kiyohiro Shikano. 情報処理学会論文誌. 1999, No. 2
3. Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs [J]. Norihide Kitaoka, Souta Hamaguchi, Seiichi Nakagawa. IEICE Transactions on Information and Systems. 2008, No. 3
4. 3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers [C]. Satoshi Nakamura, Panikos Heracleous. IEEE International Conference on Multimodal Interfaces. 2002
5. Explicit N-best formant features for segment-based speech recognition [D]. Schmid, Philipp Heinz. 1996
6. Comparison of bimodal and bilateral cochlear implant users on speech recognition with competing talker, music perception, affective prosody discrimination and talker identification [O]. Helen E Cullington, Fan-Gang Zeng
7. Simultaneous Recognition of Multiple Sound Sources based on 3-D N-best Search Using a Microphone Array [O]. Panikos Heracleous, Takeshi Yamada, Satoshi Nakamura. 1999
8. Improving State-of-the-Art Continuous Speech Recognition System Using the N-Best Paradigm with Neural Networks [R]. Austin, S., Zavaliagkos, G., Makhoul, J. 1992
