Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array

Yamada T.; Nakamura S.; Shikano K.

Abstract

This paper focuses on microphone arrays as a means of realizing distant-talking speech recognition in real environments. In distant-talking situations, users can speak from arbitrary positions while moving. Accurate talker localization is therefore very important for high-quality speech acquisition with a microphone array. However, it is very difficult to localize a moving talker in noisy and reverberant environments, and talker localization errors degrade speech recognition performance. One way to solve this problem is to integrate the speech recognition process and talker localization into a unified framework. This paper proposes a new speech recognition algorithm based on a three-dimensional (3-D) Viterbi search. The 3-D Viterbi method extracts a direction-time sequence of parameter vectors by steering a beam in every direction in every frame, and then finds the most likely path in a 3-D trellis space composed of talker directions, input frames, and HMM states. Speech recognition and talker localization are thus performed simultaneously within a statistical framework. To evaluate the performance of the 3-D Viterbi method, recognition experiments on real-environment data were carried out. The results confirmed that the 3-D Viterbi method drastically improves recognition performance in the moving-talker case as well as in the fixed-position talker case.
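
The search over talker directions, input frames, and HMM states described in the abstract can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the function name `viterbi_3d`, the argument layout, and the use of a separate direction-transition table are assumptions made for illustration, and the beam steering and feature extraction are replaced by a precomputed table of log-likelihoods. The sketch also ends the path at whichever (direction, state) pair scores best, which a real recognizer would likely constrain.

```python
NEG_INF = float("-inf")

def viterbi_3d(log_emis, log_trans_state, log_trans_dir,
               log_init_state, log_init_dir):
    """Find the best path through a (frame, direction, state) trellis.

    log_emis[t][d][s]: log-likelihood, under HMM state s, of the feature
        vector obtained at frame t with the beam steered to direction d
        (assumed to be precomputed).
    log_trans_state[ps][s], log_trans_dir[pd][d]: log transition scores
        between states and between directions (the latter is an assumption).
    Returns (best_log_score, path), path being a list of (direction, state).
    """
    T = len(log_emis)
    D = len(log_init_dir)
    S = len(log_init_state)

    # Initialise the first frame for every (direction, state) pair.
    delta = [[log_init_dir[d] + log_init_state[s] + log_emis[0][d][s]
              for s in range(S)] for d in range(D)]
    backptrs = []

    for t in range(1, T):
        new_delta = [[NEG_INF] * S for _ in range(D)]
        bp = [[(0, 0)] * S for _ in range(D)]
        for d in range(D):
            for s in range(S):
                best, arg = NEG_INF, (0, 0)
                # Maximise over every predecessor (direction, state) pair.
                for pd in range(D):
                    for ps in range(S):
                        score = (delta[pd][ps]
                                 + log_trans_dir[pd][d]
                                 + log_trans_state[ps][s])
                        if score > best:
                            best, arg = score, (pd, ps)
                new_delta[d][s] = best + log_emis[t][d][s]
                bp[d][s] = arg
        delta = new_delta
        backptrs.append(bp)

    # Pick the best final cell, then follow back-pointers to frame 0.
    best_score, last = max((delta[d][s], (d, s))
                           for d in range(D) for s in range(S))
    path = [last]
    for bp in reversed(backptrs):
        d, s = path[-1]
        path.append(bp[d][s])
    path.reverse()
    return best_score, path
```

The returned path holds one (direction, state) pair per frame, so the direction component gives a talker trajectory and the state component gives the HMM alignment from the same search. The cost per frame grows with the square of the number of (direction, state) pairs, so a practical system would presumably prune or restrict direction changes.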

Bibliographic details

Source
IEEE Transactions on Speech and Audio Processing | 2002, No. 2 | pp. 48-56 | 9 pages
Authors
Yamada T.; Nakamura S.; Shikano K.
Original format: PDF
Language of text: English
CLC classification: Electroacoustic technology and speech signal processing
Keywords
acoustic transducer arrays; direction-of-arrival estimation; hidden Markov models; maximum likelihood estimation; microphones; search problems; speech recognition; 3-D Viterbi search; 3-D trellis space; HMM states; direction-time sequence; distant-talking speech re;
