
Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition



Abstract

The demand for audio-visual speech recognition (AVSR) has increased in order to make speech recognition systems robust to acoustic noise. There are two main research issues in audio-visual speech recognition: integration modeling that accounts for asynchronicity between modalities, and adaptive information weighting according to information reliability. This paper proposes a method to effectively integrate audio and visual information. Such integration inevitably necessitates modeling the synchronization and asynchronization of audio and visual information. To address the time lag and correlation problems between speech and lip-movement features, we introduce an integrated HMM model of audio-visual information based on a family of product HMMs. The proposed model can represent state synchronicity not only within a phoneme, but also between phonemes. Furthermore, we propose a rapid stream weight optimization based on the GPD algorithm for noisy, bimodal speech recognition. Evaluation experiments show that the proposed method improves the recognition accuracy for noisy speech. At SNR = 0 dB, our proposed method attained 16% higher performance compared to a product HMM without synchronicity re-estimation.
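The two ideas summarized above can be illustrated with a minimal sketch: a product HMM whose composite states are pairs of audio and visual states (with a bound on how far the two streams may drift apart), and a stream-weighted log-likelihood combination. This is an illustrative assumption-laden sketch, not the paper's implementation; all names and parameters here are hypothetical.

```python
def product_states(n_audio, n_visual, max_async=1):
    """Composite states (i, j) of a product HMM.

    Each composite state pairs an audio-stream state i with a
    visual-stream state j. `max_async` bounds |i - j|, i.e. how far
    the two streams may drift apart; max_async=0 forces strict
    state synchrony within the unit.
    """
    return [(i, j)
            for i in range(n_audio)
            for j in range(n_visual)
            if abs(i - j) <= max_async]

def combined_log_likelihood(log_p_audio, log_p_visual, weight):
    """Stream-weighted score: w * log P_a + (1 - w) * log P_v.

    In an AVSR system `weight` would be adapted to the acoustic
    reliability (e.g. tuned by a GPD-style gradient update as SNR
    drops); here it is simply a fixed parameter.
    """
    return weight * log_p_audio + (1.0 - weight) * log_p_visual
```

With three states per stream, strict synchrony (`max_async=0`) allows only the three diagonal pairs, while `max_async=1` admits seven composite states, letting one modality lag the other by a single state.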
