A probabilistic principal component analysis based hidden Markov model for audio-visual speech recognition

Abstract

Lipreading is one of the effective methods proposed for improving the performance of speech recognition systems, especially in acoustically noisy environments. This paper proposes a simple audio-visual speech recognition (AVSR) system, which can improve the robustness and accuracy of audio speech recognition by integrating synchronous audio and visual information. We propose a hidden Markov model (HMM) based on probabilistic principal component analysis (PCA) for visual-only speech recognition and for the visual modality of audio-visual speech recognition. The probabilistic PCA based HMM directly uses images that contain only the speaker's mouth region, without pre-processing (mouth corner detection, contour marking, etc.), and takes probabilistic PCA as the observation probability density function (PDF). We then integrate the information from the two modalities (audio and visual) and obtain a multi-stream hidden Markov model (MSHMM). We found that, without specialized features being extracted before processing, probabilistic PCA could capture the principal components during training and describe the visual part of the materials. The experiments also verify that integrating the audio and visual information can help to improve recognition accuracy even at a low acoustic signal-to-noise ratio (SNR).
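The two ideas in the abstract, probabilistic PCA as an observation density and a stream-wise combination of audio and visual scores, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, the fit is the standard closed-form maximum-likelihood solution for probabilistic PCA applied to a single set of vectors (not training inside an HMM), and the convex stream weighting is the common multi-stream form, which the abstract does not spell out.

```python
import numpy as np


def ppca_fit(X, q):
    """Closed-form maximum-likelihood probabilistic PCA fit (illustrative).

    X is (n_samples, d); q is the latent dimension, with q < d.
    Returns the mean, the d x q loading matrix W, and the noise variance.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / X.shape[0]            # sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    eigvals = eigvals[::-1]               # reorder to descending
    eigvecs = eigvecs[:, ::-1]
    sigma2 = eigvals[q:].mean()           # average of the discarded variances
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale            # scale each leading eigenvector
    return mu, W, sigma2


def ppca_logpdf(x, mu, W, sigma2):
    """Log density of x under N(mu, W W^T + sigma2 * I)."""
    d = mu.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)
    diff = x - mu
    _, logdet = np.linalg.slogdet(C)
    maha = diff @ np.linalg.solve(C, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def multistream_loglik(log_b_audio, log_b_visual, audio_weight):
    """Weighted sum of per-stream state log-likelihoods.

    audio_weight in [0, 1] is an assumed stream weight; the visual
    stream receives 1 - audio_weight.
    """
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual
```

In this sketch, each HMM state would hold its own (mu, W, sigma2) for flattened mouth-region image vectors, and ppca_logpdf would supply that state's visual observation score. For realistic image sizes the d x d covariance becomes large, so a practical version would likely exploit the low-rank structure of W W^T instead of forming C explicitly.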

Bibliographic record

Source
Signals, Systems and Computers | 2008 | pp. 2170-2173 | 4 pages
Authors
Zhanyu Ma; Arne Leijon
Original format: PDF
Chinese Library Classification: computing technology, computer technology
Keywords
audio-visual speech recognition; multi-stream hidden Markov model; probabilistic PCA;

Similar literature

1. Evolutionary structure of hidden Markov models for audio-visual Arabic speech recognition [J]. Amina Makhlouf, Lilia Lazli, Bachir Bensaker. International Journal of Signal and Imaging Systems Engineering. 2016, No. 1
2. Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition [J]. M. Kubanek, J. Bobulski, L. Adrjanowicz. Bulletin of the Polish Academy of Sciences. Technical Sciences. 2012, No. 2
3. Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition [J]. J. Bobulski, L. Adrjanowicz, M. Kubanek. Bulletin of the Polish Academy of Sciences. Technical Sciences. 2012, No. 2
4. A probabilistic principal component analysis based hidden Markov model for audio-visual speech recognition [C]. Zhanyu Ma, Arne Leijon. Asilomar Conference on Signals, Systems and Computers. 2008
5. Online Learning of Large Margin Hidden Markov Models for Automatic Speech Recognition [D]. Cheng, Chih-Chieh. 2011
6. Enhancing Speech Recognition Using Improved Particle Swarm Optimization Based Hidden Markov Model [O]. Lokesh Selvaraj, Balakrishnan Ganesan
7. Analysis and Design of Principal Component Analysis and Hidden Markov Model for Face Recognition [O]. Kumar D.S. Dinesh, Rao P.V. 2015
8. Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding [R]. Hogden, J. 1996
