Normalized Training for HMM-Based Visual Speech Recognition

Yoshihiko Nankaku; Keiichi Tokuda; Tadashi Kitamura; Takao Kobayashi

Abstract

This paper discusses parameter estimation for visual speech recognition based on continuous-density hidden Markov models (HMMs). Past studies of visual speech recognition fall broadly into two approaches: the image-based method and the model-based method. In the image-based method, preprocessing such as subsampling and principal component analysis is applied to the pixel values of the original image, and the result is used as the feature vector. In this approach, the position and size of the lips and the illumination conditions directly affect the recognition rate, so normalizing these factors is a fundamental technique. Conventional normalization typically defines a criterion independently of the HMM and applies the normalization before training. This paper instead considers normalization under the maximum likelihood (ML) criterion. It proposes normalized training, in which the normalization of factors such as the position, size, inclination, mean brightness, and contrast of the lips is integrated with the training of the model. The proposed method is formulated on the basis of the expectation-maximization (EM) algorithm, so that iterating normalized training is guaranteed to increase the likelihood of the training data monotonically. The effectiveness of the proposed method is demonstrated in a word recognition experiment using the M2VTS database.
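
The idea of choosing normalization parameters by the same likelihood criterion that trains the model can be illustrated with a deliberately simplified sketch. It is not the paper's formulation. It assumes a single diagonal-covariance Gaussian in place of an HMM, one-dimensional toy "images", a circular shift as the only normalization parameter (standing in for lip position), and a hard alternating maximization rather than the EM formulation described in the abstract. All function names (`log_likelihood`, `fit_gaussian`, `best_shifts`, `normalized_training`) and the synthetic data are hypothetical.

```python
import numpy as np


def log_likelihood(x, mean, var):
    """Total diagonal-Gaussian log-likelihood of x (one vector or rows of vectors)."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))


def fit_gaussian(x, var_floor=1e-3):
    """ML estimate of the mean and per-dimension variance (variance is floored)."""
    return x.mean(axis=0), np.maximum(x.var(axis=0), var_floor)


def best_shifts(data, mean, var, max_shift):
    """For each sample, pick the circular shift with the highest likelihood."""
    candidates = range(-max_shift, max_shift + 1)  # includes the identity shift 0
    shifts = []
    for row in data:
        scores = [log_likelihood(np.roll(row, s), mean, var) for s in candidates]
        shifts.append(candidates[int(np.argmax(scores))])
    return np.array(shifts)


def normalized_training(data, max_shift=3, n_iter=5):
    """Alternate between re-normalizing the data and re-estimating the model."""
    mean, var = fit_gaussian(data)  # initial model from unnormalized data
    history = [log_likelihood(data, mean, var)]
    shifts = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # Step 1: with the model fixed, choose each sample's shift by likelihood.
        shifts = best_shifts(data, mean, var, max_shift)
        aligned = np.stack([np.roll(r, s) for r, s in zip(data, shifts)])
        # Step 2: with the shifts fixed, re-estimate the model by ML.
        mean, var = fit_gaussian(aligned)
        history.append(log_likelihood(aligned, mean, var))
    return mean, var, shifts, history


# Synthetic data (an assumption for illustration): a bump at random positions plus noise.
rng = np.random.default_rng(0)
template = np.exp(-0.5 * ((np.arange(32) - 16) / 2.0) ** 2)
true_shifts = rng.integers(-3, 4, size=40)
data = np.stack([np.roll(template, s) for s in true_shifts])
data += 0.05 * rng.standard_normal(data.shape)

mean, var, shifts, history = normalized_training(data)
print("log-likelihood per iteration:", [round(h, 1) for h in history])
```

Because the candidate shifts always include the identity and each step maximizes the same objective over one group of variables, the recorded log-likelihood cannot decrease from one iteration to the next. A circular shift only permutes pixels, so no Jacobian term is needed; transformations that rescale the data, such as brightness or contrast changes, would require one for the likelihoods to stay comparable.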

Bibliographic details

Source
Electronics and Communications in Japan. Part 3, Fundamental Electronic Science | 2006, No. 11 | pp. 40-50 | 11 pages
Authors
Yoshihiko Nankaku; Keiichi Tokuda; Tadashi Kitamura; Takao Kobayashi
Author affiliation

Department of Computer Science, Nagoya Institute of Technology, Nagoya, 466-8555 Japan;

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: General issues
Keywords
visual speech recognition; bimodal speech recognition; hidden Markov model; normalized training; EM algorithm;

Date added to database: 2022-08-17 23:52:11

Similar literature

1. Lip Location Normalized Training for Visual Speech Recognition [J]. Oscar Vanegas, Keiichi Tokuda, Tadashi Kitamura. IEICE Transactions on Information and Systems. 2000, No. 11

2. Confidence scoring for accurate HMM-based speech recognition by using monophone-level normalization based on subspace method [J]. Muhammad Ghulam, Takaharu Sato, Takashi Fukuda, 電子情報通信学会技術研究報告. 音声. Speech. 2002, No. 159

3. Normalized training for HMM-BASED visual speech recognition [C]. Nankaku Y., Tokuda K., Kitamura T., Image Processing, 2000. Proceedings. 2000 International Conference on. 2000

4. HMM-based non-intrusive speech quality and implementation of Viterbi score distribution and hiddenness based measures to improve the performance of speech recognition [D]. Talwar, Gaurav. 2006

5. The Self-Advantage in Visual Speech Processing Enhances Audiovisual Speech Recognition in Noise [O]. Nancy Tye-Murray, Brent P. Spehar, Joel Myerson,

6. Normalized Training for HMM-Based Visual Speech Recognition [O]. Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura, 2000
