Integration strategies for audio-visual speech processing: applied to text-dependent speaker recognition

Lucey S.; Chen T.; Sridharan S.; Chandran V.

Abstract

In this paper, an in-depth analysis is undertaken into effective strategies for integrating the audio-visual speech modalities with respect to two major questions. Firstly, at what level should integration occur? Secondly, given a level of integration, how should this integration be implemented? Our work is based around the well-known hidden Markov model (HMM) classifier framework for modeling speech. A novel framework for modeling the mismatch between train and test observation sets is proposed, so as to provide effective classifier combination performance between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, naturally emerge depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speech processing applications can be attributed to train/test mismatches, we propose that the main impetus of practical audio-visual integration is to dampen the independent errors resulting from the mismatch, rather than trying to model any bimodal speech dependencies. To this end, a strategy is recommended, based on theory and empirical evidence, using a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise for the task of text-dependent speaker recognition.
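
The two combination rules named in the abstract can be illustrated with a minimal sketch. It assumes each classifier outputs a list of per-speaker posterior scores and that a single weight `alpha` in [0, 1] controls the trust placed in the audio stream. The function names, the weight parameterization, and the toy inputs are illustrative assumptions, not taken from the paper, and the paper's hybrid rule is not reproduced here because the abstract does not specify it.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        raise ValueError("all fused scores are zero")
    return [s / total for s in scores]


def weighted_product(p_audio, p_visual, alpha):
    """Weighted product rule: p_a**alpha * p_v**(1 - alpha), renormalized.

    alpha is an assumed audio weight in [0, 1]; 1 - alpha goes to video.
    """
    fused = [pa ** alpha * pv ** (1.0 - alpha)
             for pa, pv in zip(p_audio, p_visual)]
    return normalize(fused)


def weighted_sum(p_audio, p_visual, alpha):
    """Weighted sum rule: alpha * p_a + (1 - alpha) * p_v, renormalized."""
    fused = [alpha * pa + (1.0 - alpha) * pv
             for pa, pv in zip(p_audio, p_visual)]
    return normalize(fused)


if __name__ == "__main__":
    # Hypothetical posteriors over two enrolled speakers.
    audio = [0.0, 1.0]    # audio classifier rules out speaker 0 entirely
    visual = [0.9, 0.1]   # visual classifier favors speaker 0
    print("product:", weighted_product(audio, visual, 0.5))
    print("sum:    ", weighted_sum(audio, visual, 0.5))
```

In this toy case the product rule lets a single zero score veto a speaker, while the sum rule keeps that speaker in contention. This is one way to see why the choice between the rules depends on how far each stream's scores can be trusted.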

Bibliographic details

Source
IEEE Transactions on Multimedia | 2005, No. 3 | pp. 495-506 | 12 pages
Authors
Lucey S.; Chen T.; Sridharan S.; Chandran V.
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords
hidden Markov models; multimedia communication; pattern classification; speaker recognition; speech processing; speech synthesis; HMM classifier framework; audio-visual speech processing; hidden Markov model; speech modeling; text-dependent speaker recognition; Aud;
