IEEE Transactions on Multimedia

Integration strategies for audio-visual speech processing: applied to text-dependent speaker recognition


Abstract

In this paper, an in-depth analysis is undertaken into effective strategies for integrating the audio-visual speech modalities with respect to two major questions. First, at what level should integration occur? Second, given a level of integration, how should this integration be implemented? Our work is based on the well-known hidden Markov model (HMM) classifier framework for modeling speech. A novel framework for modeling the mismatch between train and test observation sets is proposed, so as to achieve effective classifier combination between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, emerge naturally depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speech processing applications can be attributed to train/test mismatches, we propose that the main impetus of practical audio-visual integration is to dampen the independent errors resulting from the mismatch, rather than to model any bimodal speech dependencies. To this end, a strategy is recommended, based on theory and empirical evidence, that uses a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise for the task of text-dependent speaker recognition.
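The two late-fusion rules named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each HMM classifier (audio and visual) returns a per-speaker log-likelihood, assumes equal speaker priors when converting those scores to posteriors, and uses a hypothetical audio weight `alpha`. The speaker names and scores are made-up toy values, and the hybrid rule the paper recommends is not reproduced because its exact form is not given here.

```python
import math

# Toy per-speaker log-likelihoods (hypothetical values, not from the paper).
# The audio stream is assumed to be corrupted by noise.
audio_ll = {"alice": -1.5, "bob": 0.0, "carol": -6.0}
visual_ll = {"alice": -2.5, "bob": -7.0, "carol": 0.0}


def posteriors(log_scores):
    """Normalize log-likelihoods into posteriors, assuming equal priors."""
    m = max(log_scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def weighted_product(a_ll, v_ll, alpha):
    """Weighted product rule P_a^alpha * P_v^(1 - alpha), in the log domain."""
    return {k: alpha * a_ll[k] + (1.0 - alpha) * v_ll[k] for k in a_ll}


def weighted_sum(a_ll, v_ll, alpha):
    """Weighted sum rule alpha * P_a + (1 - alpha) * P_v over posteriors."""
    pa, pv = posteriors(a_ll), posteriors(v_ll)
    return {k: alpha * pa[k] + (1.0 - alpha) * pv[k] for k in pa}


def decide(scores):
    """Return the speaker with the highest fused score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(decide(weighted_product(audio_ll, visual_ll, 0.5)))  # alice
    print(decide(weighted_sum(audio_ll, visual_ll, 0.5)))      # carol
```

With these toy scores the two rules disagree: the weighted product favours the speaker that neither stream rules out, while the weighted sum is pulled toward whichever stream is most confident. In a practical system `alpha` would presumably be adjusted to the acoustic noise level.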