IEEE Transactions on Pattern Analysis and Machine Intelligence

Multistream Articulatory Feature-Based Models for Visual Speech Recognition



Abstract

We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
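To make the virtual-evidence idea from the abstract concrete, here is a minimal sketch, not the paper's implementation: it assumes a softmax conversion of raw classifier margins to posteriors, divides out class priors to obtain scaled likelihoods, and scores a joint (LO, LR) state per frame as a product of the two streams' evidence. All names, class inventories, and numbers are illustrative assumptions.

```python
# Sketch (assumed setup, not the authors' code): per-frame articulatory-feature
# classifier margins -> virtual evidence (scaled likelihoods) for two streams,
# lip opening (LO) and lip rounding (LR), combined in a factored frame score.
import numpy as np

def margins_to_virtual_evidence(margins, priors):
    """Convert raw margins to scaled likelihoods p(x|c) proportional to p(c|x)/p(c)."""
    posteriors = np.exp(margins - margins.max(axis=-1, keepdims=True))
    posteriors /= posteriors.sum(axis=-1, keepdims=True)   # softmax posteriors
    return posteriors / priors                              # divide out class priors

# Toy data: 3 frames, 2 LO classes (closed/open), 3 LR classes (illustrative).
lo_margins = np.array([[ 2.0, -1.0], [ 0.5, 0.3], [-1.5, 2.5]])
lr_margins = np.array([[ 1.0,  0.0, -1.0], [ 0.2, 0.9, -0.5], [-0.3, 0.1, 1.8]])
lo_ve = margins_to_virtual_evidence(lo_margins, priors=np.array([0.6, 0.4]))
lr_ve = margins_to_virtual_evidence(lr_margins, priors=np.array([0.5, 0.3, 0.2]))

def joint_frame_score(t, lo_state, lr_state):
    """Factored observation score: product of the streams' virtual evidence at frame t."""
    return lo_ve[t, lo_state] * lr_ve[t, lr_state]

print(joint_frame_score(0, lo_state=0, lr_state=0))
```

In the paper's models these per-frame evidence terms feed a DBN whose hidden-state streams may desynchronize within limits; the sketch only shows the observation side, not the transition structure or asynchrony constraints.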
