Venue: IEEE International Conference on Image Processing

Can DNNs Learn to Lipread Full Sentences?

Abstract

Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss. The system is evaluated on the publicly available TCD-TIMIT dataset, with 59 speakers and a vocabulary of over 6000 words. Results show a major improvement on a Hidden Markov Model framework. A fuller analysis of performance across visemes demonstrates that the network is not only learning the language model, but actually learning to lipread.
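The abstract does not say how the Connectionist Temporal Classification and Sequence-to-Sequence terms are combined. One common formulation, assumed here purely for illustration, is a weighted interpolation of the two negative log-likelihoods. The sketch below uses plain Python on toy log-probabilities; the function names, the `weight` parameter, and the convention that symbol 0 is the CTC blank are all hypothetical and are not taken from the paper.

```python
import math


def log_sum_exp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss via the forward algorithm.

    log_probs: T frames, each a list of per-symbol log-probabilities.
    target: list of symbol ids without blanks.
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for sym in target:
        ext += [sym, blank]
    n_states = len(ext)

    alpha = [-math.inf] * n_states
    alpha[0] = log_probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [-math.inf] * n_states
        for s in range(n_states):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = log_sum_exp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last label or the trailing blank.
    total = log_sum_exp(alpha[-1], alpha[-2]) if n_states > 1 else alpha[-1]
    return -total


def seq2seq_neg_log_likelihood(step_log_probs, target):
    """Teacher-forced cross-entropy: one decoder distribution per target symbol."""
    return -sum(step_log_probs[i][sym] for i, sym in enumerate(target))


def joint_loss(ctc_log_probs, dec_log_probs, target, weight=0.5):
    """Assumed combination: weight * CTC + (1 - weight) * seq2seq."""
    ctc = ctc_neg_log_likelihood(ctc_log_probs, target)
    s2s = seq2seq_neg_log_likelihood(dec_log_probs, target)
    return weight * ctc + (1.0 - weight) * s2s


if __name__ == "__main__":
    u = math.log(0.5)
    encoder_frames = [[u, u], [u, u]]  # 2 frames, symbols {blank, 1}
    decoder_steps = [[u, u]]           # 1 decoder step
    print(joint_loss(encoder_frames, decoder_steps, [1]))
```

In a setup like this, the CTC term would be computed on the encoder's frame-level outputs and the cross-entropy term on the attention decoder's outputs, so both share the same visual front-end and encoder. The value of `weight` would need to be tuned.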