International Conference on Automatic Face and Gesture Recognition
Deformation Flow Based Two-Stream Network for Lip Reading

Abstract

Lip reading is the task of recognizing speech content by analyzing movements in the lip region while people are speaking. Based on the continuity between adjacent frames during speech, and the consistency of motion patterns across different people pronouncing the same phoneme, we model lip movements as a sequence of apparent deformations in the lip region. Specifically, we introduce a Deformation Flow Network (DFN) to learn the deformation flow between adjacent frames, which directly captures the motion information within the lip region. The learned deformation flow is then combined with the original grayscale frames in a two-stream network to perform lip reading. To make the two streams learn from each other during training, we introduce a bidirectional knowledge distillation loss to train the two branches jointly. Owing to the complementary cues provided by the different branches, the two-stream network shows substantial improvement over either branch alone. A thorough experimental evaluation on two large-scale lip reading benchmarks is presented with detailed analysis. The results accord with our motivation, and show that our method achieves state-of-the-art or comparable performance on these two challenging datasets.
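The abstract does not give the form of the bidirectional knowledge distillation loss. As a rough illustration only, one common way to let two branches teach each other is a symmetric KL divergence between their temperature-softened class distributions. The sketch below assumes that formulation; the function names, the temperature value, and the choice of KL divergence are all assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kd_loss(gray_logits, flow_logits, temperature=2.0):
    """Symmetric distillation term between the two branches (assumed form).

    gray_logits: class scores from the grayscale-frame branch (hypothetical name)
    flow_logits: class scores from the deformation-flow branch (hypothetical name)
    """
    p_gray = softmax(gray_logits, temperature)
    p_flow = softmax(flow_logits, temperature)
    # Each branch acts as the teacher for the other, hence two KL terms.
    return kl_divergence(p_gray, p_flow) + kl_divergence(p_flow, p_gray)


if __name__ == "__main__":
    # The term vanishes when both branches agree and grows as they diverge.
    print(bidirectional_kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(bidirectional_kd_loss([2.0, 0.5, -1.0], [0.1, 1.5, 0.3]))
```

In a joint training setup, a term like this would typically be added to each branch's classification loss, with the teacher side of each KL term treated as a constant during backpropagation. Whether the paper does this is not stated in the abstract.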
