International Conference on Automatic Face and Gesture Recognition

Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading


Abstract

Lip-reading aims to infer the speech content from the lip movement sequence and can be seen as a typical sequence-to-sequence (seq2seq) problem, translating an input image sequence of lip movements into the text sequence of the speech content. However, the traditional learning process of seq2seq models suffers from two problems: the exposure bias resulting from the "teacher-forcing" strategy, and the inconsistency between the discriminative optimization target (usually the cross-entropy loss) and the final evaluation metric (usually the character/word error rate). In this paper, we propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems. On the one hand, we introduce the evaluation metric (the character error rate in this paper) as a form of reward to optimize the model together with the original discriminative target. On the other hand, inspired by the local perception property of the convolution operation, we perform a pseudo-convolutional operation along the reward and loss dimension, so that more context around each time step is taken into account when generating a robust reward and loss for the whole optimization. Finally, we perform a thorough comparison and evaluation on both word-level and sentence-level benchmarks. The results show a significant improvement over related methods, achieving either new state-of-the-art performance or competitive accuracy on all of these challenging benchmarks, which clearly demonstrates the advantages of our approach.
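
The following is a minimal, illustrative sketch (not the authors' implementation) of how the two ideas in the abstract could be combined in PyTorch: (1) a REINFORCE-style policy-gradient term whose reward is derived from the character error rate (CER), mixed with the usual cross-entropy loss, and (2) a "pseudo-convolutional" sliding-window average applied along the time dimension of the per-step rewards, so that each step's gradient also reflects its local context. The window size, the mixing weight lambda_rl, and the per-step reward layout are assumptions made for illustration only.

    # Sketch of a PCPG-style loss under the assumptions stated above.
    import torch
    import torch.nn.functional as F

    def pcpg_loss(step_log_probs, step_rewards, ce_loss, window=3, lambda_rl=0.5):
        # step_log_probs: (batch, T) log-probabilities of the tokens sampled by the decoder
        # step_rewards:   (batch, T) per-step rewards, e.g. derived from 1 - CER of the hypothesis
        # ce_loss:        scalar cross-entropy loss from ordinary teacher-forced training
        rewards = step_rewards.unsqueeze(1)                               # (batch, 1, T)
        kernel = torch.ones(1, 1, window, device=rewards.device) / window
        # "Pseudo-convolution": average each reward with its temporal neighbours, so a
        # single noisy step does not dominate the policy-gradient update (local perception).
        smoothed = F.conv1d(rewards, kernel, padding=window // 2).squeeze(1)
        smoothed = smoothed[:, : step_log_probs.size(1)]                  # trim padding overhang
        # REINFORCE-style objective: maximise the expected (smoothed) reward.
        rl_loss = -(smoothed.detach() * step_log_probs).mean()
        # Optimise the evaluation-metric-based reward together with the original target.
        return (1.0 - lambda_rl) * ce_loss + lambda_rl * rl_loss

In this sketch the reward could, for instance, assign every time step the sequence-level value 1 - CER of the sampled hypothesis against the reference; the sliding-window average then turns it into a locally smoothed per-step signal before the policy-gradient term is formed.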