Journal of Electronic Imaging

Lipreading model based on a two-way convolutional neural network and feature fusion


Abstract

Lipreading feature extraction is essentially feature extraction over continuous video frame sequences. A lipreading model based on a two-way convolutional neural network and feature fusion is proposed to obtain more reasonable visual spatial-temporal characteristics. Unlike other deep-learning-based lipreading methods, the rank pooling method transforms the lip video into a standard RGB image that can be fed directly into the convolutional neural network, which effectively reduces the input dimension. In addition, to compensate for the lack of spatial information, the apparent shape and depth features are fused, and a joint cost function is then used to guide network training toward more discriminative features. The method was evaluated on the public GRID and OuluVS2 databases. The results show that the proposed method reaches an accuracy of more than 93%, which validates its effectiveness. (C) 2021 SPIE and IS&T
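The abstract does not give the exact rank-pooling formulation the authors use; as a rough illustration only, the Python sketch below shows one common variant (approximate rank pooling, often called a dynamic image) that collapses a clip of lip frames into a single RGB image suitable for a standard 2-D convolutional network. All function names, array shapes, and the example clip are hypothetical.

    import numpy as np

    def dynamic_image(frames):
        """Collapse a clip of shape (T, H, W, 3) into one RGB image via
        approximate rank pooling, so it can be fed to a 2-D CNN."""
        T = len(frames)
        # Harmonic numbers H_t = sum_{i=1..t} 1/i, with H_0 = 0.
        harmonics = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, T + 1))])
        # Per-frame coefficient from the approximate rank-pooling derivation:
        # alpha_t = 2*(T - t + 1) - (T + 1)*(H_T - H_{t-1}).
        alphas = np.array([2 * (T - t + 1) - (T + 1) * (harmonics[T] - harmonics[t - 1])
                           for t in range(1, T + 1)])
        # Weighted sum over the time axis collapses the clip to (H, W, 3).
        pooled = np.tensordot(alphas, frames.astype(np.float64), axes=(0, 0))
        # Rescale to an 8-bit RGB image so it matches a normal CNN input.
        pooled -= pooled.min()
        if pooled.max() > 0:
            pooled = pooled / pooled.max() * 255.0
        return pooled.astype(np.uint8)

    # Example: a 30-frame, 64x64 mouth-region clip reduced to one image.
    clip = np.random.randint(0, 256, size=(30, 64, 64, 3), dtype=np.uint8)
    rgb_input = dynamic_image(clip)  # shape (64, 64, 3)

In the two-branch setup the abstract describes, such a pooled image would presumably feed one CNN branch while the depth features feed the other; the fusion scheme and the joint cost function are not detailed in the abstract.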


