Research on Video Captioning Based on Multifeature Fusion

Hong Zhao; Lan Guo; ZhiWen Chen; HouZe Zheng

Abstract

Existing video captioning models attend to incomplete information and generate descriptive text that is not accurate enough. To address these problems, a video captioning model that integrates image, audio, and motion optical flow features is proposed. Several models pretrained on large-scale datasets are used to extract video frame features, motion information, audio features, and video sequence features. An embedding layer based on the self-attention mechanism is designed to embed each single-modality feature and learn its parameters. Two schemes, joint representation and cooperative representation, are then used to fuse the multimodal feature vectors output by the embedding layer, so that the model can attend to different targets in the video and their interactions, which effectively improves captioning performance. Experiments are carried out on the large-scale datasets MSR-VTT and LSMDC. On the MSR-VTT benchmark, the model obtains scores of 0.443, 0.327, 0.619, and 0.521 on the BLEU-4, METEOR, ROUGE-L, and CIDEr metrics, respectively. The results show that the proposed method improves on the comparison models across these evaluation metrics.
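
The pipeline described in the abstract (a self-attention embedding per modality, followed by fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the single-head attention, the mean pooling, the use of concatenation for the joint representation, and the use of cosine similarity as a stand-in for the cooperative representation are all assumptions. The feature shapes, the dimension `d_model`, and the random weights (in place of learned parameters) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention_embed(features, w_q, w_k, w_v):
    """Embed one modality's sequence (steps x dim) into a single vector.

    Assumption: single-head scaled dot-product self-attention followed by
    mean pooling over time; the paper may use a different design.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attended = softmax(scores, axis=-1) @ v
    return attended.mean(axis=0)


def joint_representation(embeddings):
    """Assumed joint scheme: concatenate the per-modality embeddings."""
    return np.concatenate(embeddings)


def coordinated_similarity(a, b):
    """Assumed cooperative scheme: cosine similarity in the shared space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


rng = np.random.default_rng(0)
d_model = 16  # hypothetical shared embedding size

# Hypothetical pre-extracted features: (time steps, feature dimension).
modalities = {
    "frame": rng.normal(size=(8, 32)),
    "motion": rng.normal(size=(8, 24)),
    "audio": rng.normal(size=(10, 12)),
}

embedded = {}
for name, feats in modalities.items():
    d_in = feats.shape[1]
    # Random projections stand in for learned parameters.
    w_q, w_k, w_v = (
        rng.normal(scale=d_in ** -0.5, size=(d_in, d_model)) for _ in range(3)
    )
    embedded[name] = self_attention_embed(feats, w_q, w_k, w_v)

fused = joint_representation(list(embedded.values()))
sim = coordinated_similarity(embedded["frame"], embedded["audio"])
print(fused.shape)  # (48,)
```

In this sketch each modality keeps its own projection weights, so sequences of different lengths and feature sizes end up in the same `d_model`-dimensional space before fusion. A caption decoder would consume `fused`, and a training loss could encourage high `sim` between modalities of the same clip; both steps are outside the scope of this example.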

Bibliographic details

Source
Computational Intelligence and Neuroscience | 2022, Issue 17 | Article ID 1204909 | 1 page
Authors
Hong Zhao; Lan Guo; ZhiWen Chen; HouZe Zheng
Author affiliation

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu, China

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Parasitology
Keywords

Research on Video Captioning Based on Multifeature Fusion
