Conference: European Conference on Computer Vision

Data Augmentation Techniques for the Video Question Answering Task

Abstract

Video Question Answering (VideoQA) is a task that requires a model to analyze and understand the visual content of the input video, the textual content of the question, and the interaction between them in order to produce a meaningful answer. In our work we focus on the Egocentric VideoQA task, which uses first-person videos, because of its potential impact on many different fields, such as social assistance and industrial training. Recently, an Egocentric VideoQA dataset called EgoVQA was released. Because of its small size, models trained on it tend to overfit quickly. To alleviate this problem, we propose several augmentation techniques, which give us a +5.5% improvement in final accuracy over the considered baseline.
