Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Action Modifiers: Learning From Adverbs in Instructional Videos

Abstract

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations. Key to our method is the fact that the visual representation of the adverb is highly dependent on the action to which it applies, although the same adverb will modify multiple actions in a similar way. For instance, while ‘spread quickly’ and ‘mix quickly’ will look dissimilar, we can learn a common representation that allows us to recognize both, among other actions. We formulate this as an embedding problem, and use scaled dot product attention to learn from weakly-supervised video narrations. We jointly learn adverbs as invertible transformations which operate on the embedding space, so as to add or remove the effect of the adverb. As there is no prior work on weakly supervised learning from adverbs, we gather paired action-adverb annotations from a subset of the HowTo100M dataset, for 6 adverbs: quickly/slowly, finely/coarsely and partially/completely. Our method outperforms all baselines for video-to-adverb retrieval with a performance of 0.719 mAP. We also demonstrate our model’s ability to attend to the relevant video parts in order to determine the adverb for a given action.
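The two mechanisms named in the abstract, scaled dot-product attention over video segments and adverbs as invertible transformations of an embedding, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of the action embedding as the attention query, and the choice of an elementwise scaling as the invertible transformation are all assumptions made here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Pool `values` with weights softmax(query . key / sqrt(d)).

    Assumption: `query` stands in for an action embedding, while `keys`
    and `values` stand in for per-segment video features.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return pooled, weights


def apply_adverb(embedding, scale):
    """Hypothetical invertible transformation: elementwise scaling.

    Every entry of `scale` must be nonzero so the map can be undone.
    """
    return [s * x for s, x in zip(scale, embedding)]


def remove_adverb(embedding, scale):
    """Inverse of `apply_adverb`: divide out the same scale."""
    return [x / s for s, x in zip(scale, embedding)]
```

In this sketch, segments whose features align with the action query receive larger attention weights, and calling `remove_adverb` after `apply_adverb` with the same `scale` recovers the original embedding up to floating-point error, which is the "add or remove the effect of the adverb" property in miniature.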
