Action Modifiers: Learning From Adverbs in Instructional Videos

Abstract

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations. Key to our method is the fact that the visual representation of the adverb is highly dependent on the action to which it applies, although the same adverb will modify multiple actions in a similar way. For instance, while ‘spread quickly’ and ‘mix quickly’ will look dissimilar, we can learn a common representation that allows us to recognize both, among other actions. We formulate this as an embedding problem, and use scaled dot product attention to learn from weakly-supervised video narrations. We jointly learn adverbs as invertible transformations which operate on the embedding space, so as to add or remove the effect of the adverb. As there is no prior work on weakly supervised learning from adverbs, we gather paired action-adverb annotations from a subset of the HowTo100M dataset, for 6 adverbs: quickly/slowly, finely/coarsely and partially/completely. Our method outperforms all baselines for video-to-adverb retrieval with a performance of 0.719 mAP. We also demonstrate our model’s ability to attend to the relevant video parts in order to determine the adverb for a given action.
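
The two mechanisms named in the abstract, scaled dot-product attention and adverbs as invertible transformations of an embedding space, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that an action embedding serves as the attention query over per-segment video features, and it uses a 2D rotation as a stand-in for a learned invertible transformation. All names and numbers (`action_query`, `segment_keys`, `segment_values`, `quickly`) are hypothetical toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return pooled, weights


def rotation(theta):
    """2D rotation matrix: a simple invertible linear map (assumption)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def transpose(matrix):
    """Transpose; for a rotation matrix this is also its inverse."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical toy data: one action query and three video segments.
action_query = [1.0, 0.0]
segment_keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
segment_values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Attention pools the segments, weighting those most relevant to the action.
video_embedding, weights = attend(action_query, segment_keys, segment_values)

# A stand-in "adverb" transformation and its inverse.
quickly = rotation(math.pi / 6)
undo_quickly = transpose(quickly)

modified = apply(quickly, video_embedding)   # add the adverb's effect
restored = apply(undo_quickly, modified)     # remove it again
```

In this sketch the first segment receives the largest attention weight because its key aligns with the query, and applying the inverse map recovers the original embedding, which is the sense in which an adverb's effect can be added or removed.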

Bibliographic details

Source
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 865-875 (11 pages)
Authors
Hazel Doughty; Ivan Laptev; Walterio Mayol-Cuevas; Dima Damen;
Original format: PDF
Keywords
Videos; Visualization; Task analysis; Supervised learning; Motion pictures; Training; Computer vision;

Similar literature

1. Learning Macro Actions from Instructional Videos Through Integration of Multiple Modalities [J]. David O. Johnson, Arvin Agah. International Journal of Social Robotics. 2013, No. 1
2. Learning Macro Actions from Instructional Videos Through Integration of Multiple Modalities [J]. Johnson David O., Agah Arvin. International Journal of Social Robotics. 2013, No. 1
3. Why and when does instructional video facilitate learning? A commentary to the special issue 'developments and trends in learning with instructional video' [J]. Betrancourt Mireille, Benetos Kalliopi. Computers in Human Behavior. 2018, No. DECa
4. Embedding YouTube Videos and Interactions in PowerPoint Using Office Mix for Adaptive Learning in Support of a Flipped Classroom Instruction [C]. John M. Santiago, Jing Guo. American Society for Engineering Education Annual Conference and Exposition. 2018
5. Using authentic videos to enhance language and culture instruction in a formal English language learning setting: Ten videos and accompanying lessons [D]. Norris, Rebecca Noelle. 2011
6. Learning From Instructional Videos: Learner Gender Does Matter; Speaker Gender Does Not [O]. Claudia Schrader, Tina Seufert, Steffi Zander. 2021
7. Embedding YouTube Videos and Interactions in PowerPoint Using Office Mix for Adaptive Learning in Support of a Flipped Classroom Instruction [O]. John Santiago, Jing Guo
8. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples [R]. Yu, S., Jiang, L., Hauptmann, A. 2014
