International Joint Conference on Artificial Intelligence

Unsupervised Alignment of Actions in Video with Text Descriptions



Abstract

Advances in video technology and data storage have made large scale video data collections of complex activities readily accessible. An increasingly popular approach for automatically inferring the details of a video is to associate the spatiotemporal segments in a video with its natural language descriptions. Most algorithms for connecting natural language with video rely on pre-aligned supervised training data. Recently, several models have been shown to be effective for unsupervised alignment of objects in video with language. However, it remains difficult to generate good spatiotemporal video segments for actions that align well with language. This paper presents a framework that extracts higher level representations of low-level action features through hyperfeature coding from video and aligns them with language. We propose a two-step process that creates a high-level action feature codebook with temporally consistent motions, and then applies an unsupervised alignment algorithm over the action codewords and verbs in the language to identify individual activities. We show an improvement over previous alignment models of objects and nouns on videos of biological experiments, and also evaluate our system on a larger scale collection of videos involving kitchen activities.
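The two-step process described in the abstract can be sketched in code, under stated assumptions: first vector-quantize low-level motion features into an action codebook (here plain k-means, a stand-in for the paper's hyperfeature coding), then run an IBM-Model-1-style EM that estimates which verb each codeword tends to align with, without any pre-aligned supervision. The function names, the choice of k-means, and the Model-1 formulation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def build_codebook(features, k, iters=20, seed=0):
    """Quantize low-level motion features (n x d array) into k
    action codewords with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # assign each feature vector to its nearest center
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned features
        for j in range(k):
            pts = features[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

def align_em(pairs, n_codes, vocab, iters=15):
    """IBM-Model-1-style EM: estimate P(verb | codeword) from
    (codeword sequence, verb sequence) pairs, with no alignment labels."""
    v2i = {v: i for i, v in enumerate(vocab)}
    t = np.full((n_codes, len(vocab)), 1.0 / len(vocab))  # translation table
    for _ in range(iters):
        counts = np.zeros_like(t)
        for codes, verbs in pairs:
            vs = [v2i[v] for v in verbs]
            for c in codes:
                p = t[c, vs]
                # expected (soft) alignment counts for this codeword
                np.add.at(counts[c], vs, p / p.sum())
        t = counts / counts.sum(1, keepdims=True).clip(min=1e-12)
    return t, v2i
```

After training, `t.argmax(1)` gives each codeword's most probable verb, which is how individual activities could be identified from the unsupervised alignment.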
