International Joint Conference on Artificial Intelligence

Unsupervised Alignment of Actions in Video with Text Descriptions



Abstract

Advances in video technology and data storage have made large scale video data collections of complex activities readily accessible. An increasingly popular approach for automatically inferring the details of a video is to associate the spatiotemporal segments in a video with its natural language descriptions. Most algorithms for connecting natural language with video rely on pre-aligned supervised training data. Recently, several models have been shown to be effective for unsupervised alignment of objects in video with language. However, it remains difficult to generate good spatiotemporal video segments for actions that align well with language. This paper presents a framework that extracts higher level representations of low-level action features through hyperfeature coding from video and aligns them with language. We propose a two-step process that creates a high-level action feature codebook with temporally consistent motions, and then applies an unsupervised alignment algorithm over the action codewords and verbs in the language to identify individual activities. We show an improvement over previous alignment models of objects and nouns on videos of biological experiments, and also evaluate our system on a larger scale collection of videos involving kitchen activities.
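The two-step process described in the abstract can be sketched in code, under stated assumptions: first vector-quantize low-level motion features into an action codebook (here plain k-means, a stand-in for the paper's hyperfeature coding), then run an IBM-Model-1-style EM that estimates which verb each codeword tends to align with, without any pre-aligned supervision. The function names, the choice of k-means, and the Model-1 formulation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def build_codebook(features, k, iters=20, seed=0):
    """Quantize low-level motion features (n x d array) into k
    action codewords with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # assign each feature vector to its nearest center
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned features
        for j in range(k):
            pts = features[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

def align_em(pairs, n_codes, vocab, iters=15):
    """IBM-Model-1-style EM: estimate P(verb | codeword) from
    (codeword sequence, verb sequence) pairs, with no alignment labels."""
    v2i = {v: i for i, v in enumerate(vocab)}
    t = np.full((n_codes, len(vocab)), 1.0 / len(vocab))  # translation table
    for _ in range(iters):
        counts = np.zeros_like(t)
        for codes, verbs in pairs:
            vs = [v2i[v] for v in verbs]
            for c in codes:
                p = t[c, vs]
                # expected (soft) alignment counts for this codeword
                np.add.at(counts[c], vs, p / p.sum())
        t = counts / counts.sum(1, keepdims=True).clip(min=1e-12)
    return t, v2i
```

After training, `t.argmax(1)` gives each codeword's most probable verb, which is how individual activities could be identified from the unsupervised alignment.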
