Grounding Spatial Language for Video Search

Abstract

The ability to find a video clip that matches a natural language description of an event would enable intuitive search of large databases of surveillance video. We present a mechanism for connecting a spatial language query to a video clip corresponding to the query. The system can retrieve video clips matching millions of potential queries that describe complex events in video such as "people walking from the hallway door, around the island, to the kitchen sink." By breaking down the query into a sequence of independent structured clauses and modeling the meaning of each component of the structure separately, we are able to improve on previous approaches to video retrieval by finding clips that match much longer and more complex queries using a rich set of spatial relations such as "down" and "past." We present a rigorous analysis of the system's performance, based on a large corpus of task-constrained language collected from fourteen subjects. Using this corpus, we show that the system effectively retrieves clips that match natural language descriptions: 58.3% were ranked in the top two of ten in a retrieval task. Furthermore, we show that spatial relations play an important role in the system's performance.
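The abstract describes decomposing a natural language query into a sequence of independent structured clauses and modeling the meaning of each spatial relation separately. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: the Clause and Clip structures, the distance-based toy models for "to" and "past", and the product scoring rule are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]

@dataclass
class Clause:
    """One structured clause of the decomposed query (hypothetical schema),
    e.g. ("person", "walked", "past", "island")."""
    figure: str
    verb: str
    relation: str      # spatial relation such as "to", "past", "down"
    landmark: str

@dataclass
class Clip:
    """A candidate clip: a figure (person) track plus named landmark locations."""
    clip_id: str
    track: List[Point]             # figure trajectory over time
    landmarks: Dict[str, Point]    # landmark name -> (x, y) centroid

def score_to(track: List[Point], landmark: Point) -> float:
    """Toy model of 'to': reward tracks that end near the landmark."""
    (x, y), (lx, ly) = track[-1], landmark
    return 1.0 / (1.0 + ((x - lx) ** 2 + (y - ly) ** 2) ** 0.5)

def score_past(track: List[Point], landmark: Point) -> float:
    """Toy model of 'past': the closest approach to the landmark should fall
    in the interior of the track, i.e. the figure keeps moving beyond it."""
    lx, ly = landmark
    dists = [((x - lx) ** 2 + (y - ly) ** 2) ** 0.5 for x, y in track]
    i = min(range(len(dists)), key=dists.__getitem__)
    if i == 0 or i == len(dists) - 1:
        return 0.0
    return 1.0 / (1.0 + dists[i])

RELATION_MODELS = {"to": score_to, "past": score_past}

def score_clause(clause: Clause, clip: Clip) -> float:
    """Score one clause against one clip; unknown landmarks or relations score 0."""
    landmark = clip.landmarks.get(clause.landmark)
    model = RELATION_MODELS.get(clause.relation)
    if landmark is None or model is None:
        return 0.0
    return model(clip.track, landmark)

def rank_clips(clauses: List[Clause], clips: List[Clip]) -> List[Clip]:
    """Score each clause independently and multiply the per-clause scores,
    mirroring the decomposition into independent structured clauses."""
    def clip_score(clip: Clip) -> float:
        score = 1.0
        for clause in clauses:
            score *= score_clause(clause, clip)
        return score
    return sorted(clips, key=clip_score, reverse=True)

if __name__ == "__main__":
    # "The person walked past the island, to the sink."
    clauses = [
        Clause("person", "walked", "past", "island"),
        Clause("person", "walked", "to", "sink"),
    ]
    clips = [
        Clip("clip-a", [(0, 0), (2, 1), (4, 0), (6, 0)],
             {"island": (2.0, 0.0), "sink": (6.0, 0.0)}),
        Clip("clip-b", [(0, 0), (0, 1), (0, 2)],
             {"island": (2.0, 0.0), "sink": (6.0, 0.0)}),
    ]
    for clip in rank_clips(clauses, clips):
        print(clip.clip_id)   # clip-a should rank first
```

Because each clause is scored independently, a richer inventory of relation models (e.g. for "down", "around", "through") could be added to RELATION_MODELS without changing the ranking machinery, which suggests how such a system can cover a very large space of potential queries.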
