Grounding Spatial Language for Video Search

Abstract

The ability to find a video clip that matches a natural language description of an event would enable intuitive search of large databases of surveillance video. We present a mechanism for connecting a spatial language query to a video clip corresponding to the query. The system can retrieve video clips matching millions of potential queries that describe complex events in video such as "people walking from the hallway door, around the island, to the kitchen sink." By breaking down the query into a sequence of independent structured clauses and modeling the meaning of each component of the structure separately, we are able to improve on previous approaches to video retrieval by finding clips that match much longer and more complex queries using a rich set of spatial relations such as "down" and "past." We present a rigorous analysis of the system's performance, based on a large corpus of task-constrained language collected from fourteen subjects. Using this corpus, we show that the system effectively retrieves clips that match natural language descriptions: 58.3% were ranked in the top two of ten in a retrieval task. Furthermore, we show that spatial relations play an important role in the system's performance.
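The abstract describes decomposing a natural language query into a sequence of independent structured clauses and modeling the meaning of each spatial relation separately. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: the Clause and Clip structures, the distance-based toy models for "to" and "past", and the product scoring rule are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]

@dataclass
class Clause:
    """One structured clause of the decomposed query (hypothetical schema),
    e.g. ("person", "walked", "past", "island")."""
    figure: str
    verb: str
    relation: str      # spatial relation such as "to", "past", "down"
    landmark: str

@dataclass
class Clip:
    """A candidate clip: a figure (person) track plus named landmark locations."""
    clip_id: str
    track: List[Point]             # figure trajectory over time
    landmarks: Dict[str, Point]    # landmark name -> (x, y) centroid

def score_to(track: List[Point], landmark: Point) -> float:
    """Toy model of 'to': reward tracks that end near the landmark."""
    (x, y), (lx, ly) = track[-1], landmark
    return 1.0 / (1.0 + ((x - lx) ** 2 + (y - ly) ** 2) ** 0.5)

def score_past(track: List[Point], landmark: Point) -> float:
    """Toy model of 'past': the closest approach to the landmark should fall
    in the interior of the track, i.e. the figure keeps moving beyond it."""
    lx, ly = landmark
    dists = [((x - lx) ** 2 + (y - ly) ** 2) ** 0.5 for x, y in track]
    i = min(range(len(dists)), key=dists.__getitem__)
    if i == 0 or i == len(dists) - 1:
        return 0.0
    return 1.0 / (1.0 + dists[i])

RELATION_MODELS = {"to": score_to, "past": score_past}

def score_clause(clause: Clause, clip: Clip) -> float:
    """Score one clause against one clip; unknown landmarks or relations score 0."""
    landmark = clip.landmarks.get(clause.landmark)
    model = RELATION_MODELS.get(clause.relation)
    if landmark is None or model is None:
        return 0.0
    return model(clip.track, landmark)

def rank_clips(clauses: List[Clause], clips: List[Clip]) -> List[Clip]:
    """Score each clause independently and multiply the per-clause scores,
    mirroring the decomposition into independent structured clauses."""
    def clip_score(clip: Clip) -> float:
        score = 1.0
        for clause in clauses:
            score *= score_clause(clause, clip)
        return score
    return sorted(clips, key=clip_score, reverse=True)

if __name__ == "__main__":
    # "The person walked past the island, to the sink."
    clauses = [
        Clause("person", "walked", "past", "island"),
        Clause("person", "walked", "to", "sink"),
    ]
    clips = [
        Clip("clip-a", [(0, 0), (2, 1), (4, 0), (6, 0)],
             {"island": (2.0, 0.0), "sink": (6.0, 0.0)}),
        Clip("clip-b", [(0, 0), (0, 1), (0, 2)],
             {"island": (2.0, 0.0), "sink": (6.0, 0.0)}),
    ]
    for clip in rank_clips(clauses, clips):
        print(clip.clip_id)   # clip-a should rank first
```

Because each clause is scored independently, a richer inventory of relation models (e.g. for "down", "around", "through") could be added to RELATION_MODELS without changing the ranking machinery, which suggests how such a system can cover a very large space of potential queries.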
