Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates

Abstract

Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit spatial relationship (e.g., "on" or "below"). In contrast with prior work that restricts spatial templates to explicit spatial prepositions (e.g., "glass on table"), here we extend this concept to implicit spatial language, i.e., those relationships (generally actions) for which the spatial arrangement of the objects is only implied (e.g., "man riding horse"). Unlike explicit relationships, predicting spatial arrangements from implicit spatial language requires significant common sense spatial understanding. We introduce the task of predicting spatial templates for two objects under a relationship, which can be seen as a spatial question-answering task with a continuous 2D output ("where is the man w.r.t. a horse when the man is walking the horse?"). We present two simple neural models that leverage annotated images and structured text to learn this task. The good performance of these models reveals that spatial locations are to a large extent predictable from implicit spatial language. Crucially, the models attain similar performance in a challenging generalized setting, where the object-relation-object combinations (e.g., "man walking dog") have never been seen before. Next, we go one step further by presenting the models with unseen objects (e.g., "dog"). In this scenario, we show that leveraging word embeddings enables the models to output accurate spatial predictions, proving that the models acquire solid common sense spatial knowledge allowing for such generalization.
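The task framing in the abstract (a subject-relation-object triple in, a continuous 2D location out) can be illustrated with a small sketch. This is not the authors' model: the vocabulary, the random stand-in embeddings, the target offsets, the single linear layer, and the names `encode` and `predict` are all assumptions made up for illustration. A real system would use pretrained word embeddings and locations derived from annotated images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary with random stand-in vectors (assumption:
# a real system would load pretrained word embeddings instead).
VOCAB = ["man", "horse", "dog", "riding", "walking"]
EMB_DIM = 8
EMBEDDINGS = {w: rng.normal(size=EMB_DIM) for w in VOCAB}


def encode(subject, relation, obj):
    """Concatenate the three word vectors into one input feature vector."""
    return np.concatenate(
        [EMBEDDINGS[subject], EMBEDDINGS[relation], EMBEDDINGS[obj]]
    )


# Made-up targets: (x, y) offset of the object relative to the subject.
# The numbers are illustrative only, not taken from any dataset.
TRAIN = [
    (("man", "riding", "horse"), (0.0, -0.5)),
    (("man", "walking", "dog"), (0.6, -0.3)),
    (("man", "walking", "horse"), (0.7, 0.0)),
]

X = np.stack([encode(*triple) for triple, _ in TRAIN])
Y = np.array([offset for _, offset in TRAIN])

# A single linear layer, fitted by gradient descent on squared error.
W = np.zeros((X.shape[1], 2))
b = np.zeros(2)


def mse(W, b):
    """Mean squared error of the current parameters on the training set."""
    return float(np.mean((X @ W + b - Y) ** 2))


initial_loss = mse(W, b)
for _ in range(500):
    err = X @ W + b - Y              # shape: (n_examples, 2)
    W -= 0.01 * X.T @ err / len(X)   # step along the negative gradient
    b -= 0.01 * err.mean(axis=0)
final_loss = mse(W, b)


def predict(subject, relation, obj):
    """Return the predicted 2D offset for a (subject, relation, object) triple."""
    return encode(subject, relation, obj) @ W + b


# A combination absent from TRAIN, built from words that were each seen.
unseen = predict("man", "riding", "dog")
```

Because every word enters through its embedding rather than through an identifier for the whole triple, the sketch can produce an output for a combination it was never trained on. With random vectors and three examples that output carries no meaning; the point is only the shape of the interface.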