Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Modeling 4D Human-Object Interactions for Joint Event Segmentation, Recognition, and Object Localization


Abstract

In this paper, we present a 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization. The 4DHOI model represents the geometric, temporal, and semantic relations in daily events involving human-object interactions. In 3D space, the interactions of human poses and contextual objects are modeled by semantic co-occurrence and geometric compatibility. On the time axis, the interactions are represented as a sequence of atomic event transitions with coherent objects. The 4DHOI model is a hierarchical spatial-temporal graph representation which can be used for inferring scene functionality and object affordance. The graph structures and parameters are learned using an ordered expectation maximization algorithm which mines the spatial-temporal structures of events from RGB-D video samples. Given an input RGB-D video, the inference is performed by a dynamic programming beam search algorithm which simultaneously carries out event segmentation, recognition, and object localization. We collected a large multiview RGB-D event dataset which contains 3,815 video sequences and 383,036 RGB-D frames captured by three RGB-D cameras. The experimental results on three challenging datasets demonstrate the strength of the proposed method.
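The abstract describes inference as a dynamic programming beam search that jointly segments the video and labels each segment with an atomic event. A minimal sketch of that idea, assuming per-frame log-likelihood scores for each event label (the function name, the scoring model, and the hypothesis structure are illustrative, not the paper's actual 4DHOI energy):

```python
# Hedged sketch: beam search over joint event segmentations.
# frame_scores is a list of dicts {label: log-likelihood}, one per frame.
# This is NOT the paper's 4DHOI inference; it only illustrates how a beam
# of segmentation hypotheses can be extended frame by frame.

def beam_search_segment(frame_scores, beam_width=3):
    """Return (segments, score): segments is a list of (label, start, end)."""
    # Each hypothesis: (total_score, closed_segments, current_label, seg_start)
    beams = [(0.0, [], None, 0)]
    for t, scores in enumerate(frame_scores):
        candidates = []
        for total, segs, label, start in beams:
            for new_label, s in scores.items():
                if new_label == label:
                    # Extend the current segment with this frame.
                    candidates.append((total + s, segs, label, start))
                else:
                    # Close the current segment and open a new one at frame t.
                    closed = segs + ([(label, start, t)] if label is not None else [])
                    candidates.append((total + s, closed, new_label, t))
        # Prune to the top-scoring hypotheses.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beams = candidates[:beam_width]
    # Close the final open segment on the best surviving hypothesis.
    total, segs, label, start = beams[0]
    return segs + [(label, start, len(frame_scores))], total
```

In the full model, each hypothesis would also carry object-localization state so that segment boundaries, event labels, and object positions are scored jointly rather than in separate passes.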
