Journal: Neurocomputing

A two-level attention-based interaction model for multi-person activity recognition



Abstract

Multi-person activity recognition is challenging because the interactions within activities are elusive. We take these interactions into account at two levels. At the individual level, each person's behavior depends on both that person's spatio-temporal features and the interactions propagated from others in the scene. At the scene level, the multi-person activity is characterized by interactions between individuals' actions and the high-level activity. Notably, interactions contribute unequally at both levels. To jointly explore these varied interactions, we propose a two-level attention-based interaction model that relies on two time-varying attention mechanisms. The individual-level attention mechanism, conditioned on pose features, exploits varying degrees of interaction among individuals in a scene while updating their states at each time step. The scene-level attention mechanism uses an attention-based pooling strategy to explore varying levels of interaction between individuals' actions and the high-level activity. We ground our model in a modified two-stage Gated Recurrent Unit (GRU) network to handle long-range temporal variability and consistency. Our end-to-end trainable model takes as input a set of person detections in videos or image sequences and predicts the labels of multi-person activities. Experimental results demonstrate comparable performance of our model and show the effectiveness of our attention mechanisms. (C) 2018 Elsevier B.V. All rights reserved.
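
The idea of attention-based pooling mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the dot-product scoring, the `scoring_vector` parameter, and the function names are assumptions chosen for illustration, and the abstract does not specify how the attention scores are actually computed. It only shows how per-person feature vectors might be combined into one scene-level vector with unequal weights, rather than by plain averaging or max pooling.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    person_states: List[List[float]],
    scoring_vector: List[float],
) -> Tuple[List[float], List[float]]:
    """Pool per-person state vectors into one scene-level vector.

    person_states: one feature vector per detected person (hypothetical input).
    scoring_vector: a parameter used to score each person (an assumption;
        in a trained model this would be learned).
    Returns the pooled vector and the attention weights.
    """
    # Score each person with a dot product against the scoring vector.
    scores = [
        sum(h_d * v_d for h_d, v_d in zip(h, scoring_vector))
        for h in person_states
    ]
    weights = softmax(scores)
    dim = len(person_states[0])
    # Weighted sum over people, computed one feature dimension at a time.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, person_states))
        for d in range(dim)
    ]
    return pooled, weights


# Three people with two-dimensional states (toy values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, [0.5, 0.5])
```

In this toy example the third person receives the largest weight because its state has the highest score, so it contributes most to the pooled scene vector. In a time-varying setting such as the one the abstract describes, a pooling step of this kind would be recomputed at each time step.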
