Multiple instance deep learning for weakly-supervised visual object tracking

Abstract

Intelligently tracking objects with varied shapes, colors, lighting conditions, and backgrounds is extremely useful in many HCI applications, such as human body motion capture, hand gesture recognition, and virtual reality (VR) games. However, accurately tracking different objects under uncontrolled environments is a tough challenge due to possibly dynamic object parts, varied lighting conditions, and sophisticated backgrounds. In this work, we propose a novel semantically-aware object tracking framework, whose key is a weakly-supervised learning paradigm that optimally transfers video-level semantic tags into various regions. More specifically, given a set of training video clips, each of which is associated with multiple video-level semantic tags, we first propose a weakly-supervised learning algorithm to transfer the semantic tags into various video regions. The key is a MIL (Zhong et al., 2020) [1]-based manifold embedding algorithm that maps the entire set of video regions into a semantic space, wherein the video-level semantic tags are well encoded. Afterward, for each video region, we use the semantic feature combined with the appearance feature as its representation. We design a multi-view learning algorithm to optimally fuse the above two types of features. Based on the fused feature, we learn a probabilistic Gaussian mixture model to predict the target probability of each candidate window, where the window with the maximal probability is output as the tracking result. Comprehensive comparative results on a challenging pedestrian tracking task as well as a human hand gesture recognition task have demonstrated the effectiveness of our method. Moreover, visualized tracking results have shown that non-rigid objects with moderate occlusions can be well localized by our method.
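The final tracking step described in the abstract, scoring candidate windows with a probabilistic Gaussian mixture model over fused features and outputting the maximal-probability window, can be illustrated with a minimal sketch. This is not the paper's implementation: the feature extraction and multi-view fusion are stubbed with synthetic data, and scikit-learn's `GaussianMixture` stands in for the learned model.

```python
# Hypothetical sketch of GMM-based candidate-window scoring.
# Fused (semantic + appearance) features are simulated with random data;
# in the paper these would come from the weakly-supervised embedding
# and the multi-view fusion step.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Fused features of regions known to contain the target (training data).
target_features = rng.normal(loc=1.0, scale=0.5, size=(200, 16))

# Learn a probabilistic Gaussian mixture model of the target's features.
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
gmm.fit(target_features)

# Fused features of candidate windows in a new frame:
# one target-like window followed by four background windows.
candidates = np.vstack([
    rng.normal(loc=1.0, scale=0.5, size=(1, 16)),
    rng.normal(loc=-3.0, scale=0.5, size=(4, 16)),
])

# Score each window by its log-likelihood under the mixture; the
# maximal-probability window is output as the tracking result.
scores = gmm.score_samples(candidates)
best = int(np.argmax(scores))
print(best)  # prints 0: the target-like window wins
```

The log-likelihood under the fitted mixture plays the role of the "target probability" in the abstract; in practice the candidate windows would be sampled around the previous frame's track location.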
