IEEE Transactions on Multimedia

Gestures In-The-Wild: Detecting Conversational Hand Gestures in Crowded Scenes Using a Multimodal Fusion of Bags of Video Trajectories and Body Worn Acceleration


Abstract

This paper addresses the detection of hand gestures during free-standing conversations in crowded mingle scenarios. Unlike the scenarios of previous works on gesture detection and recognition, crowded mingle scenes present additional challenges such as cross-contamination between subjects, strong occlusions, and nonstationary backgrounds. This makes them more complex to analyze using computer vision techniques alone. We propose a multimodal approach using video and wearable acceleration data recorded via smart badges hung around the neck. In the video modality, we propose to treat noisy dense trajectories as bags-of-trajectories. For a given bag, we can have good trajectories corresponding to the subject and bad trajectories due, for instance, to cross-contamination. However, we hypothesize that for a given class it should be possible to learn which trajectories are discriminative while ignoring the noisy ones. We do this by adopting multiple instance learning via embedded instance selection, which also allows us to identify which instances contribute most to the classification. By fusing the decisions of the classifiers from the video and wearable acceleration modalities, we show improvements over the unimodal approaches, with an AUC of 0.69. We also present a static analysis and a dynamic analysis to assess the impact of noisy data on the fused detection results, showing that moments of high occlusion in the video are compensated by the information from the wearables. Finally, we apply our method to detect speaking status, leveraging the close relationship reported in the literature between hand gestures and speech.
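The abstract combines two ideas: embedding each bag of noisy dense trajectories so that a sparse classifier can select the discriminative instances (multiple instance learning via embedded instance selection), and fusing the decisions of the video and wearable-acceleration classifiers. The sketch below illustrates both ideas on toy data; it is not the authors' implementation. The RBF similarity, the synthetic trajectory and acceleration features, the L1-regularized logistic regression, and the probability-averaging fusion rule are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of MILES-style bag embedding plus
# decision-level fusion of two modalities. All features here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def embed_bags(bags, prototypes, sigma=1.0):
    """Map each bag (n_i x d array of trajectory descriptors) to a fixed-length
    vector: the maximum RBF similarity between any instance in the bag and each
    prototype. Noisy trajectories simply fail to dominate any prototype dimension."""
    embedded = []
    for bag in bags:
        d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (n_i, P)
        embedded.append(np.exp(-d2 / (2 * sigma ** 2)).max(axis=0))     # (P,)
    return np.vstack(embedded)

def make_bags(n_bags, label, d=8):
    """Toy bags standing in for dense-trajectory descriptors of one subject."""
    bags, ys = [], []
    for _ in range(n_bags):
        n_inst = rng.integers(5, 15)
        bag = rng.normal(0, 1, (n_inst, d))   # mostly noisy trajectories
        if label == 1:
            bag[0] += 2.0                     # one discriminative trajectory
        bags.append(bag)
        ys.append(label)
    return bags, ys

pos_bags, pos_y = make_bags(60, 1)
neg_bags, neg_y = make_bags(60, 0)
bags, y = pos_bags + neg_bags, np.array(pos_y + neg_y)

# Prototype pool: all training instances pooled together.
prototypes = np.vstack(bags)
X_video = embed_bags(bags, prototypes)

# Sparse (L1) classifier: nonzero weights indicate which prototype instances
# contribute to the decision, i.e. which trajectories are discriminative.
video_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_video, y)

# Wearable acceleration modality: a separate classifier on toy per-window features.
X_accel = rng.normal(0, 1, (len(y), 6)) + y[:, None] * 0.8
accel_clf = LogisticRegression().fit(X_accel, y)

# Decision-level fusion: average the two posterior probabilities.
p_fused = 0.5 * (video_clf.predict_proba(X_video)[:, 1]
                 + accel_clf.predict_proba(X_accel)[:, 1])
print("fused AUC (toy data):", round(roc_auc_score(y, p_fused), 3))
```

Averaging posteriors is only one possible fusion rule; the point of the sketch is that each modality is trained independently and the combination happens at the decision level, so moments when one modality is unreliable (e.g. occlusion in the video) can be compensated by the other.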
