
A comparative study on automatic audio-visual fusion for aggression detection using meta-information


Abstract

Multimodal fusion is a complex topic. For surveillance applications, audio-visual fusion is very promising given the complementary nature of the two streams. However, drawing the correct conclusion from multi-sensor data is not straightforward. In previous work we analysed a database of audio-visual recordings of unwanted behavior in trains (Lefter et al., 2012), focusing on a limited subset of the recorded data. We collected multimodal and unimodal assessments from human annotators, who gave aggression scores on a 3-point scale. We showed that there are no trivial fusion algorithms to predict the multimodal labels from the unimodal labels, since part of the information is lost when using the unimodal streams alone. We proposed an intermediate step to discover the structure of the fusion process. This step is based on meta-features, and we identified a set of five that have an impact on the fusion process. In this paper we extend the findings of Lefter et al. (2012) to the general case using the entire database. We show that the meta-features have a positive effect on the fusion process in terms of labels. We then compare three fusion methods that encapsulate the meta-features. They are based on automatic prediction of the intermediate-level variables and of multimodal aggression from state-of-the-art low-level acoustic, linguistic, and visual features. The first fusion method applies multiple classifiers to predict the intermediate-level features from the low-level features, and then predicts the multimodal label from the intermediate variables. The other two approaches are based on probabilistic graphical models, one using (Dynamic) Bayesian Networks and the other Conditional Random Fields. We find that each approach has its strengths and weaknesses in predicting specific aggression classes, and that using the meta-features yields significant improvements in all cases.
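The first fusion method described above is a two-stage classifier cascade: low-level features predict the intermediate meta-features, which in turn predict the multimodal label. A minimal sketch of that pipeline, using synthetic stand-in data and scikit-learn random forests (the feature dimensions, the specific classifier, and the label-generation rule are all illustrative assumptions, not the paper's actual setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for low-level acoustic, linguistic and visual features.
n, d = 200, 12
X_low = rng.normal(size=(n, d))

# Five binary intermediate meta-features (hypothetical), plus a 3-point
# multimodal aggression label in {0, 1, 2}, generated synthetically here.
meta = (X_low[:, :5] > 0).astype(int)
y = np.clip(meta.sum(axis=1) // 2, 0, 2)

# Stage 1: one classifier per meta-feature, mapping low-level features
# to the intermediate-level variables.
stage1 = [
    RandomForestClassifier(n_estimators=20, random_state=0).fit(X_low, meta[:, j])
    for j in range(meta.shape[1])
]
meta_hat = np.column_stack([clf.predict(X_low) for clf in stage1])

# Stage 2: map the predicted intermediate variables to the multimodal label.
stage2 = RandomForestClassifier(n_estimators=20, random_state=0).fit(meta_hat, y)
y_hat = stage2.predict(meta_hat)

print(sorted(set(y_hat.tolist())))
```

In practice the intermediate predictions would be produced on held-out data rather than the training set; the sketch only illustrates how the meta-features sit between the low-level features and the final label.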

Bibliographic details

  • Source
    Pattern Recognition Letters, 2013, Issue 15, pp. 1953-1963 (11 pages)
  • Author affiliations

    Section of Interactive Intelligence, Department of Intelligence Systems, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; Intelligent Imaging Department, TNO, Oude Waalsdorperweg 63, 2597 AK The Hague, The Netherlands; Sensor Technology, SEWACO Department, Netherlands Defence Academy, Nieuwe Diep 8, 1781 AC Den Helder, The Netherlands;

    Section of Interactive Intelligence, Department of Intelligence Systems, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; Sensor Technology, SEWACO Department, Netherlands Defence Academy, Nieuwe Diep 8, 1781 AC Den Helder, The Netherlands;

    Intelligent Imaging Department, TNO, Oude Waalsdorperweg 63, 2597 AK The Hague, The Netherlands;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • Keywords

    Audio-visual fusion; Automatic surveillance; Meta-information; Context-based fusion;


