Published in: Future generation computer systems

Fusing audio vocabulary with visual features for pornographic video detection


Abstract

Pornographic video detection based on multimodal fusion is an effective approach to filtering pornography. However, existing methods lack an accurate representation of audio semantics and pay little attention to the characteristics of pornographic audio. In this paper, we propose a novel framework that fuses an audio vocabulary with visual features for pornographic video detection. The novelty of our approach lies in three aspects: an audio semantics representation method based on the energy envelope unit (EEU) and bag-of-words (BoW), a periodicity-based audio segmentation algorithm, and a periodicity-based video decision algorithm. The first, named the EEU+BoW representation method, describes audio semantics via an audio vocabulary, which is constructed by k-means clustering of EEUs. The latter two complement each other to make full use of the periodicities in pornographic audio. The periodicity-based audio segmentation algorithm divides audio streams into EEU sequences. After these EEUs are classified, the periodicity-based video decision algorithm judges whether a video is pornographic. Before fusion, two support vector machines are applied, one for the audio-vocabulary-based method and one for the visual-features-based method. To fuse their results, a keyframe is selected from each EEU according to its beginning and ending positions, and then an integrated weighted scheme and the periodicity-based video decision algorithm are applied to yield the final detection results. Experimental results show that our approach outperforms the traditional approach based only on visual features and achieves satisfactory performance: the true positive rate reaches 94.44%, while the false positive rate is 9.76%.
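
The vocabulary construction and weighted fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how an EEU is turned into a feature vector, how many audio words are used, or what the fusion weights are, so the feature arrays, `k`, and `w_audio` below are hypothetical placeholders, and the periodicity-based segmentation and decision steps are omitted entirely. The function names are likewise invented for illustration.

```python
import numpy as np

def build_vocabulary(features, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm); the k centroids act as 'audio words'."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].astype(float)
    for _ in range(n_iter):
        # Assign every feature vector to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(features, centroids):
    """Quantize each vector to its nearest audio word; return normalized counts."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    counts = np.bincount(words, minlength=len(centroids)).astype(float)
    return counts / counts.sum()

def fuse_scores(audio_score, visual_score, w_audio=0.5):
    """Weighted sum of two classifier scores; w_audio is an assumed weight."""
    return w_audio * audio_score + (1.0 - w_audio) * visual_score

# Synthetic stand-ins for EEU feature vectors (assumption: 4-dimensional).
rng = np.random.default_rng(1)
low = rng.normal(0.0, 0.1, size=(50, 4))
high = rng.normal(5.0, 0.1, size=(50, 4))

vocab = build_vocabulary(np.vstack([low, high]), k=2)
hist = bow_histogram(low, vocab)       # BoW descriptor for one segment
fused = fuse_scores(1.0, 0.0, w_audio=0.25)
```

In a fuller pipeline, histograms such as `hist` would be the inputs to the audio-side support vector machine, and `fuse_scores` would combine its output with the visual-side classifier's score for the keyframe selected from the same EEU.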
