IEEE International Symposium on Multimedia

An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content

Abstract

Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly by computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in Speaker Verification and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system: combining the two yields a slight improvement over either standalone system.
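The abstract reports results as an MD rate at a fixed FA rate. As a rough illustration of how such an operating point can be read off detector scores, the sketch below picks the lowest threshold whose FA rate stays within the target and reports the MD rate there. The function name `md_rate_at_fa`, the "score >= threshold means detected" convention, and the toy scores are assumptions made for illustration; the abstract does not describe the paper's scoring protocol.

```python
def md_rate_at_fa(target_scores, nontarget_scores, fa_rate):
    """Return (threshold, md_rate) at a fixed false-alarm operating point.

    Hypothetical helper: a clip counts as detected when score >= threshold.
    The lowest threshold whose FA rate is <= fa_rate is chosen, which gives
    the smallest MD rate achievable under that FA constraint.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus +inf so that a
    # threshold with zero false alarms always exists.
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    candidates.append(float("inf"))

    n_tar = len(target_scores)
    n_non = len(nontarget_scores)
    for thr in candidates:  # ascending: FA rate falls, MD rate rises
        fa = sum(s >= thr for s in nontarget_scores) / n_non
        if fa <= fa_rate:
            md = sum(s < thr for s in target_scores) / n_tar
            return thr, md


# Toy scores (made up): 4 clips containing the event, 10 clips without it.
targets = [0.9, 0.8, 0.35, 0.7]
nontargets = [0.1, 0.2, 0.3, 0.4, 0.05, 0.15, 0.25, 0.6, 0.12, 0.22]
threshold, md = md_rate_at_fa(targets, nontargets, fa_rate=0.1)
print(threshold, md)
```

Fixing the FA rate first makes systems comparable at the same operating point, which is presumably why the abstract quotes MD rates at 4% and 2.8% FA rather than a single accuracy figure.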
