An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content

Abstract

Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly by computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in Speaker Verification and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system by demonstrating a slight improvement in combination over the standalone systems.
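
The abstract reports results as Missed Detection (MD) rates at fixed False Alarm (FA) rates. As a rough illustration of how such an operating point can be read off a set of detection scores, here is a minimal sketch. The function name `md_at_fa`, the use of NumPy, and the convention that a video counts as detected when its score is at least the threshold are assumptions made for illustration; this is not the authors' evaluation code.

```python
import numpy as np

def md_at_fa(scores, labels, target_fa):
    """Return (md, fa, threshold) at the lowest threshold whose
    false-alarm rate does not exceed target_fa.

    scores: detection scores, higher means "more likely the event".
    labels: truthy for videos that contain the event, falsy otherwise.
    Assumes both classes are present in labels.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of true event videos
    neg = scores[~labels]   # scores of non-event videos

    # Candidate thresholds in ascending order; +inf rejects everything,
    # so the loop always finds a threshold with FA = 0 at the latest.
    thresholds = np.append(np.unique(scores), np.inf)
    for t in thresholds:
        fa = float(np.mean(neg >= t))   # non-events wrongly detected
        if fa <= target_fa:
            md = float(np.mean(pos < t))  # events that were missed
            return md, fa, float(t)
```

Because the thresholds are scanned in ascending order, the first one that satisfies the FA constraint is also the one with the lowest MD rate, which is the value usually quoted for a fixed-FA operating point.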

Bibliographic record

Source
IEEE International Symposium on Multimedia, 2013, pp. 114-117 (4 pages)
Authors
Elizalde Benjamin; Friedland Gerald
Original format: PDF
Keywords
Audio; User Generated Content; Video Event Detection; i-vector

