Compact Audio Representation for Event Detection in Consumer Media


Abstract

Local audio-visual descriptors are often compactly stored using representations such as the soft quantization histogram [1]. Typically, classification performance with histogram representations is improved through the use of large codeword sets. Unfortunately, this approach runs into overfitting and scalability challenges when applied to richly diverse real-world collections. A novel "i-vector" approach was recently proposed for the speaker-verification task [2]. In this work, we study the relative effectiveness of the i-vector as a compact representation of local audio descriptors (e.g., MFCCs) within a multimedia event detection system. Specifically, we model the local audio descriptors using a Gaussian Mixture Model (GMM). Following [2], we constrain the GMM parameters to a low-dimensional subspace while preserving most of the variability (i.e., information) in the descriptors. The GMM parameters in the subspace constitute a compact representation that exhibits robustness in the face of sparse data. We evaluate the method by performing the multimedia event detection (MED) task using only audio information within consumer (e.g., YouTube) videos. Experiments with the 2011 TRECVID MED data show that the i-vector provides superior performance and lower dimensionality than the bag-of-words soft quantization histograms used in the state-of-the-art BBN VISER system in the 2011 TRECVID MED Evaluation.
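The abstract does not spell out how the subspace constraint is computed. As a rough illustration only, the sketch below assumes the total-variability formulation commonly used in the speaker-verification literature: the GMM mean supervector is modeled as M = m + T w with w ~ N(0, I), and the i-vector is the posterior mean of w given the clip's descriptors. All names here (`gmm_posteriors`, `extract_ivector`, the sizes C, D, R) are hypothetical, the GMM is assumed to have diagonal covariances, and the matrix T is random rather than trained, so this shows the shape of the computation, not the authors' implementation.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Frame-level responsibilities under a diagonal-covariance GMM.
    X: (n_frames, D); weights: (C,); means, variances: (C, D)."""
    diff = X[:, None, :] - means[None, :, :]                    # (n, C, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_prob = log_prob + np.log(weights)                       # (n, C)
    log_prob = log_prob - log_prob.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_prob)
    return post / post.sum(axis=1, keepdims=True)

def extract_ivector(X, weights, means, variances, T):
    """Posterior mean of w in M = m + T w, w ~ N(0, I).
    T: (C*D, R) total-variability matrix (assumed already trained)."""
    C, D = means.shape
    R = T.shape[1]
    post = gmm_posteriors(X, weights, means, variances)
    N = post.sum(axis=0)                          # zeroth-order statistics, (C,)
    F = post.T @ X - N[:, None] * means           # centered first-order statistics, (C, D)
    inv_var = (1.0 / variances).reshape(-1)       # diagonal of Sigma^{-1}, (C*D,)
    N_rep = np.repeat(N, D)                       # N expanded to supervector layout
    precision = np.eye(R) + T.T @ (T * (inv_var * N_rep)[:, None])
    rhs = T.T @ (inv_var * F.reshape(-1))
    return np.linalg.solve(precision, rhs)

# Toy usage with synthetic data standing in for MFCC frames.
rng = np.random.default_rng(0)
C, D, R = 4, 13, 5                                # hypothetical sizes
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D))
variances = np.ones((C, D))
T = 0.1 * rng.normal(size=(C * D, R))             # random placeholder, not a trained matrix
X = rng.normal(size=(200, D))
w = extract_ivector(X, weights, means, variances, T)
print(w.shape)  # (5,)
```

Whatever the clip length, the output has a fixed dimension R, which is what makes it usable as a compact clip-level feature. With few frames, the identity term in `precision` dominates and pulls the estimate toward zero, which is one way to read the abstract's remark about robustness to sparse data.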

Bibliographic details

Source
Annual Conference of the International Speech Communication Association | 2012 | pp. 2087-2090 | 4 pages
Authors
Xiaodan Zhuang; Stavros Tsakalidis; Shuang Wu; Pradeep Natarajan; Rohit Prasad; Prem Natarajan
Original format: PDF
Keywords
multimedia event detection; factor analysis

Similar literature

1. DCAR: A Discriminative and Compact Audio Representation for Audio Processing [J]. Liping Jing, Bo Liu, Jaeyoung Choi. IEEE Transactions on Multimedia, 2017, No. 12
2. A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia [J]. Regunathan Radhakrishnan, Ajay Divakaran, Ziyou Xiong. EURASIP Journal on Applied Signal Processing, 2006, No. 2
3. A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia [J]. Regunathan Radhakrishnan, Ajay Divakaran, Ziyou Xiong. EURASIP Journal on Advances in Signal Processing, 2006, No. 1
4. Compact Audio Representation for Event Detection in Consumer Media [C]. Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan. INTERSPEECH 2012, 2012
5. Managing the 21st-Century Consumer: Social Representations of Contemporary Consumers in the Business Media [D]. Rocchio, Joahna C. 2018
6. Meta-Analyses Support a Taxonomic Model for Representations of Different Categories of Audio-Visual Interaction Events in the Human Brain [O]. Matt Csonka, Nadia Mardmomen, Paula J Webster. 2021
7. Learning Compact Structural Representations for Audio Events Using Regressor Banks [O]. Phan, Huy; Maass, Marco; Hertel, Lars. 2016
