Conference: Annual Conference of the International Speech Communication Association

Compact Audio Representation for Event Detection in Consumer Media


Abstract

Local audio-visual descriptors are often stored compactly using representations such as the soft quantization histogram [1]. Classification performance with histogram representations is typically improved by using large codeword sets. Unfortunately, this approach runs into overfitting and scalability problems when applied to richly diverse real-world collections. A novel "i-vector" approach was recently proposed for the speaker-verification task [2]. In this work, we study the relative effectiveness of the i-vector as a compact representation of local audio descriptors (e.g., MFCCs) within a multimedia event detection system. Specifically, we model the local audio descriptors using a Gaussian Mixture Model (GMM). Following [2], we constrain the GMM parameters to a low-dimensional subspace while preserving most of the variability (i.e., information) in the descriptors. The GMM parameters in the subspace constitute a compact representation that is robust to sparse data. We evaluate the method by performing the multimedia event detection (MED) task using only the audio information in consumer (e.g., YouTube) videos. Experiments with the 2011 TRECVID MED data show that the i-vector provides superior performance and lower dimensionality than the bag-of-words soft quantization histograms used in the state-of-the-art BBN VISER system in the 2011 TRECVID MED Evaluation.
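To make the two representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a diagonal-covariance GMM, takes the soft quantization histogram to be the average per-frame component posterior (one common form; [1] may define it differently), and computes the i-vector as the posterior mean of w in the standard total-variability model (supervector = m + T w) from [2]. All function and variable names are hypothetical, and the GMM parameters and the matrix T are random placeholders here; in practice both would be trained on real descriptors.

```python
import numpy as np

def posteriors(X, weights, means, variances):
    """Per-frame component responsibilities under a diagonal-covariance GMM.

    X: (N, D) local descriptors; means, variances: (C, D); weights: (C,).
    """
    diff = X[:, None, :] - means[None, :, :]                    # (N, C, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob                       # (N, C)
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def soft_histogram(post):
    """Soft quantization histogram: mean responsibility per codeword."""
    return post.mean(axis=0)

def ivector(X, post, means, variances, T):
    """Posterior mean of w in the model: supervector = m + T w.

    T: (C*D, R) total-variability matrix (assumed given, not trained here).
    """
    C, D = means.shape
    R = T.shape[1]
    N_c = post.sum(axis=0)                                      # zeroth-order stats (C,)
    F_c = post.T @ X - N_c[:, None] * means                     # centered first-order stats (C, D)
    inv_var = (1.0 / variances).reshape(-1)                     # (C*D,)
    N_expanded = np.repeat(N_c, D)                              # (C*D,)
    # Precision of the posterior over w: I + T' Sigma^-1 N T
    precision = np.eye(R) + T.T @ (T * (N_expanded * inv_var)[:, None])
    rhs = T.T @ (inv_var * F_c.reshape(-1))                     # T' Sigma^-1 F
    return np.linalg.solve(precision, rhs)

# Toy data standing in for MFCC-like descriptors (placeholder values only).
rng = np.random.default_rng(0)
C, D, R, N = 4, 3, 2, 50
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D))
variances = np.ones((C, D))
T = 0.1 * rng.normal(size=(C * D, R))
X = rng.normal(size=(N, D))

post = posteriors(X, weights, means, variances)
hist = soft_histogram(post)   # length C: grows with the codeword set
w = ivector(X, post, means, variances, T)   # length R: fixed by the subspace rank
```

The histogram has one entry per codeword, so its dimensionality grows with the codeword set, while the i-vector length is fixed by the rank R of the subspace regardless of how many mixture components are used.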
