N-GRAM EXTENSION FOR BAG-OF-AUDIO-WORDS

Abstract

Bag-of-audio-words is one of the most frequently used methods for incorporating an audio component into multimedia event detection and related tasks. A main criticism of the method, however, is that it ignores context. Each "word" is considered in isolation, ignoring its neighbors. We address this issue by representing the document by its audio word N-grams. Unlike words from natural language, audio words are generated by clustering algorithms where the number of clusters is specified by the researcher. We therefore also explore how the performance of the N-gram representation varies with codebook size. With this enhanced representation, we find the average probability of miss noticeably decreases when evaluated on TRECVID 2011 and 2012 datasets, indicating clear improvements on the multimedia event detection task.
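The idea of replacing isolated audio words with N-grams can be illustrated with a minimal sketch. It assumes each clip has already been quantized into a sequence of integer codebook indices, one per frame. The function name `ngram_histogram`, the L1 normalisation, and the toy clip are illustrative assumptions and are not taken from the paper, whose exact feature pipeline is not given in the abstract.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def ngram_histogram(words: Sequence[int], n: int = 2) -> Dict[Tuple[int, ...], float]:
    """Return L1-normalised counts of length-n runs of consecutive audio words.

    Hypothetical helper: `words` holds one codebook index per frame.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # Slide a window of width n over the sequence and count each tuple.
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    total = sum(counts.values())
    # A clip shorter than n yields no N-grams, so return an empty histogram.
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Toy clip (made up for illustration): codebook indices for six frames.
clip = [3, 3, 7, 3, 3, 7]
unigrams = ngram_histogram(clip, n=1)  # plain bag-of-audio-words
bigrams = ngram_histogram(clip, n=2)   # keeps each word's right-hand neighbour
```

A sparse dictionary is used here because a codebook of size K admits up to K^N distinct N-grams, which is one reason the choice of codebook size interacts with the N-gram representation.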