Journal: IEEE Transactions on Audio, Speech and Language Processing

Semantic Annotation and Retrieval of Music and Sound Effects

Abstract

We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our “query-by-text” system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
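The annotation and retrieval steps described above can be illustrated with a minimal sketch. It assumes the per-word GMMs have already been trained, so the weighted mixture hierarchies EM step is not shown. The function names, the diagonal covariances, the uniform word prior, the frame-averaged log-likelihood, and the toy parameters are all illustrative assumptions, not details taken from the abstract.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, components):
    """Log density of a GMM; components is a list of (weight, mean, var)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(logs)
    # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(v - top) for v in logs))

def word_posteriors(frames, word_models):
    """Posterior over words for one track (assumed uniform word prior).

    Each word is scored by the average per-frame log-likelihood of the
    track under that word's GMM, then scores are normalized with a softmax.
    """
    scores = {
        word: sum(log_gmm(f, comps) for f in frames) / len(frames)
        for word, comps in word_models.items()
    }
    top = max(scores.values())
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def annotate(frames, word_models, top_k=1):
    """Return the top_k most probable words for a track."""
    post = word_posteriors(frames, word_models)
    return sorted(post, key=post.get, reverse=True)[:top_k]

def retrieve(word, tracks, word_models):
    """Rank track names by the posterior probability of the query word."""
    return sorted(
        tracks,
        key=lambda name: word_posteriors(tracks[name], word_models)[word],
        reverse=True,
    )

# Hypothetical two-word vocabulary over a 2-D feature space (made-up values).
word_models = {
    "calm": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 0.0], [1.0, 1.0])],
    "loud": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
# Hypothetical tracks, each a short list of feature vectors.
tracks = {
    "track_a": [[0.1, -0.2], [0.8, 0.3]],
    "track_b": [[4.9, 5.2], [5.3, 4.7]],
}

print(annotate(tracks["track_a"], word_models))
print(retrieve("loud", tracks, word_models))
```

In this sketch both tasks share one set of word models: annotation ranks words for a fixed track, while query-by-text retrieval ranks tracks for a fixed word.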
