Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing

DUAL-LAYER BAG-OF-FRAMES MODEL FOR MUSIC GENRE CLASSIFICATION

Abstract

This paper develops a music dictionary-based model for summarizing local feature descriptors computed over time. Compared with a holistic representation, this text-like bag-of-frames representation better captures the rich, time-varying information in music. However, the dictionary used in the classical bag-of-frames model captures only frame-level elements of the music, so a semantic gap exists between dictionary elements and commonly used music descriptions. To reduce this gap, this paper proposes a new feature representation called dual-layer bag-of-frames. It models music with a two-layer structure in which the first-layer dictionary captures frame-level characteristics and the second-layer dictionary captures segment-level semantics. This hierarchical structure resembles the alphabet-word-document structure of text. Our results demonstrate that the proposed dual-layer bag-of-frames feature achieves state-of-the-art accuracy in music genre classification. Classification accuracy on the GTZAN benchmark reaches 86.7% with a dictionary trained on GTZAN, and 83.6% with a dictionary trained on a different data set, USPOP.
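
The two-layer structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the dictionaries are learned or how segments are formed, so everything specific here is an assumption made for illustration: k-means for both dictionaries, fixed-length overlapping segments, normalized histograms as the segment and song descriptors, and all function names and parameters (`train_dictionaries`, `encode_song`, `seg_len`, `hop`). The toy frames are synthetic and stand in for real audio features.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    # Index of the codeword closest to vec (hard vector quantization).
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))

def kmeans(vectors, k, iters=20, seed=0):
    # Assumed dictionary learner; the paper may use a different method.
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, codebook)].append(v)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster goes empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def histogram(codes, size):
    # Normalized count of each codeword index; all zeros if codes is empty.
    counts = [0.0] * size
    for c in codes:
        counts[c] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def segment_histograms(frames, dict1, seg_len, hop):
    # Layer 1: quantize frames, then describe each segment by its
    # histogram of first-layer codewords ("letters" forming a "word").
    codes = [nearest(f, dict1) for f in frames]
    return [histogram(codes[s:s + seg_len], len(dict1))
            for s in range(0, len(codes) - seg_len + 1, hop)]

def train_dictionaries(songs, k1, k2, seg_len=4, hop=2):
    dict1 = kmeans([f for song in songs for f in song], k1)
    segs = [h for song in songs
            for h in segment_histograms(song, dict1, seg_len, hop)]
    dict2 = kmeans(segs, k2)
    return dict1, dict2

def encode_song(frames, dict1, dict2, seg_len=4, hop=2):
    # Layer 2: quantize segments, then describe the song by its
    # histogram of second-layer codewords ("words" forming a "document").
    segs = segment_histograms(frames, dict1, seg_len, hop)
    return histogram([nearest(s, dict2) for s in segs], len(dict2))

# Synthetic toy data: 3 "songs" of 12 two-dimensional frames each.
rng = random.Random(1)
songs = [[[rng.gauss(c, 0.1), rng.gauss(-c, 0.1)] for _ in range(12)]
         for c in (0.0, 1.0, 2.0)]
dict1, dict2 = train_dictionaries(songs, k1=3, k2=2)
feature = encode_song(songs[0], dict1, dict2)
```

In this sketch, `feature` is a fixed-length vector per song that could be passed to any standard classifier. The classifier and the frame-level features that produce the reported accuracies are not stated in the abstract and are not reproduced here.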