This paper proposes a novel framework for music content indexing and retrieval. Music structure information, i.e., timing, harmony, and music region content, is represented by the layers of a music structure pyramid. We begin by extracting this layered structure information. We analyze the rhythm of the music and then segment the signal in proportion to the inter-beat intervals. The timing information is thus incorporated into the segmentation process, which we call Beat Space Segmentation. To describe Harmony Events, we propose a two-layer hierarchical approach to modeling the music chords. We also model the progression of instrumental and vocal content as Acoustic Events. After information extraction, we propose a vector space modeling approach that uses these events as the indexing terms. In query-by-example music retrieval, a query is represented by a vector of n-gram event statistics. We then propose two retrieval models, a hard-indexing scheme and a soft-indexing scheme. Experiments show that vector space modeling is effective in representing the layered music information, achieving 82.5% top-5 retrieval accuracy with 15-second music clips as queries. In general, soft indexing outperforms hard indexing.
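As a rough illustration of representing a query by a vector of n-gram event statistics, the following minimal sketch counts unigrams and bigrams over a sequence of event labels and ranks database entries against a query. It is not the paper's implementation. The chord-like labels, the function names (`ngram_counts`, `cosine`, `rank`), the use of raw counts as the statistics, and the choice of cosine similarity as the ranking measure are all assumptions made for illustration; the abstract does not specify them, and the hard- and soft-indexing schemes are not modeled here.

```python
from collections import Counter
from math import sqrt


def ngram_counts(events, n_max=2):
    """Count all n-grams (n = 1..n_max) in a sequence of event labels."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(events) - n + 1):
            counts[tuple(events[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_events, database, n_max=2):
    """Return database keys ordered from most to least similar to the query."""
    q = ngram_counts(query_events, n_max)
    scored = [
        (cosine(q, ngram_counts(events, n_max)), name)
        for name, events in database.items()
    ]
    # Sort by descending score, breaking ties by name for determinism.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical event sequences standing in for decoded Harmony Events.
database = {
    "song_a": ["C", "G", "Am", "F", "C", "G"],
    "song_b": ["Dm", "A", "Dm", "A", "E", "A"],
}
ranking = rank(["C", "G", "Am"], database)
```

In this sketch each song is reduced to a sparse vector indexed by event n-grams, so a short query clip can be compared against full-length songs without aligning them in time.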