IEEE Transactions on Audio, Speech and Language Processing

Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized Audio

Abstract

We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of manually annotating chord names and boundaries, which is normally required to provide machine learning models with ground truth, by performing automatic harmony analysis on symbolic music files. In parallel, we synthesize audio from the same symbolic files and extract acoustic feature vectors that are in perfect alignment with the labels. We therefore generate a large set of labeled training data with a minimal amount of human labor, which allows for richer models. Thus, we build 24 key-dependent HMMs, one for each key, using the key information derived from the symbolic data. Each key model defines a unique state-transition characteristic and helps avoid confusions seen in the observation vector. Given acoustic input, we identify the musical key by choosing the key model with the maximum likelihood, and we obtain the chord sequence from the optimal state path of that key model; both are returned by a Viterbi decoder. This not only increases chord recognition accuracy but also yields the key information. Experimental results show that the models trained on synthesized data perform very well on real recordings, even though the labels automatically generated from symbolic data are not 100% accurate. We also demonstrate the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.
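The decoding procedure the abstract describes, running a Viterbi decoder under each of the 24 key-dependent HMMs, picking the key whose model attains the maximum likelihood, and reading the chord sequence off that model's optimal state path, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the helper names (`viterbi`, `transcribe`), the `key_models` structure, and the log-space parameterization are all assumptions made for the example.

```python
def viterbi(obs_loglik, log_init, log_trans):
    """Viterbi decoding in log space.

    obs_loglik[t][s]: log P(feature frame t | chord state s)
    log_init[s]:      log initial probability of state s
    log_trans[p][s]:  log transition probability p -> s
    Returns (log-likelihood of best path, best state sequence).
    """
    n_states = len(log_init)
    delta = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(obs_loglik)):
        new_delta, ptr = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(prev)
            new_delta.append(delta[prev] + log_trans[prev][s] + obs_loglik[t][s])
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state to recover the state path.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return max(delta), list(reversed(path))


def transcribe(feature_frames, key_models):
    """Joint key identification and chord transcription (sketch).

    key_models: dict mapping key name -> (log_init, log_trans, obs_loglik_fn),
    one entry per key-dependent HMM. The key whose model yields the highest
    Viterbi likelihood wins, and its optimal state path is the chord sequence.
    """
    best = None
    for key, (log_init, log_trans, obs_fn) in key_models.items():
        obs = [obs_fn(frame) for frame in feature_frames]
        loglik, path = viterbi(obs, log_init, log_trans)
        if best is None or loglik > best[0]:
            best = (loglik, key, path)
    return best[1], best[2]
```

In the paper's setting the observation model would score tonal centroid (or chroma) vectors against per-chord emission distributions; here `obs_loglik_fn` stands in for that score. Selecting the key and the chord path from the same decoder pass is what lets one run of the system return both outputs.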
