IEEE/ACM Transactions on Audio, Speech, and Language Processing

An End-to-End Neural Network for Polyphonic Piano Music Transcription



Abstract

We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to that of speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used to estimate the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model, and inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments: we investigate various neural network architectures for the acoustic models, and we investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare the performance of the neural network-based acoustic models with that of two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics. We also observe improved performance when the music language models are applied. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
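The idea of combining frame-level acoustic predictions with a temporal language model and decoding with beam search can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that the score of a candidate transcription is the sum of per-frame acoustic log-probabilities (treating each pitch as an independent Bernoulli output) and language-model log-probabilities of each frame given the previous one. The function `toy_language_log_prob` is a hypothetical first-order stand-in for the paper's recurrent language model, which conditions on the full history. Enumerating every pitch configuration per frame is only feasible for a handful of pitches and would not scale to a full 88-key piano.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Frame = Tuple[int, ...]  # binary pitch vector: 1 = pitch active in this frame


def acoustic_log_prob(probs: Sequence[float], y: Frame) -> float:
    """Log-probability of y, assuming independent per-pitch Bernoulli outputs.
    Every probability must lie strictly between 0 and 1."""
    return sum(math.log(p if on else 1.0 - p) for p, on in zip(probs, y))


def toy_language_log_prob(prev: Frame, cur: Frame, stay: float = 0.9) -> float:
    """Hypothetical first-order stand-in for an RNN music language model:
    each pitch keeps its previous on/off state with probability `stay`."""
    return sum(math.log(stay if a == b else 1.0 - stay) for a, b in zip(prev, cur))


def beam_search(acoustic: List[Sequence[float]], beam_width: int = 3) -> List[Frame]:
    """Approximate argmax over pitch sequences of sum_t [log p_AM + log p_LM]."""
    n_pitches = len(acoustic[0])
    # All 2**n_pitches configurations: toy sizes only.
    candidates = list(itertools.product((0, 1), repeat=n_pitches))
    beam: List[Tuple[float, List[Frame]]] = [(0.0, [])]
    for probs in acoustic:
        expanded = []
        for score, seq in beam:
            for y in candidates:
                s = score + acoustic_log_prob(probs, y)
                if seq:  # the first frame has no predecessor, so no LM term
                    s += toy_language_log_prob(seq[-1], y)
                expanded.append((s, seq + [y]))
        # Keep only the `beam_width` highest-scoring partial sequences.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0][1]


acoustic = [[0.9, 0.2], [0.4, 0.1], [0.95, 0.1]]
print(beam_search(acoustic))  # [(1, 0), (1, 0), (1, 0)]
```

Thresholding the acoustic probabilities alone at 0.5 would switch the first pitch off in the middle frame. The language-model term penalizes that brief gap, so the search keeps the pitch sustained across all three frames.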
