Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training



Abstract

This paper describes a neural drum transcription method that detects the onset times of drums from music signals at the tatum level, where tatum times are assumed to be estimated in advance. In conventional studies on drum transcription, deep neural networks (DNNs) have often been used to take a music spectrogram as input and estimate the onset times of drums at the frame level. The major problem with such frame-to-frame DNNs, however, is that the estimated onset times often do not conform to the typical tatum-level patterns appearing in symbolic drum scores, because the long-term musically meaningful structures of those patterns are difficult to learn at the frame level. To solve this problem, we propose a regularized training method for a frame-to-tatum DNN. In the proposed method, a tatum-level probabilistic language model (a gated recurrent unit (GRU) network or a repetition-aware bi-gram model) is trained on an extensive collection of drum scores. Because the language model can evaluate the musical naturalness of tatum-level onset times, the frame-to-tatum DNN is trained with a regularizer based on the pretrained language model. The experimental results demonstrate the effectiveness of the proposed regularized training method.
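The regularized training objective described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: a single drum, a plain first-order bi-gram language model (not the repetition-aware variant or the GRU model), a regularizer computed as the expected negative log-likelihood of the predicted onsets under that model, and the hypothetical names `bce`, `expected_bigram_nll`, `regularized_loss`, and `weight`.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between tatum-level onset probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def expected_bigram_nll(pred, trans, eps=1e-7):
    """Expected negative log-likelihood of the onsets under a bi-gram model.

    pred[n] is the predicted onset probability at tatum n (treated as an
    independent Bernoulli variable, an assumption of this sketch).
    trans[a][b] is the language-model probability P(x_n = b | x_{n-1} = a).
    """
    total = 0.0
    for prev, cur in zip(pred, pred[1:]):
        q_prev = (1.0 - prev, prev)  # P(x_{n-1} = 0), P(x_{n-1} = 1)
        q_cur = (1.0 - cur, cur)     # P(x_n = 0), P(x_n = 1)
        for a in (0, 1):
            for b in (0, 1):
                total -= q_prev[a] * q_cur[b] * math.log(max(trans[a][b], eps))
    return total / (len(pred) - 1)


def regularized_loss(pred, target, trans, weight=0.1):
    """Transcription loss plus a weighted language-model regularizer."""
    return bce(pred, target) + weight * expected_bigram_nll(pred, trans)


# Toy bi-gram model that prefers alternating onset/rest patterns (made-up values).
trans = [[0.2, 0.8],   # after a rest:   P(rest), P(onset)
         [0.9, 0.1]]   # after an onset: P(rest), P(onset)

alternating = [0.9, 0.1, 0.9, 0.1]
constant = [0.9, 0.9, 0.9, 0.9]

# The pattern the language model considers natural receives the smaller penalty.
print(expected_bigram_nll(alternating, trans) < expected_bigram_nll(constant, trans))
```

In this sketch, `weight` trades off agreement with the ground-truth onsets against agreement with the patterns the language model has learned from drum scores; setting it to zero recovers ordinary supervised training.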


Bibliographic details

Source
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020, pp. 359-364 (6 pages)
Authors
Ryoto Ishizuka; Ryo Nishikimi; Eita Nakamura; Kazuyoshi Yoshii
Original format
PDF
Keywords
Music; Hidden Markov models; Decoding; Spectrogram; Convolution; Training; Feature extraction


