IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Sight to Sound: An End-to-End Approach for Visual Piano Transcription

Abstract

Automatic music transcription has primarily focused on transcribing audio to a symbolic music representation (e.g., MIDI or sheet music). However, audio-only approaches often struggle with polyphonic instruments and background noise. In contrast, visual information (e.g., a video of an instrument being played) does not suffer from these ambiguities. In this work, we address the problem of transcribing piano music from visual data alone. We propose an end-to-end deep learning framework that learns to predict note onset events from a video of a person playing the piano. From these predictions, we transcribe the played music as MIDI data. We find that our approach is surprisingly effective in a variety of complex situations, particularly those in which transcription from audio alone is impossible. We also show that combining audio and video data can improve on the transcription obtained from either modality alone.
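The abstract does not spell out how predicted onsets become MIDI data. As a minimal sketch, assuming the network outputs a per-frame onset probability for each of the 88 piano keys at a fixed video frame rate, the Python below thresholds those probabilities and emits one note-on event at the start of each above-threshold run. The function names `onsets_to_notes` and `late_fuse`, the 0.5 threshold, the 25 fps default, the key-to-pitch mapping (key 0 = A0 = MIDI 21), and the weighted-average audio/video fusion are all illustrative assumptions, not the authors' method.

```python
from typing import List, Tuple

# Assumption: key index 0 is A0 (MIDI 21), so 88 keys span MIDI 21..108.
LOWEST_MIDI_PITCH = 21


def onsets_to_notes(
    onset_probs: List[List[float]],
    fps: float = 25.0,
    threshold: float = 0.5,
) -> List[Tuple[float, int]]:
    """Turn a (frames x keys) onset-probability matrix into sorted
    (time_in_seconds, midi_pitch) note-on events.

    One event is emitted at the first frame of each run of consecutive
    frames at or above the threshold, so a press detected over several
    frames yields a single note.
    """
    events: List[Tuple[float, int]] = []
    num_keys = len(onset_probs[0]) if onset_probs else 0
    for key in range(num_keys):
        was_active = False
        for frame, row in enumerate(onset_probs):
            active = row[key] >= threshold
            if active and not was_active:
                # Rising edge: convert the frame index to seconds.
                events.append((frame / fps, LOWEST_MIDI_PITCH + key))
            was_active = active
    events.sort()  # order by time, then pitch
    return events


def late_fuse(
    video_probs: List[List[float]],
    audio_probs: List[List[float]],
    video_weight: float = 0.5,
) -> List[List[float]]:
    """Hypothetical late fusion: weighted average of two time-aligned
    (frames x keys) onset-probability matrices."""
    return [
        [video_weight * v + (1.0 - video_weight) * a for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(video_probs, audio_probs)
    ]


if __name__ == "__main__":
    probs = [[0.0] * 88 for _ in range(4)]
    probs[1][39] = 0.9  # key 39 -> MIDI 60 (middle C)
    probs[2][39] = 0.8  # same press, still above threshold
    probs[3][43] = 0.7  # key 43 -> MIDI 64 (E4)
    print(onsets_to_notes(probs, fps=25.0))  # prints [(0.04, 60), (0.12, 64)]
```

Collapsing each run of above-threshold frames into one onset avoids duplicate notes when a single press is detected across neighboring frames. This sketch recovers only note-on times and pitches; note durations and velocities are not estimated.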