Sight to Sound: An End-to-End Approach for Visual Piano Transcription

Abstract

Automatic music transcription has primarily focused on transcribing audio to a symbolic music representation (e.g. MIDI or sheet music). However, audio-only approaches often struggle with polyphonic instruments and background noise. In contrast, visual information (e.g. a video of an instrument being played) does not have such ambiguities. In this work, we address the problem of transcribing piano music from visual data alone. We propose an end-to-end deep learning framework that learns to automatically predict note onset events given a video of a person playing the piano. From this, we are able to transcribe the played music in the form of MIDI data. We find that our approach is surprisingly effective in a variety of complex situations, particularly those in which music transcription from audio alone is impossible. We also show that combining audio and video data can improve the transcription obtained from each modality alone.
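
The step from predicted note onsets to MIDI-style output can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical model output `probs`, where `probs[t][k]` is the predicted onset probability for piano key `k` in video frame `t`, and the frame rate (25 fps) and decision threshold (0.5) are also assumptions chosen for illustration. The only fixed fact used is that the lowest key of an 88-key piano (A0) is MIDI note number 21.

```python
from typing import List, Sequence, Tuple


def onsets_to_events(
    probs: Sequence[Sequence[float]],
    fps: float = 25.0,        # assumed video frame rate
    threshold: float = 0.5,   # assumed decision threshold
    lowest_pitch: int = 21,   # MIDI note number of A0, the lowest piano key
) -> List[Tuple[float, int]]:
    """Turn per-frame, per-key onset probabilities into (time_s, midi_pitch) events.

    An event is emitted when a key's probability rises to or above the
    threshold after having been below it in the previous frame, so a key
    that stays above the threshold for several frames yields one event.
    """
    events: List[Tuple[float, int]] = []
    for t, frame in enumerate(probs):
        for k, p in enumerate(frame):
            previous = probs[t - 1][k] if t > 0 else 0.0
            if p >= threshold and previous < threshold:
                events.append((t / fps, lowest_pitch + k))
    return events


if __name__ == "__main__":
    # Four frames, two keys (A0 and A#0); values are made up for illustration.
    demo = [
        [0.1, 0.9],
        [0.8, 0.95],
        [0.2, 0.1],
        [0.7, 0.0],
    ]
    for time_s, pitch in onsets_to_events(demo):
        print(f"{time_s:.2f} s  note-on  pitch {pitch}")
```

Each resulting (time, pitch) pair corresponds to one note-on event; writing an actual MIDI file would additionally require note durations and velocities, which this sketch does not model.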

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 1838-1842 | 5 pages
Authors
A. Sophia Koepke; Olivia Wiles; Yael Moses; Andrew Zisserman
Original format
PDF
Keywords
visual music transcription; automatic music transcription; music information retrieval; deep learning

Similar literature

1. That does not sound right: Sounds affect visual ERPs during a piano sight-reading task [J]. Delogu Franco, Brunetti Riccardo, Inuggi Alberto. Behavioural Brain Research: An International Journal, 2019
2. An End-to-End Neural Network for Polyphonic Piano Music Transcription [J]. S. Sigtia, E. Benetos, S. Dixon. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, no. 5
3. Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning [J]. Andrew Owens, Jiajun Wu, Josh H. McDermott. International Journal of Computer Vision, 2018, no. 10
4. Sight to Sound: An End-to-End Approach for Visual Piano Transcription [C]. A. Sophia Koepke, Olivia Wiles, Yael Moses. IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
5. Clavision: Visual automatic piano music transcription [D]. Akbari, Mohammad. 2015
6. Surmising synchrony of sound and sight: Factors explaining variance of audiovisual integration in hurdling tap dancing and drumming [O]. Nina Heins, Jennifer Pomp, Daniel S. Kluger. 2021
7. Hear and See: End-to-end sound classification and visualization of classified sounds [O]. Thomas Miano. 2018
