
Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval



Abstract

Little research focuses on cross-modal correlation learning in which the temporal structures of different data modalities, such as audio and lyrics, are taken into account. Stemming from the inherently temporal structure of music, we are motivated to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data from the different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is used as an objective function to calculate the similarity of temporal structures. This is the first study to understand the correlation between language and music audio through deep architectures that learn the paired temporal correlation of audio and lyrics. A pre-trained Doc2vec model followed by fully-connected layers (a fully-connected deep neural network) is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) a pre-trained CNN followed by fully-connected layers is investigated for representing music audio; ii) we further propose an end-to-end architecture that simultaneously trains the convolutional layers and the fully-connected layers to better learn the temporal structures of music audio. In particular, our end-to-end deep architecture has two properties: it simultaneously performs feature learning and cross-modal correlation learning, and it learns a joint representation by considering temporal structures. Experimental results on using audio to retrieve lyrics and using lyrics to retrieve audio verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
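The paper's own implementation is not reproduced on this page; below is a minimal, illustrative PyTorch sketch of the kind of two-branch setup and CCA-style objective the abstract describes. The `Branch` module, the `cca_loss` function, and all feature dimensions (512-d pre-extracted CNN audio features, 300-d Doc2vec lyric vectors, a 64-d shared space) are assumptions for illustration, not details taken from the paper; the correlation objective follows the standard deep CCA formulation (Andrew et al., 2013).

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """Fully-connected projection head mapping one modality into the shared space."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim),
            nn.ReLU(),
            nn.Linear(hid_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def cca_loss(h1, h2, eps=1e-4):
    """Negative total correlation between two projected batches of shape (N, d),
    following the deep CCA objective (Andrew et al., 2013)."""
    n, d = h1.shape
    h1 = h1 - h1.mean(dim=0, keepdim=True)
    h2 = h2 - h2.mean(dim=0, keepdim=True)
    eye = torch.eye(d, device=h1.device)
    s11 = h1.t() @ h1 / (n - 1) + eps * eye   # regularized covariance of view 1
    s22 = h2.t() @ h2 / (n - 1) + eps * eye   # regularized covariance of view 2
    s12 = h1.t() @ h2 / (n - 1)               # cross-covariance

    def inv_sqrt(m):
        # Inverse matrix square root of a symmetric PSD matrix via eigendecomposition.
        vals, vecs = torch.linalg.eigh(m)
        return vecs @ torch.diag(vals.clamp_min(eps).rsqrt()) @ vecs.t()

    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    # Total correlation = sum of singular values of T; minimize its negative.
    return -torch.linalg.svdvals(t).sum()

# Hypothetical dimensions: 512-d pre-extracted CNN audio features,
# 300-d Doc2vec lyric embeddings, 64-d shared canonical space.
audio_branch = Branch(512, 256, 64)
lyric_branch = Branch(300, 256, 64)
opt = torch.optim.Adam(
    list(audio_branch.parameters()) + list(lyric_branch.parameters()), lr=1e-3
)

audio_feats = torch.randn(128, 512)  # stand-in for a batch of audio features
lyric_feats = torch.randn(128, 300)  # stand-in for a batch of lyric embeddings
loss = cca_loss(audio_branch(audio_feats), lyric_branch(lyric_feats))
opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch each branch projects its modality into the shared canonical space, and training maximizes the total correlation (the sum of the singular values of the whitened cross-covariance) between the two projected mini-batches, which is what an inter-modal CCA objective amounts to in practice. After training, cross-modal retrieval can rank items by similarity between the projected representations.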
