IEEE International Conference on Acoustics, Speech and Signal Processing

Transcribing Lyrics from Commercial Song Audio: the First Step Towards Singing Content Processing

Abstract

Spoken content processing (such as retrieval and browsing) is maturing, but singing content is still almost completely left out. Songs are human voice carrying plenty of semantic information, just as speech does, and may be considered a special type of speech with highly flexible prosody. The various problems in song audio, for example phone durations that change significantly over highly flexible pitch contours, make the recognition of lyrics from song audio much more difficult. This paper reports an initial attempt towards this goal. We collected music-removed versions of English songs directly from commercial singing content. The best results were obtained by a TDNN-BLSTM with data augmentation using 3-fold speed perturbation plus some special approaches. The WER achieved (73.90%) was significantly lower than the baseline (96.21%), but still relatively high.
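
For readers unfamiliar with the metric, the word error rate (WER) quoted in the abstract is conventionally the word-level edit distance between the reference lyrics and the recognized text, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the authors' scoring pipeline; the function names `word_error_rate` and `relative_reduction` are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline` (hypothetical helper)."""
    return (baseline - improved) / baseline
```

Applied to the two figures reported in the abstract, `relative_reduction(96.21, 73.90)` gives roughly 0.23, that is, about a 23% relative reduction in WER over the baseline.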
