IEEE International Conference on Acoustics, Speech and Signal Processing

A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music

Abstract

Recently, it has become easier to obtain speech data from various media such as the internet or YouTube, but directly utilizing such data to train a neural text-to-speech (TTS) model is difficult: the proportion of clean speech is insufficient, and the remainder includes background music, which hinders training even with a global style token (GST). Therefore, we propose the following method to successfully train an end-to-end TTS model with limited broadcast data. First, the background music is removed from the speech by introducing a music filter. Second, a GST-TTS model with an auxiliary quality classifier is trained on the filtered speech together with a small amount of clean speech. In particular, the quality classifier makes the embedding vector of the GST layer focus on representing the speech quality (filtered or clean) of the input speech. The experimental results verified that the proposed method synthesized speech of much higher quality than conventional methods.
