Journal of Circuits, Systems and Computers

High-Quality Many-to-Many Voice Conversion Using Transitive Star Generative Adversarial Networks with Adaptive Instance Normalization

Abstract

This paper proposes a novel high-quality nonparallel many-to-many voice conversion (VC) method based on transitive star generative adversarial networks with adaptive instance normalization (Trans-StarGAN-VC with AdaIN). First, we restructure the generator with TransNets to make full use of the hierarchical features associated with speech naturalness. In TransNets, multiple shortcut connections share hierarchical features between the encoding and decoding parts so that sufficient linguistic and semantic information is captured, which helps produce natural-sounding converted speech and accelerates the convergence of the training process. Second, by incorporating AdaIN for style transfer, we enable the generator to learn sufficient speaker characteristic information directly from speech instead of from attribute labels, which also provides a promising framework for one-shot VC. Objective and subjective experiments with nonparallel training data show that our method significantly outperforms StarGAN-VC in both speech naturalness and speaker similarity: the mean values of the mean opinion score (MOS) and ABX increase by 24.5% and 10.7%, respectively. A comparison of spectrograms also shows that our method produces more complete harmonic structures and details, and effectively narrows the gap between converted speech and target speech.
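
For readers unfamiliar with AdaIN, the following is a minimal sketch of the general operation as it is known from the style-transfer literature, not the generator described in the paper. It assumes features are laid out as a list of channels, each a list of per-frame values. The function name `adain`, the `eps` constant, and the toy inputs are all hypothetical. Each content channel is normalized to zero mean and unit variance, then rescaled to the mean and standard deviation of the matching style channel, which is how speaker statistics could be injected without attribute labels.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Adaptive instance normalization, channel by channel.

    content, style: lists of channels, each a list of per-frame floats
    (an assumed layout; the two inputs may differ in frame count).
    """
    out = []
    for c_row, s_row in zip(content, style):
        # Statistics of the content channel (eps avoids division by zero).
        c_mean, c_std = fmean(c_row), pstdev(c_row) + eps
        # Statistics of the style channel, standing in for speaker information.
        s_mean, s_std = fmean(s_row), pstdev(s_row)
        # Normalize the content, then apply the style scale and shift.
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_row])
    return out


# Toy features: 2 channels, 4 content frames, 3 style frames.
content = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.5, 11.0, 11.5]]
style = [[5.0, 7.0, 9.0], [-1.0, 0.0, 1.0]]
converted = adain(content, style)
```

After the call, each channel of `converted` keeps the frame-to-frame shape of the content while its mean and standard deviation match those of the style channel (up to the small `eps` term).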