The Journal of the Acoustical Society of Japan (日本音響学会誌)

Abstracts of Papers in Acoustical Science and Technology



Abstract

We propose non-parallel, many-to-many voice conversion (VC) using variational autoencoders (VAEs). The method builds VC models that convert the characteristics of arbitrary speakers into those of other arbitrary speakers without requiring parallel speech corpora for training. Although VAEs conditioned on one-hot speaker codes can achieve non-parallel VC, the phonetic content of the converted speech tends to vanish, which degrades speech quality. Another issue is that they cannot handle unseen speakers, i.e., speakers not included in the training corpora. To overcome these issues, we incorporate deep-neural-network-based automatic speech recognition (ASR) and automatic speaker verification (ASV) into VAE-based VC. Because the phonetic content is supplied as phonetic posteriorgrams predicted by the ASR models, the proposed VC avoids this quality degradation. Our VC uses d-vectors extracted from the ASV models as continuous speaker representations that can handle unseen speakers. Experimental results demonstrate that our VC outperforms conventional VAE-based VC in terms of mel-cepstral distortion and converted speech quality. We also investigate the effects of hyperparameters in our VC and find that 1) a large d-vector dimensionality that gives better ASV performance does not necessarily improve converted speech quality, and 2) a large number of pre-stored speakers improves the quality.
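
The conditioning idea and the objective metric named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network architecture, so the sketch shows only how a per-frame decoder input might be assembled from a latent vector, a phonetic posteriorgram, and an utterance-level d-vector. The function names `build_decoder_input` and `mel_cepstral_distortion` are hypothetical. The distortion follows the commonly used definition (in dB, excluding the 0th coefficient, averaged over time-aligned frames); the exact variant used in the paper is not stated in the abstract.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def build_decoder_input(
    latent: Sequence[Frame], ppg: Sequence[Frame], d_vector: Frame
) -> List[List[float]]:
    """Assemble per-frame decoder inputs (hypothetical helper).

    Each frame's latent vector is concatenated with that frame's phonetic
    posteriorgram and with one utterance-level d-vector. Under this
    assumption, conversion would amount to passing the target speaker's
    d-vector here instead of the source speaker's.
    """
    if len(latent) != len(ppg):
        raise ValueError("latent and PPG sequences must have the same length")
    return [list(z_t) + list(p_t) + list(d_vector) for z_t, p_t in zip(latent, ppg)]


def mel_cepstral_distortion(ref: Sequence[Frame], conv: Sequence[Frame]) -> float:
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Uses the common definition (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2),
    skipping the 0th (energy) coefficient of every frame.
    """
    if len(ref) != len(conv) or len(ref) == 0:
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, c in zip(ref, conv):
        squared = sum((a - b) ** 2 for a, b in zip(r[1:], c[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref)


if __name__ == "__main__":
    # Toy values only: one frame, 2-dim latent, 2-class PPG, 3-dim d-vector.
    frames = build_decoder_input([[0.1, 0.2]], [[0.7, 0.3]], [1.0, 2.0, 3.0])
    print(frames)
    print(mel_cepstral_distortion([[0.0, 1.0]], [[0.0, 0.0]]))
```

Because the speaker representation in this sketch is a continuous vector rather than a one-hot code, a speaker absent from training can in principle be represented by whatever d-vector the ASV model extracts for them, which is the property the abstract highlights.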
