European Signal Processing Conference

3WRBM-based speech factor modeling for arbitrary-source and non-parallel voice conversion

Abstract

In recent years, voice conversion (VC) has become a popular technique because it can be applied to a variety of speech tasks. Most existing approaches to VC require aligned speech pairs (parallel data) from the source speaker and the target speaker for training, which makes them difficult to apply in practice. Furthermore, the VC methods proposed so far require the source speaker to be specified at the conversion stage, even though in many cases we simply want to obtain the target speaker's speech from any other speaker. In this paper, we propose a VC method that requires neither parallel data in training nor specification of the source speaker in conversion. Our approach models the joint probability of acoustic, phonetic, and speaker features using a three-way restricted Boltzmann machine (3WRBM). The speaker-independent (SI) and speaker-dependent (SD) parameters of our model are estimated simultaneously under the maximum likelihood (ML) criterion using a speech set from multiple speakers. In the conversion stage, phonetic features are first estimated probabilistically from the speech of an arbitrary speaker, and the converted speech is then generated using the SD parameters of the target speaker. Our experimental results showed not only that our approach outperformed other non-parallel VC methods, but also that, within our approach, the performance of arbitrary-source VC was close to that of traditional source-specified VC.
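The two-step conversion described in the abstract (estimate phonetic features probabilistically, then generate with the target speaker's SD parameters) can be illustrated with a toy model. The sketch below is not the paper's model: the abstract does not give the energy function, so the parameterization here is an assumption chosen for illustration. It assumes unit-variance Gaussian acoustic units `v`, binary phonetic units `h`, SI weights `W` and hidden bias `c`, and per-speaker SD parameters `A[s]` (a matrix) and `b[s]` (a bias), with energy E(v, h, s) = 0.5||v - b_s||^2 - v^T A_s W h - c^T h. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_logits(v, W, A, c):
    # Logits of p(h_j = 1 | v, s) for every speaker s: W^T A_s^T v + c.
    # Shapes: A (S, D, D), W (D, H), v (D,), c (H,) -> result (S, H).
    return np.einsum('sde,eh,d->sh', A, W, v) + c

def speaker_posterior(v, W, A, b, c):
    # p(s | v) is proportional to exp(-F(v, s)), where F is the free energy
    # obtained by summing the assumed energy over all binary h.
    logits = hidden_logits(v, W, A, c)
    neg_f = -0.5 * np.sum((v - b) ** 2, axis=1)
    neg_f = neg_f + np.sum(np.logaddexp(0.0, logits), axis=1)
    neg_f = neg_f - neg_f.max()          # stabilize the softmax
    p = np.exp(neg_f)
    return p / p.sum()

def convert(v, target, W, A, b, c):
    # Step 1: phonetic features with the unknown source speaker
    # marginalized out: p(h | v) = sum_s p(s | v) p(h | v, s).
    post = speaker_posterior(v, W, A, b, c)
    h = post @ sigmoid(hidden_logits(v, W, A, c))
    # Step 2: mean of p(v | h, target) under the assumed energy.
    return b[target] + A[target] @ W @ h

# Toy usage with random (untrained) parameters, for shape checking only.
rng = np.random.default_rng(0)
S, D, H = 3, 4, 5                        # speakers, acoustic dim, hidden units
W = 0.1 * rng.standard_normal((D, H))
A = np.stack([np.eye(D) + 0.1 * rng.standard_normal((D, D)) for _ in range(S)])
b = 0.1 * rng.standard_normal((S, D))
c = np.zeros(H)
v = rng.standard_normal(D)
converted = convert(v, target=2, W=W, A=A, b=b, c=c)
```

Marginalizing over the speaker posterior is one plausible way to avoid specifying the source speaker; whether the paper does exactly this cannot be confirmed from the abstract. Training the SI and SD parameters under the ML criterion is omitted here.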
