Conference: IEEE Information Technology, Networking, Electronic and Automation Control Conference

Mandarin-Tibetan Bilingual Cross-language Voice Conversion Based on Semi-hidden Markov Model




In recent years, deep-learning-based voice conversion (VC) has significantly improved the performance of conversion systems. However, such systems generally require a large parallel corpus from the source and target speakers, and a Mandarin-Tibetan bilingual parallel corpus is difficult to obtain. To solve this problem, we propose a VC method based on the hidden semi-Markov model (HSMM) that combines a Mandarin-Tibetan bilingual average voice model with speaker adaptation. The method does not require a parallel corpus. First, a Mandarin-Tibetan bilingual average voice model is trained on a mixed-language, multi-speaker corpus. Next, a small amount of speech from the source speaker is used to adapt the average voice model, yielding a speaker-dependent acoustic model. Finally, the text corresponding to the source speaker's speech is translated, and the context-dependent labels of the translated text are fed to the speaker-dependent acoustic model, which outputs the converted speech and thereby realizes cross-language VC. Experimental results show that the method is effective: the converted speech achieves a MOS of 3.37 and a DMOS of 3.00.
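The three stages described in the abstract (average voice model, speaker adaptation, generation from the labels of the translated text) can be illustrated with a deliberately simplified toy. This is not the paper's implementation. It assumes each context-dependent label is reduced to a single mean feature vector, it omits HSMM state durations entirely, and it substitutes one global affine transform fitted by least squares for the actual adaptation algorithm, which the abstract does not specify. All function names are hypothetical.

```python
import numpy as np

def train_average_model(frames, labels, n_units):
    """Toy average voice model: one mean vector per context unit,
    pooled over all speakers and both languages.
    Assumes every unit occurs at least once in `labels`."""
    return np.stack([frames[labels == u].mean(axis=0) for u in range(n_units)])

def estimate_global_transform(avg_means, adapt_frames, adapt_labels):
    """Fit a single affine map (assumption: shared by all units) that sends
    the average-model means to the adaptation speaker's frames."""
    ones = np.ones((len(adapt_labels), 1))
    X = np.hstack([avg_means[adapt_labels], ones])    # (n_frames, dim + 1)
    W, *_ = np.linalg.lstsq(X, adapt_frames, rcond=None)
    return W                                          # (dim + 1, dim)

def adapt_means(avg_means, W):
    """Apply the shared transform to every unit, including units that never
    appeared in the adaptation data (e.g. units of the other language)."""
    ones = np.ones((len(avg_means), 1))
    return np.hstack([avg_means, ones]) @ W

def generate(adapted_means, label_sequence):
    """Emit one feature vector per label of the translated text."""
    return adapted_means[label_sequence]
```

Because the transform is shared across units, a small amount of adaptation speech in one language also moves the units of the other language toward the same speaker, which is the property this toy is meant to show. A real system would adapt spectral, pitch, and duration distributions and pass the generated parameters to a vocoder.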




