Conference: International Conference on Bio-inspired Systems and Signal Processing
MODEL-MAPPING BASED VOICE CONVERSION SYSTEM: A Novel Approach to Improve Voice Similarity and Naturalness using Model-based Speech Synthesis Techniques

Abstract

In this paper we present a novel voice conversion application in which no knowledge of the source speaker is available, and only sufficient utterances from a target speaker and a number of other speakers are at hand. Our approach consists of two separate stages. At the training stage, we estimate a speaker-dependent (SD) Gaussian mixture model (GMM) for the target speaker, and we also estimate a speaker-independent (SI) GMM using data from a number of speakers other than the source speaker. A mapping between the SD and SI models is maintained during training for each phone label. At the conversion stage, we use the SI GMM to recognize each input frame and find the closest Gaussian mixture component for it. Next, according to a mapping list, the counterpart Gaussian of the SD GMM is obtained and used to generate a parameter vector for each input frame. Finally, all the generated vectors are concatenated to synthesize speech of the target speaker. With the proposed model-mapping approach, we can not only avoid the over-fitting problem by keeping the number of mixtures of the SI GMM at a fixed value, but also improve voice quality in terms of similarity and naturalness by increasing the number of mixtures of the SD GMM. Experiments showed the effectiveness of this method.
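The conversion stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the covariance structure, the exact "closest Gaussian" criterion, or how the parameter vector is generated from the SD Gaussian, so the code below makes three assumptions: diagonal covariances, selection by the highest weighted log-likelihood, and output of the mapped SD component's mean. It also simplifies the mapping list to a one-to-one integer array. All function and variable names (`log_gauss_diag`, `convert`, `mapping`, and so on) are hypothetical and not taken from the paper.

```python
import numpy as np

def log_gauss_diag(frames, means, variances):
    """Log density of every frame under every diagonal-covariance Gaussian.

    frames: (T, D); means, variances: (K, D); returns an array of shape (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    mahalanobis = np.sum(diff ** 2 / variances, axis=2)      # (T, K)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return -0.5 * (mahalanobis + log_norm)

def convert(frames, si_weights, si_means, si_vars, mapping, sd_means):
    """Map each input frame to a target-speaker parameter vector.

    Assumed procedure: pick the SI component with the highest weighted
    log-likelihood, look up its SD counterpart in `mapping`, and emit
    that SD component's mean.
    """
    scores = np.log(si_weights) + log_gauss_diag(frames, si_means, si_vars)
    si_idx = np.argmax(scores, axis=1)   # closest SI component per frame
    sd_idx = mapping[si_idx]             # counterpart SD component
    return sd_means[sd_idx]              # (T, D) generated vectors

# Toy data (illustrative values only, not from the paper).
si_weights = np.array([0.5, 0.5])
si_means = np.array([[0.0, 0.0], [5.0, 5.0]])
si_vars = np.ones((2, 2))
mapping = np.array([1, 0])               # SI component i -> SD component mapping[i]
sd_means = np.array([[10.0, 10.0], [20.0, 20.0]])

frames = np.array([[0.1, -0.2], [4.9, 5.3]])
converted = convert(frames, si_weights, si_means, si_vars, mapping, sd_means)
print(converted)
```

Because the input frames are only scored against the SI model, no data from the source speaker is needed at training time, which matches the setting described above. A fuller implementation would also have to handle the case where the SD GMM has more components than the SI GMM, which this one-to-one sketch does not cover.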