首页> 外文期刊>電子情報通信学会技術研究報告. 音声. Speech >Statistical sequence-to-frame mapping techniques for voice conversion
【24h】

Statistical sequence-to-frame mapping techniques for voice conversion

机译:Statistical sequence-to-frame mapping techniques for voice conversion

获取原文
获取原文并翻译 | 示例
       

摘要

Voice conversion, a task to transform one speaker's voice to another's, can be regarded as a problem to find a mapping function between voice spaces of two speakers. GMM-based statistical mapping methods have been widely used for voice conversion. However, the classical GMM-based techniques make use of a frame-to-frame mapping function, which largely ignores the contextual information existing over a speech sequence and usually causes over-smoothness of converted speech. It is well known that HMM yields an efficient method to model the density of a whole speech sequence and has found successes in speech recognition and synthesis. Inspired by this fact, this paper studies how to use HMM for voice conversion. We derive an HMM-based sequence-to-frame mapping function with statistical analysis. Different from previous HMM-based voice conversion methods that used forced alignment for segmentation and transform frames aligned to a state with its associated linear transformation, our method has a soft mapping function as a weighted summation of linear transformations. The weights are calculated as the HMM posterior probabilities of frames. We also propose and compare two methods to learn the parameters of our mapping functions, namely least square error estimation and maximum likelihood estimation. We carried out experiments to examine the proposed HMM-based method for voice conversion.

著录项

获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号