Journal: Neurocomputing

Novel approach of MFCC based alignment and WD-residual modification for voice conversion using RBF


Abstract

A voice conversion system modifies the speaker-specific characteristics of the source speaker to match those of the target speaker, so that the converted speech is perceived as spoken by the target speaker. The speaker-specific characteristics of the speech signal are reflected at different levels, such as the shape of the vocal tract, the shape of the glottal excitation, and long-term prosody. The shape of the vocal tract is represented by Line Spectral Frequencies (LSF) and the shape of the glottal excitation by Linear Predictive (LP) residuals. In this paper, a fourth-level wavelet packet transform is applied to the LP residual to generate sixteen sub-bands. This approach not only reduces the computational complexity but also provides a genuine transformation model, in contrast to state-of-the-art statistical prediction methods. In voice conversion, alignment is an essential step that aligns the features of the source and target speakers. In this paper, a warping path based on Mel Frequency Cepstral Coefficients (MFCC) is proposed to align the LSF and LP-residual sub-bands using the proposed constant source and constant target alignments. The conventional alignment technique is compared with the two proposed approaches, namely constant source and constant target. Analysis shows that constant source alignment using the MFCC warping path performs slightly better than both constant target alignment and the state-of-the-art alignment approach. Generalized mapping models are developed for each sub-band using a Radial Basis Function (RBF) neural network and are compared with a Gaussian Mixture Model (GMM) based mapping and the residual selection approach. Various subjective and objective evaluation measures indicate that the RBF-based residual mapping approach performs significantly better than the state-of-the-art approaches. (C) 2016 Elsevier B.V. All rights reserved.
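
The alignment idea in the abstract, computing a warping path on MFCC and reusing it for the other feature streams, might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the warping path is computed, so plain dynamic time warping (DTW) with Euclidean frame distance is assumed here. The names `dtw_path` and `apply_path` and the toy feature values are hypothetical and do not come from the paper. The constant source and constant target variants are not defined in the abstract and are not implemented.

```python
import math


def dtw_path(src, tgt):
    """Return a DTW warping path as (source_index, target_index) pairs.

    Assumption: plain DTW with Euclidean frame distance; the paper's exact
    procedure is not given in the abstract.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()
    return path


def apply_path(path, src_feats, tgt_feats):
    """Pair frames of another feature stream using an existing path."""
    return [(src_feats[i], tgt_feats[j]) for i, j in path]


# Toy one-dimensional "MFCC" frames (hypothetical values).
src_mfcc = [[0.0], [1.0], [2.0]]
tgt_mfcc = [[0.0], [0.0], [1.0], [2.0]]
path = dtw_path(src_mfcc, tgt_mfcc)

# Stand-ins for per-frame LSF vectors of the same utterances.
src_lsf = ["s0", "s1", "s2"]
tgt_lsf = ["t0", "t1", "t2", "t3"]
aligned_lsf = apply_path(path, src_lsf, tgt_lsf)
```

In this sketch the path is computed once on the MFCC frames and then reused unchanged for any stream that shares the same frame indexing, which is how the LSF and the residual sub-bands could be paired without running a separate alignment for each stream.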
