Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Spectral Mapping Using Prior Re-Estimation of i-Vectors and System Fusion for Voice Conversion

Abstract

In this paper, we propose a new voice conversion (VC) method using i-vectors, which provide a low-dimensional representation of speech utterances. We attempt to restrict the variability of the i-vectors during the intermediate computation of the total variability (T) matrix with a novel approach that modifies the prior distribution of the intermediate i-vectors. This T-modification improves the conversion of speaker individuality. To further improve the conversion score and strike a better balance between similarity and quality, we apply band-wise fusion of the spectrograms converted by a conventional joint density Gaussian mixture model (JDGMM) and by the i-vector-based system. The fused spectrogram retains more spectral detail and leverages the complementary merits of each subsystem. Extensive objective and subjective evaluations are conducted on the CMU ARCTIC database. The results show that the proposed technique achieves a better trade-off between similarity and quality scores than other state-of-the-art baseline VC methods. Furthermore, it outperforms JDGMM when VC training data are limited. The proposed VC performs moderately better, in both objective and subjective terms, than a baseline VC based on a mixture of factor analyzers. In addition, the proposed VC produces better-quality converted speech than maximum likelihood GMM (ML-GMM) VC with dynamic feature constraints.
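The abstract does not state how the band-wise fusion is carried out: the band boundaries, the per-band weights, and whether the spectrograms are linear, log, or mel-scaled are all unspecified. As a rough illustration only, the sketch below assumes a simple per-band linear combination of two converted spectrograms of equal shape. The function name `band_wise_fusion`, its `band_edges` and `band_weights` parameters, and the weighting rule are hypothetical and should not be read as the authors' method. A weight of 1.0 takes a band entirely from the first spectrogram and 0.0 takes it entirely from the second.

```python
from typing import List, Sequence

# A spectrogram is stored as frames x frequency bins (e.g. magnitude values).
Spectrogram = List[List[float]]


def band_wise_fusion(
    spec_a: Spectrogram,
    spec_b: Spectrogram,
    band_edges: Sequence[int],
    band_weights: Sequence[float],
) -> Spectrogram:
    """Hypothetical band-wise fusion of two converted spectrograms.

    Bins k with band_edges[b] <= k < band_edges[b + 1] are fused as
    w_b * spec_a[t][k] + (1 - w_b) * spec_b[t][k], where w_b = band_weights[b].
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectrograms must have the same number of frames")
    if len(band_weights) != len(band_edges) - 1:
        raise ValueError("need exactly one weight per band")
    if band_edges[0] != 0 or any(lo >= hi for lo, hi in zip(band_edges, band_edges[1:])):
        raise ValueError("band edges must start at 0 and be strictly increasing")
    if any(not 0.0 <= w <= 1.0 for w in band_weights):
        raise ValueError("band weights must lie in [0, 1]")

    n_bins = band_edges[-1]
    fused: Spectrogram = []
    for frame_a, frame_b in zip(spec_a, spec_b):
        if len(frame_a) != n_bins or len(frame_b) != n_bins:
            raise ValueError("band edges must cover every frequency bin exactly")
        out = [0.0] * n_bins
        # Linear blend inside each band, using that band's weight.
        for b, w in enumerate(band_weights):
            for k in range(band_edges[b], band_edges[b + 1]):
                out[k] = w * frame_a[k] + (1.0 - w) * frame_b[k]
        fused.append(out)
    return fused


if __name__ == "__main__":
    # Toy 2-frame, 4-bin spectrograms standing in for the two subsystem outputs.
    spec_ivector = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    spec_jdgmm = [[10.0, 20.0, 30.0, 40.0], [50.0, 60.0, 70.0, 80.0]]
    # Arbitrary choice for illustration: bins 0 and 1 from the i-vector system,
    # bins 2 and 3 from JDGMM.
    print(band_wise_fusion(spec_ivector, spec_jdgmm, [0, 2, 4], [1.0, 0.0]))
    # → [[1.0, 2.0, 30.0, 40.0], [5.0, 6.0, 70.0, 80.0]]
```

Hard 0/1 weights select each band from one subsystem, while fractional weights give a soft blend. In any real system the band layout and weights would have to be chosen, for example by tuning on held-out data, rather than fixed as in this toy example.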