Journal: Audio, Speech, and Language Processing, IEEE Transactions on

Voice Conversion Using Dynamic Frequency Warping With Amplitude Scaling, for Parallel or Nonparallel Corpora


Abstract

In Voice Conversion (VC), the speech of a source speaker is modified to resemble that of a particular target speaker. Currently, standard VC approaches use Gaussian mixture model (GMM)-based transformations that do not generate high-quality converted speech due to “over-smoothing” resulting from weak links between individual source and target frame parameters. Dynamic Frequency Warping (DFW) offers an appealing alternative to GMM-based methods, as more spectral details are maintained in transformation; however, the speaker timbre is less successfully converted because spectral power is not adjusted explicitly. Previous work combines separate GMM- and DFW-transformed spectral envelopes for each frame. This paper proposes a more effective DFW-based approach that (1) does not rely on the baseline GMM methods, and (2) functions on the acoustic class level. To adjust spectral power, an amplitude scaling function is used that compares the average target and warped source log spectra for each acoustic class. The proposed DFW with Amplitude scaling (DFWA) outperforms standard GMM and hybrid GMM-DFW methods for VC in terms of both speech quality and timbre conversion, as is confirmed in extensive objective and subjective testing. Furthermore, by not requiring time-alignment of source and target speech, DFWA is able to perform equally well using parallel or nonparallel corpora, as is demonstrated explicitly.
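
The amplitude scaling step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the scaling is taken to be the per-bin difference between the average target log spectrum and the average warped source log spectrum of one acoustic class, the frequency warp is an arbitrary function applied by linear interpolation, and the names `class_mean`, `warp`, `amplitude_scaling`, and `convert_frame` are hypothetical. How acoustic classes are assigned and how the warping function is estimated are not covered.

```python
from statistics import fmean


def class_mean(frames, labels, cls):
    """Average log spectrum (per frequency bin) over the frames labeled `cls`."""
    members = [f for f, c in zip(frames, labels) if c == cls]
    return [fmean(bin_values) for bin_values in zip(*members)]


def warp(log_spec, warp_fn):
    """Resample a log spectrum along a warped frequency axis.

    `warp_fn` (hypothetical) maps a normalized output frequency in [0, 1]
    to the normalized source frequency to read from; values between bins
    are linearly interpolated.
    """
    n = len(log_spec)
    out = []
    for k in range(n):
        pos = warp_fn(k / (n - 1)) * (n - 1)
        pos = min(max(pos, 0.0), float(n - 1))  # clamp to the valid bin range
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * log_spec[lo] + frac * log_spec[hi])
    return out


def amplitude_scaling(warped_src, src_labels, tgt, tgt_labels, cls):
    """Assumed form: mean target log spectrum minus mean warped source log
    spectrum for one class. The two means are computed independently, so no
    frame-by-frame time alignment of source and target is needed."""
    mean_tgt = class_mean(tgt, tgt_labels, cls)
    mean_src = class_mean(warped_src, src_labels, cls)
    return [t - s for t, s in zip(mean_tgt, mean_src)]


def convert_frame(log_spec, warp_fn, scaling):
    """Warp one source frame, then add the class's log-amplitude correction."""
    return [w + a for w, a in zip(warp(log_spec, warp_fn), scaling)]
```

Because the two class averages are computed over separate sets of frames, the source and target sets may differ in size and content, which is consistent with the abstract's point that time alignment is not required.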
