Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

On the use of I-vectors and average voice model for voice conversion without parallel data

Abstract

Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion and have significantly improved the quality of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from the source and target speakers. Collecting such a large amount of data is expensive, and impossible in some applications, such as cross-lingual conversion. To solve this problem, we propose to use an average voice model and i-vectors for long short-term memory (LSTM) based voice conversion, which does not require parallel data from the source and target speakers. The average voice model is trained on other speakers' data, and the i-vectors, compact vectors representing the identities of the source and target speakers, are extracted independently. Subjective evaluation has confirmed the effectiveness of the proposed approach.
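The abstract describes conditioning an LSTM conversion network on speaker i-vectors, so that a model trained on other speakers' data can convert toward a target speaker without parallel source-target utterances. Below is a minimal PyTorch sketch of one way such i-vector conditioning could look; the feature dimension, i-vector dimension, layer sizes, and the frame-wise concatenation scheme are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch (not the authors' implementation) of an LSTM voice-conversion
# model conditioned on a speaker i-vector. All dimensions are assumed values.
import torch
import torch.nn as nn

class IVectorLSTMConverter(nn.Module):
    def __init__(self, feat_dim=40, ivec_dim=100, hidden_dim=256, num_layers=2):
        super().__init__()
        # The i-vector is appended to every input frame so the recurrent
        # layers can condition the conversion on speaker identity.
        self.lstm = nn.LSTM(feat_dim + ivec_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, src_feats, tgt_ivec):
        # src_feats: (batch, frames, feat_dim) source spectral features
        # tgt_ivec:  (batch, ivec_dim) target-speaker i-vector
        ivec = tgt_ivec.unsqueeze(1).expand(-1, src_feats.size(1), -1)
        x = torch.cat([src_feats, ivec], dim=-1)
        out, _ = self.lstm(x)
        return self.proj(out)

# Example: convert a 200-frame utterance toward a target speaker.
model = IVectorLSTMConverter()
src = torch.randn(1, 200, 40)       # placeholder source features
ivec = torch.randn(1, 100)          # placeholder target i-vector
converted = model(src, ivec)        # (1, 200, 40)
```

Appending the same i-vector to every input frame is one common way to inject a fixed-length speaker embedding into a sequence model; the paper's exact conditioning mechanism may differ.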
