Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop

Sparse representation of phonetic features for voice conversion with and without parallel data

Abstract

This paper presents a voice conversion framework that uses phonetic information in an exemplar-based voice conversion approach. The idea is motivated by the observation that phone-dependent exemplars lead to a better estimate of the activation matrix and therefore, possibly, to better conversion. We propose to use the phone segmentation results from automatic speech recognition (ASR) to construct a sub-dictionary for each phone. The proposed framework can work with or without parallel training data. With parallel training data, we found that the phonetic sub-dictionary outperforms the state-of-the-art baseline in objective and subjective evaluations. Without parallel training data, we use Phonetic PosteriorGrams (PPGs) as the speaker-independent exemplars in the phonetic sub-dictionary to serve as a bridge between speakers. We report that this technique achieves competitive performance without the need for parallel training data.
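The phone-dependent sub-dictionary idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the activation solver, so the code assumes the multiplicative update for KL-divergence non-negative matrix factorization with an L1 sparsity penalty, a common choice in exemplar-based conversion. The names `estimate_activations`, `convert`, `src_dicts`, `tgt_dicts`, and the `sparsity` and `n_iter` parameters are all hypothetical. Each source sub-dictionary is assumed to be paired column by column with the target sub-dictionary for the same phone.

```python
import numpy as np

def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative activations H such that X is approximately A @ H.

    X: (D, T) non-negative features, A: (D, K) non-negative exemplars.
    Assumed solver: KL-divergence multiplicative updates with an L1 penalty.
    """
    H = np.ones((A.shape[1], X.shape[1]))
    col_sums = A.sum(axis=0)[:, None]          # A^T 1, shape (K, 1)
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)              # element-wise X / (A H)
        H *= (A.T @ ratio) / (col_sums + sparsity)
    return H

def convert(X, phones, src_dicts, tgt_dicts, **solver_kwargs):
    """Convert frames X (D_src, T) using one sub-dictionary per phone label.

    phones: length-T array of per-frame phone labels (e.g. from ASR alignment).
    src_dicts[p], tgt_dicts[p]: paired exemplar matrices with the same K.
    """
    phones = np.asarray(phones)
    d_tgt = next(iter(tgt_dicts.values())).shape[0]
    Y = np.zeros((d_tgt, X.shape[1]))
    for p in np.unique(phones):
        key = p.item()                         # native Python key for dict lookup
        idx = np.flatnonzero(phones == p)
        # Activations are estimated only over this phone's exemplars.
        H = estimate_activations(X[:, idx], src_dicts[key], **solver_kwargs)
        # The same activations are applied to the paired target exemplars.
        Y[:, idx] = tgt_dicts[key] @ H
    return Y
```

Under these assumptions, the parallel-data case would fill `src_dicts` with source-speaker spectral exemplars aligned to the target exemplars. One plausible reading of the non-parallel case is that `X` and `src_dicts` hold PPG frames instead, paired with target-speaker spectra from the same frames, so that no source-speaker training data is needed.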
