Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

On the use of I-vectors and average voice model for voice conversion without parallel data

Abstract

Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion and have significantly improved the quality of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from the source and target speakers. Collecting such a large amount of data is expensive, and impossible in some applications, such as cross-lingual conversion. To solve this problem, we propose to use an average voice model and i-vectors for long short-term memory (LSTM) based voice conversion, which does not require parallel data from the source and target speakers. The average voice model is trained on other speakers' data, and the i-vectors, compact vectors representing the identities of the source and target speakers, are extracted independently. Subjective evaluation has confirmed the effectiveness of the proposed approach.
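The abstract describes conditioning an LSTM conversion network on speaker i-vectors, so that a model trained on other speakers' data can convert toward a target speaker without parallel source-target utterances. Below is a minimal PyTorch sketch of one way such i-vector conditioning could look; the feature dimension, i-vector dimension, layer sizes, and the frame-wise concatenation scheme are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch (not the authors' implementation) of an LSTM voice-conversion
# model conditioned on a speaker i-vector. All dimensions are assumed values.
import torch
import torch.nn as nn

class IVectorLSTMConverter(nn.Module):
    def __init__(self, feat_dim=40, ivec_dim=100, hidden_dim=256, num_layers=2):
        super().__init__()
        # The i-vector is appended to every input frame so the recurrent
        # layers can condition the conversion on speaker identity.
        self.lstm = nn.LSTM(feat_dim + ivec_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, src_feats, tgt_ivec):
        # src_feats: (batch, frames, feat_dim) source spectral features
        # tgt_ivec:  (batch, ivec_dim) target-speaker i-vector
        ivec = tgt_ivec.unsqueeze(1).expand(-1, src_feats.size(1), -1)
        x = torch.cat([src_feats, ivec], dim=-1)
        out, _ = self.lstm(x)
        return self.proj(out)

# Example: convert a 200-frame utterance toward a target speaker.
model = IVectorLSTMConverter()
src = torch.randn(1, 200, 40)       # placeholder source features
ivec = torch.randn(1, 100)          # placeholder target i-vector
converted = model(src, ivec)        # (1, 200, 40)
```

Appending the same i-vector to every input frame is one common way to inject a fixed-length speaker embedding into a sequence model; the paper's exact conditioning mechanism may differ.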
