3WRBM-based speech factor modeling for arbitrary-source and non-parallel voice conversion



Abstract

In recent years, voice conversion (VC) has become a popular technique because it can be applied to various speech tasks. Most existing VC approaches require aligned speech pairs (parallel data) from the source and target speakers for training, which makes them hard to apply. Furthermore, the VC methods proposed so far require the source speaker to be specified at the conversion stage, even though in many cases we simply want to obtain the target speaker's speech from that of other speakers. In this paper, we propose a VC method that requires neither parallel data for training nor specification of the source speaker for conversion. Our approach models the joint probability of acoustic, phonetic, and speaker features using a three-way restricted Boltzmann machine (3WRBM). The speaker-independent (SI) and speaker-dependent (SD) parameters of the model are estimated simultaneously under the maximum likelihood (ML) criterion using speech from multiple speakers. In the conversion stage, phonetic features are first estimated probabilistically from the speech of an arbitrary speaker, and the converted speech is then produced using the SD parameters of the target speaker. Our experimental results showed not only that our approach outperformed other non-parallel VC methods, but also that the performance of arbitrary-source VC was close to that of traditional source-specified VC.
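
The conversion stage described in the abstract can be illustrated with a toy three-way RBM. This is only a sketch under stated assumptions, not the parameterization used in the paper, which the abstract does not specify. It assumes unit-variance Gaussian acoustic units, binary phonetic units, a one-hot speaker unit, and a single full three-way weight tensor; the class name `ToyThreeWayRBM` and all parameter names are hypothetical, and the weights are random because maximum likelihood training is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


class ToyThreeWayRBM:
    """Illustrative model with assumed energy
    E(v, h, k) = 0.5*||v - a||^2 - b.h - c[k] - sum_ij W[i, j, k] v[i] h[j].
    """

    def __init__(self, n_acoustic, n_phonetic, n_speakers, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights: stand-ins for learned parameters.
        self.W = 0.1 * rng.standard_normal((n_acoustic, n_phonetic, n_speakers))
        self.a = np.zeros(n_acoustic)   # acoustic bias
        self.b = np.zeros(n_phonetic)   # phonetic bias
        self.c = np.zeros(n_speakers)   # speaker bias

    def hidden_activation(self, v, k):
        # Pre-sigmoid input to the phonetic units for speaker k.
        return self.b + v @ self.W[:, :, k]

    def speaker_posterior(self, v):
        # p(k | v) from the negative free energy; terms that do not
        # depend on k cancel in the normalization.
        neg_f = np.array([
            self.c[k] + softplus(self.hidden_activation(v, k)).sum()
            for k in range(len(self.c))
        ])
        neg_f -= neg_f.max()
        p = np.exp(neg_f)
        return p / p.sum()

    def convert(self, v, target):
        # Step 1: expected phonetic features, averaging over the
        # unknown source speaker instead of specifying it.
        p_s = self.speaker_posterior(v)
        h = sum(p_s[k] * sigmoid(self.hidden_activation(v, k))
                for k in range(len(p_s)))
        # Step 2: mean of p(v | h, target) under the assumed energy.
        return self.a + self.W[:, :, target] @ h


model = ToyThreeWayRBM(n_acoustic=4, n_phonetic=3, n_speakers=2)
frame = np.ones(4)
converted = model.convert(frame, target=1)
```

In this sketch the source speaker is never passed to `convert`; it is marginalized out through `speaker_posterior`, which is one simple way to realize arbitrary-source conversion under the assumptions above.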


Bibliographic record

Source
European Signal Processing Conference | 2016 | pp. 607-611 | 5 pages
Authors
Toru Nakashika; Yasuhiro Minami
Original format: PDF
Keywords
Speech; Acoustics; Training; Data models; Probabilistic logic; Europe; Signal processing;


