A Continuous Vocoder Using Sinusoidal Model for Statistical Parametric Speech Synthesis


Abstract

In our earlier work on statistical parametric speech synthesis, we proposed a source-filter based vocoder that uses continuous F0 (contF0) in combination with Maximum Voiced Frequency (MVF), and it was successfully used with deep learning. The advantage of a continuous vocoder in this scenario is that its parameters are simpler to model than those of conventional vocoders with discontinuous F0. However, our vocoder lacks some degree of naturalness and still does not achieve high-quality speech synthesis compared to well-known vocoders (e.g. STRAIGHT or WORLD). Previous studies have shown that the human voice can be modelled effectively as a sum of sinusoids. In this paper, we first address the design of a continuous vocoder using a sinusoidal synthesis model that is applicable in statistical frameworks. The same three parameters from the analysis part of our previous model are extracted and used in this study. To refine the output of the contF0 estimation, a post-processing approach is used to reduce the unwanted voiced component of unvoiced speech sounds, resulting in a smoother contF0 track. During synthesis, a sinusoidal model with minimum phase is applied to reconstruct speech. Finally, we compare the voice quality of the proposed system to that of the STRAIGHT and WORLD vocoders. Experimental results from objective and subjective evaluations show that the proposed vocoder achieves state-of-the-art vocoder performance in synthesized speech while outperforming our previous continuous F0 based source-filter vocoder.
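The sum-of-sinusoids idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesize_voiced`, the frame-wise inputs, the linear interpolation between frames, and the 1/k harmonic amplitudes are all assumptions made for illustration. The spectral envelope, the minimum-phase term, and the noise component above the MVF are left out.

```python
import math

def synthesize_voiced(f0_frames, mvf_frames, fs=16000, hop=80):
    """Hypothetical sketch: sum harmonics of a continuous F0 track up to the MVF.

    f0_frames and mvf_frames hold one value in Hz per frame; hop is the
    frame shift in samples. Amplitudes of 1/k are an illustrative assumption.
    """
    n_frames = len(f0_frames)
    out = []
    phase = 0.0  # running phase of the fundamental, in radians
    for i in range(n_frames):
        # Values of the next frame for linear interpolation (last frame is held).
        j = min(i + 1, n_frames - 1)
        for n in range(hop):
            t = n / hop
            f0 = (1 - t) * f0_frames[i] + t * f0_frames[j]
            mvf = (1 - t) * mvf_frames[i] + t * mvf_frames[j]
            # Integrate the instantaneous frequency so the phase stays continuous.
            phase += 2 * math.pi * f0 / fs
            # Keep harmonics at or below both the MVF and the Nyquist frequency.
            limit = min(mvf, fs / 2)
            n_harm = int(limit // f0) if f0 > 0 else 0
            out.append(sum(math.cos(k * phase) / k for k in range(1, n_harm + 1)))
    return out
```

Because the F0 track is continuous, there is no voiced/unvoiced switch in this sketch; the MVF alone controls how many harmonics are generated, and an MVF of 0 Hz yields silence from the sinusoidal part.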


Bibliographic details

Source
International Conference on Speech and Computer | 2018 | pp. 11-20 | 10 pages
Authors
Mohammed Salah Al-Radhi; Tamas Gabor Csapo; Geza Nemeth
Original format: PDF
Keywords
Continuous vocoder; Speech synthesis; Sinusoidal model; ContF0


Similar literature


1. A continuous vocoder for statistical parametric speech synthesis and its evaluation using an audio-visual phonetically annotated Arabic corpus [J]. Mohammed Salah Al-Radhi, Omnia Abdo, Tamas Gabor Csapo. Computer Speech and Language, 2020, issue Mara.
2. Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis [J]. Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh. IEICE Transactions on Information and Systems, 2020, issue 5.
3. Harmonics Plus Noise Model Based Vocoder for Statistical Parametric Speech Synthesis [J]. IEEE Journal of Selected Topics in Signal Processing, 2014, issue 2.
4. A Continuous Vocoder Using Sinusoidal Model for Statistical Parametric Speech Synthesis [C]. Mohammed Salah Al-Radhi, Tamas Gabor Csapo, Geza Nemeth. International Conference on Speech and Computer, 2018.
5. Statistical Parametric Speech Synthesis using Deep Learning Architectures [D]. Kang, Shiyin. 2016.
6. Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis [O]. Marvin Coto-Jiménez. 2021.
7. Analysis/Synthesis Comparison of Vocoders Utilized in Statistical Parametric Speech Synthesis [O]. Airaksinen Manu. 2012.
