Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis

Zhengqi Wen; Jianhua Tao; Shifeng Pan; Yang Wang

Abstract

The speech generated by hidden Markov model (HMM)-based speech synthesis systems (HTS) suffers from a 'buzzing' sound, which is due to an over-simplified vocoding technique. This paper proposes a new excitation model that uses a pitch-scaled spectrum for the parametric representation of speech in HTS. A residual signal produced by inverse filtering retains the detailed harmonic structure of speech that is not captured by the linear prediction (LP) spectrum. By using pitch-scaled spectra, we can compensate the LP spectrum with the detailed harmonic structure of the residual signal. This spectrum can be compressed using a periodic excitation parameter so that it can be used to train HTS. We define an aperiodic measure as the harmonics-to-noise ratio, and calculate a voicing cut-off frequency to fit the aperiodic measure to a sigmoid function. We combine the LP coefficients, the pitch-scaled spectrum, and the sigmoid function to create a new parametric representation of speech. Listening tests were carried out to evaluate the effectiveness of the proposed technique. The vocoder received a mean opinion score of 4.0 in analysis-synthesis experiments, before dimensionality reduction. By integrating this vocoder into HTS, we improved the sound of the synthesized speech compared with the pulse train excitation model, and obtained an even better result than STRAIGHT-HTS.
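
The inverse-filtering step mentioned in the abstract can be illustrated with a minimal sketch of textbook LP analysis. This is not the authors' implementation: the abstract gives no analysis settings, so the LP order, the frame length, the synthetic test signal, and all function names below are assumptions chosen for illustration. The sketch estimates LP coefficients with the autocorrelation method (Levinson-Durbin recursion) and applies the inverse filter A(z) to obtain the residual.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err) where a[0] = 1 and A(z) = 1 + a[1] z^-1 + ... ,
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


def inverse_filter(frame, a):
    """Filter the frame with A(z) to obtain the LP residual."""
    order = len(a) - 1
    residual = []
    for n in range(len(frame)):
        e = frame[n]
        for j in range(1, min(order, n) + 1):
            e += a[j] * frame[n - j]
        residual.append(e)
    return residual


# Hypothetical test signal (an assumption, not data from the paper):
# a 2-pole filter driven by a pulse train with a period of 80 samples.
period, length = 80, 400
frame = []
for n in range(length):
    pulse = 1.0 if n % period == 0 else 0.0
    x1 = frame[n - 1] if n >= 1 else 0.0
    x2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(pulse + 1.3 * x1 - 0.8 * x2)

a, err = levinson_durbin(autocorrelation(frame, 2), 2)
residual = inverse_filter(frame, a)
```

In this toy case the estimated coefficients come out close to the filter used to generate the signal, and the residual is close to the original pulse train. The LP spectrum describes only the smooth envelope, so the periodic (harmonic) structure stays in the residual, which is the information the abstract says the pitch-scaled spectrum is meant to recover.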

Bibliographic details

Source: Journal of VLSI signal processing systems for signal, image, and video technology, 2014, Issue 3, pp. 423-435 (13 pages)
Authors: Zhengqi Wen; Jianhua Tao; Shifeng Pan; Yang Wang
Affiliation (all authors): National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Speech synthesis; HMM-based speech synthesis; Parametric representation of speech; Excitation model; Pitch-scaled spectrum

