Applying a Speaker-Dependent Speech Compression Technique to Concatenative TTS Synthesizers

Lee C.-H.; Jung S.-K.; Kang H.-G.




Abstract

This paper proposes a new speaker-dependent coding algorithm that efficiently compresses a large speech database for corpus-based concatenative text-to-speech (TTS) engines while maintaining high fidelity. To achieve a high compression ratio and meet the fundamental requirements of concatenative TTS synthesizers, such as partial segment decoding and random access capability, we adopt a nonpredictive analysis-by-synthesis scheme for speaker-dependent parameter estimation and quantization. The spectral coefficients are quantized using a memoryless split vector quantization (VQ) approach that does not use frame correlation. Because the excitation signals of a specific speaker show little variation, especially in voiced regions, the conventional adaptive codebook for pitch prediction is replaced by a speaker-dependent pitch-pulse codebook trained on a corpus of single-speaker speech signals. To further improve coding efficiency, the proposed coder flexibly combines nonpredictive and predictive methods according to the structure of the TTS system. Applying the proposed algorithm to a Korean TTS system yields quality comparable to that of the G.729 speech coder while satisfying all the requirements of the TTS system. The results are verified by both objective and subjective quality measurements. In addition, the decoding complexity of the proposed coder is around 55% lower than that of G.729 Annex A.
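The memoryless split VQ idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sub-vector sizes (`SPLITS`), the toy codebooks (`CODEBOOKS`), and the function names are all hypothetical, chosen only to show why a coder that quantizes each frame without reference to its neighbours supports random access and partial segment decoding.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: List[Vector], x: Vector) -> int:
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def encode_frame(frame: Vector, codebooks: List[List[Vector]], splits: List[int]) -> List[int]:
    """Quantize one frame: split it into sub-vectors and code each one independently.

    No state from previous frames is used, which is what "memoryless" means here.
    """
    indices, start = [], 0
    for size, codebook in zip(splits, codebooks):
        indices.append(nearest(codebook, frame[start:start + size]))
        start += size
    return indices


def decode_frame(indices: List[int], codebooks: List[List[Vector]]) -> List[float]:
    """Rebuild a frame by concatenating the selected codewords."""
    out: List[float] = []
    for idx, codebook in zip(indices, codebooks):
        out.extend(codebook[idx])
    return out


# Hypothetical toy setup: 4-dimensional frames split into two 2-dimensional halves,
# each half with its own 2-entry codebook (real codebooks would be trained on speech).
SPLITS = [2, 2]
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]

frames = [
    (0.1, 0.2, 0.9, 0.1),
    (0.9, 0.8, 0.1, 0.7),
    (0.2, 0.1, 0.2, 0.9),
]
encoded = [encode_frame(f, CODEBOOKS, SPLITS) for f in frames]

# Random access: frame 1 is decoded on its own, without touching frame 0.
frame_1 = decode_frame(encoded[1], CODEBOOKS)
```

Because each frame's indices depend only on that frame, a synthesizer can jump to any stored unit and decode from there. A predictive quantizer would first have to decode the preceding frames to rebuild its state.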


Bibliographic record

Source
IEEE Transactions on Audio, Speech and Language Processing | 2007, No. 2 | pp. 632-640 | 9 pages
Authors
Lee C.-H.; Jung S.-K.; Kang H.-G.
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology
Keywords
computational complexity; decoding; speech coding; speech synthesis; vector quantisation; G.729 speech coder; concatenative TTS synthesizers; corpus-based concatenative text-to-speech engines; decoding complexity; nonpredictive analysis-by-synthesis scheme; partia;


Similar literature

1. Developing Concatenative Based Text to Speech Synthesizer for Tigrigna Language [J]. Mezgebe Araya Keletay, Hussien Seid Worku. Internet of Things and Cloud Computing. 2020, No. 2
2. Quality Preserving Compression of a Concatenative Text-To-Speech Acoustic Database [J]. Shoham T., Malah D., Shechtman S. Audio, Speech, and Language Processing, IEEE Transactions on. 2012, No. 3
3. A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech [J]. Yan-Hui Tu, Jun Du, Chin-Hui Lee. Journal of signal processing systems for signal, image, and video technology. 2018, No. 7
4. Subjective and Spectrogram Analysis of Speech Synthesizer for Marathi TTS Using Concatenative Synthesis [C]. Shirbahadurkar S.D., Bormane D.S., Kazi R.L. 2010 International Conference on Recent Trends in Information, Telecommunication and Computing. 2010
5. Advances in speaker-dependent concatenative speech synthesis [D]. Chappell, David Thomas. 2000
6. One-against-All Weighted Dynamic Time Warping for Language-Independent and Speaker-Dependent Speech Recognition in Adverse Conditions [O]. Xianglilan Zhang, Jiping Sun, Zhigang Luo. 2010
7. Combining missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling for speech separation [O]. Ji Ming, Timothy J. Hazen, James R. Glass. 2013
