Journal: IEEE Transactions on Audio, Speech, and Language Processing

Applying a Speaker-Dependent Speech Compression Technique to Concatenative TTS Synthesizers


Abstract

This paper proposes a new speaker-dependent coding algorithm that efficiently compresses a large speech database for corpus-based concatenative text-to-speech (TTS) engines while maintaining high fidelity. To achieve a high compression ratio and meet the fundamental requirements of concatenative TTS synthesizers, such as partial segment decoding and random access capability, we adopt a nonpredictive analysis-by-synthesis scheme for speaker-dependent parameter estimation and quantization. The spectral coefficients are quantized with a memoryless split vector quantization (VQ) approach that does not exploit inter-frame correlation. Because the excitation signals of a specific speaker show low intra-speaker variation, especially in voiced regions, the conventional adaptive codebook for pitch prediction is replaced by a speaker-dependent pitch-pulse codebook trained on a corpus of single-speaker speech signals. To further improve coding efficiency, the proposed coder flexibly combines nonpredictive and predictive methods according to the structure of the TTS system. By applying the proposed algorithm to a Korean TTS system, we obtain quality comparable to that of the G.729 speech coder while satisfying all the requirements of the TTS system. The results are verified by both objective and subjective quality measurements. In addition, the decoding complexity of the proposed coder is around 55% lower than that of G.729 Annex A.
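The memoryless split VQ idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's coder: the codebook values, the 2 + 2 split, and the function names (`nearest_index`, `encode_frame`, `decode_frame`) are all hypothetical and chosen only for illustration. The sketch shows why quantizing each frame without reference to its neighbours allows any single frame to be decoded on its own, which is the random-access property a concatenative TTS unit database needs.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = List[List[float]]


def nearest_index(sub: Vector, codebook: Codebook) -> int:
    """Return the index of the codeword with the smallest squared error."""
    best_index, best_error = 0, float("inf")
    for index, codeword in enumerate(codebook):
        error = sum((a - b) ** 2 for a, b in zip(sub, codeword))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def encode_frame(frame: Vector, codebooks: List[Codebook]) -> List[int]:
    """Split one frame into sub-vectors and quantize each one independently."""
    indices, start = [], 0
    for codebook in codebooks:
        width = len(codebook[0])
        indices.append(nearest_index(frame[start:start + width], codebook))
        start += width
    return indices


def decode_frame(indices: Sequence[int], codebooks: List[Codebook]) -> List[float]:
    """Rebuild one frame from its own indices only (no previous-frame state)."""
    frame: List[float] = []
    for index, codebook in zip(indices, codebooks):
        frame.extend(codebook[index])
    return frame


# Toy codebooks (hypothetical values): a 4-dimensional frame split as 2 + 2.
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

frames = [[0.1, -0.1, 0.9, 0.2], [0.8, 1.1, 0.1, 0.7], [0.2, 0.1, 0.0, 0.9]]
encoded = [encode_frame(f, CODEBOOKS) for f in frames]

# Random access: decode only frame 1, without touching frames 0 or 2.
print(decode_frame(encoded[1], CODEBOOKS))
```

A predictive quantizer would instead encode each frame relative to the previous reconstruction, so decoding a frame in the middle of a segment would require decoding everything before it. The sketch trades that extra coding gain for independent frames.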
