Improving high quality concatenative text-to-speech synthesis using the circular linear prediction model.

Abstract

Current high-quality text-to-speech (TTS) systems are based on unit selection from a large database that is both contextually and prosodically rich. These systems, although capable of natural voice quality, are computationally expensive and have a very large storage footprint. Their success is attributed to the dramatic reduction of storage costs in recent times. However, for many TTS applications a smaller footprint is becoming a standard requirement. This thesis presents a new method for representing speech segments that can improve the quality and/or reduce the footprint of current concatenative TTS systems. The circular linear prediction (CLP) model is revisited and combined with the constant pitch transform (CPT) to provide a robust representation of speech signals that allows for limited prosodic movements without a perceivable loss in quality. The CLP model assumes that each frame of voiced speech is an infinitely periodic signal. This assumption allows for LPC modeling using the covariance method, with the efficiency of the autocorrelation method. The CPT is combined with this model to provide a database that is uniform in pitch for matching the target prosody during synthesis. With this representation, limited prosody modifications and unit concatenation can be performed without causing audible artifacts. To resolve artifacts caused by pitch modifications in voicing transitions, a method is introduced for reducing peakiness in the LP spectra by constraining the line spectral frequencies. Two experiments have been conducted to demonstrate the capabilities of the CLP/CPT method. The first is a listening test to determine the ability of this model to realize prosody modifications without perceivable degradation. Utterances are resynthesized using the CLP/CPT method with emphasized prosody to increase intelligibility in harsh environments. The second experiment compares the quality of utterances synthesized by a unit-selection-based limited-domain TTS system with that of utterances produced by the CLP/CPT method. The results demonstrate that the CLP/CPT representation, applied to current concatenative TTS systems, can reduce the size of the database and increase the prosodic richness without noticeable degradation in voice quality.
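The periodicity assumption in the abstract can be illustrated with a small sketch. If a frame is treated as one period of an infinitely periodic signal, the covariance-method correlations depend only on the lag (taken modulo the frame length), so the normal equations become Toeplitz and can be solved with the Levinson-Durbin recursion normally used by the autocorrelation method. The code below is an illustrative sketch under that assumption, not the thesis's implementation; the function names `circular_autocorr` and `levinson_durbin` are hypothetical, and the frame is assumed to span exactly one pitch period.

```python
import math


def circular_autocorr(frame, max_lag):
    """Autocorrelation of a frame treated as one period of a periodic signal.

    Indices wrap modulo the frame length, so no windowing is applied and the
    value depends only on the lag.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[(i + k) % n] for i in range(n))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for predictor coefficients.

    Returns (a, err), where a[k-1] multiplies x[n-k] in the prediction
    x_hat[n] = sum_k a[k-1] * x[n-k], and err is the residual energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # signal is already perfectly predicted at a lower order
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Usage: one period of a sinusoid is predicted exactly by an order-2 model.
period = 40
omega = 2.0 * math.pi / period
frame = [math.cos(omega * n) for n in range(period)]
r = circular_autocorr(frame, 2)
coeffs, residual = levinson_durbin(r, 2)
```

For the sinusoid in the usage lines, the recursion should recover the recurrence x[n] = 2cos(ω)x[n-1] - x[n-2] with a residual energy close to zero, because no frame-edge effects enter the circular correlations.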

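Bibliographic record

  • Author

    Shukla, Sunil Ravindra

  • Author affiliation

    Georgia Institute of Technology

  • Degree-granting institution: Georgia Institute of Technology
  • Discipline: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 158 p.
  • Total pages: 158
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Radio electronics, telecommunications technology
  • Keywords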
