Journal: Speech Communication

Tone-Group F_0 selection for modeling focus prominence in small-footprint speech synthesis

Abstract

This work aims to improve the naturalness of synthetic intonational contours in Text-to-Speech synthesis by modeling prominence, a major expressive feature of human speech. Focusing on the tonal dimension of emphasis, we present a robust unit-selection methodology for generating realistic F_0 curves in cases where focus prominence is required. The proposed approach selects Tone-Group units from commonly used prosodic corpora; the units are automatically transcribed as patterns of syllables. In contrast to related approaches, the patterns represent only the most perceivable sections of the sampled curves and are encoded so that they can serve morphologically different sequences of syllables. This minimizes the number of units required to achieve sufficient coverage within the database. In turn, this optimization enables the application of high-quality F_0 generation to small-footprint text-to-speech synthesis. For generic F_0 selection we query the database based on sequences of ToBI labels, though other intonational frameworks can be used as well. To realize focus prominence on specific Tone-Groups, the selection also incorporates an indicator of the level of emphasis. We set up a series of listening tests using a database built from a 482-utterance corpus, which featured partially purpose-uttered emphasis. When used in generic synthesis, the proposed model was clearly preferred by listeners over a linear regression model in 75% of the cases. Furthermore, this model yielded an ambiguous percept of emphasis in an experiment featuring major and minor degrees of prominence.
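The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ToneGroupUnit` structure, the integer emphasis scale (0 = none, 1 = minor, 2 = major), the example F_0 values, and the linear resampling of a stored pattern onto a syllable sequence are all assumptions made for illustration. It only shows the general idea of querying a small database by ToBI label sequence plus an emphasis level, then reusing one compact pattern for syllable sequences of different lengths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneGroupUnit:
    """Hypothetical database entry for one Tone-Group pattern."""
    tobi: tuple[str, ...]         # ToBI label sequence, e.g. ("L+H*", "L-L%")
    emphasis: int                 # assumed scale: 0 = none, 1 = minor, 2 = major
    f0_points: tuple[float, ...]  # F0 targets (Hz) at evenly spaced normalized times


def select_unit(db, tobi, emphasis):
    """Return the unit whose ToBI labels match and whose emphasis level is closest."""
    candidates = [u for u in db if u.tobi == tuple(tobi)]
    if not candidates:
        return None  # no coverage for this label sequence
    return min(candidates, key=lambda u: abs(u.emphasis - emphasis))


def render_f0(unit, n_syllables):
    """Linearly resample the stored pattern to one F0 value per syllable."""
    pts = unit.f0_points
    if n_syllables == 1:
        return [pts[0]]
    contour = []
    for i in range(n_syllables):
        pos = i * (len(pts) - 1) / (n_syllables - 1)  # position in pattern space
        lo = int(pos)
        hi = min(lo + 1, len(pts) - 1)
        frac = pos - lo
        contour.append(pts[lo] + frac * (pts[hi] - pts[lo]))
    return contour


# Illustrative database with made-up values (not taken from the paper's corpus).
DB = [
    ToneGroupUnit(("L+H*", "L-L%"), 0, (100.0, 200.0, 100.0)),
    ToneGroupUnit(("L+H*", "L-L%"), 2, (110.0, 260.0, 95.0)),
]

unit = select_unit(DB, ["L+H*", "L-L%"], emphasis=2)
contour = render_f0(unit, n_syllables=5)
```

Because each stored pattern is resampled rather than tied to a fixed syllable count, one unit can serve many target sequences, which is the kind of reuse that keeps the database small.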
