Tone-Group F_0 selection for modeling focus prominence in small-footprint speech synthesis

Gerasimos Xydas; Georgios Kouroupetroglou

Abstract

This work aims to improve the naturalness of synthetic intonational contours in Text-to-Speech synthesis by modeling prominence, which is a major expressive feature of human speech. Focusing on the tonal dimension of emphasis, we present a robust unit-selection methodology for generating realistic F_0 curves in cases where focus prominence is required. The proposed approach is based on selecting Tone-Group units from commonly used prosodic corpora that are automatically transcribed as patterns of syllables. In contrast to related approaches, the patterns represent only the most perceivable sections of the sampled curves and are encoded so that they can serve morphologically different sequences of syllables. This minimizes the number of units required to achieve sufficient coverage within the database. In turn, this optimization enables the application of high-quality F_0 generation to small-footprint text-to-speech synthesis. For generic F_0 selection we query the database based on sequences of ToBI labels, though other intonational frameworks can be used as well. To realize focus prominence on specific Tone-Groups, the selection also incorporates an indicator of the level of emphasis. We set up a series of listening tests using a database built from a 482-utterance corpus, part of which featured purposely uttered emphasis. The results showed a clear subjective preference for the proposed model over a linear regression one in 75% of the cases when used in generic synthesis. Furthermore, this model provided an ambiguous percept of emphasis in an experiment featuring major and minor degrees of prominence.
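
The selection step described in the abstract (querying a Tone-Group database by a sequence of ToBI labels plus an emphasis level) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `ToneGroupUnit` record, the three-level emphasis scale, the mismatch-based cost, the toy F_0 values, and the linear resampling used to fit one stored pattern to a different number of syllables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneGroupUnit:
    """Hypothetical database entry for one Tone-Group."""
    tobi: tuple        # ToBI label sequence, e.g. ("L*+H", "H-")
    emphasis: int      # assumed scale: 0 = none, 1 = minor, 2 = major
    f0_pattern: tuple  # a few F_0 targets in Hz (toy values)


def label_cost(a, b):
    """Positional label mismatches plus a penalty for the length difference."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def select_unit(db, tobi, emphasis, emphasis_weight=0.5):
    """Return the unit with the lowest combined label and emphasis cost."""
    target = tuple(tobi)
    return min(
        db,
        key=lambda u: label_cost(u.tobi, target)
        + emphasis_weight * abs(u.emphasis - emphasis),
    )


def resample(pattern, n):
    """Linearly interpolate a stored pattern onto n evenly spaced points."""
    if n == 1 or len(pattern) == 1:
        return [pattern[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(pattern) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pattern) - 1)
        frac = pos - lo
        out.append(pattern[lo] * (1 - frac) + pattern[hi] * frac)
    return out


# Toy database: the same label sequence stored with and without emphasis.
DB = [
    ToneGroupUnit(("L*+H", "H-"), 0, (110.0, 130.0, 150.0)),
    ToneGroupUnit(("L*+H", "H-"), 2, (110.0, 160.0, 190.0)),
    ToneGroupUnit(("H*", "L-L%"), 0, (140.0, 120.0, 90.0)),
]

# Request a major-emphasis Tone-Group and stretch it over five syllables.
unit = select_unit(DB, ["L*+H", "H-"], emphasis=2)
contour = resample(unit.f0_pattern, 5)
```

Storing a short pattern and resampling it per request is one way a single unit could cover syllable sequences of different lengths, which is the property the abstract credits for keeping the database small; the actual encoding used by the authors may differ.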


Bibliographic details

Source
Speech Communication | 2006, No. 9 | pp. 1057-1078 | 22 pages
Authors
Gerasimos Xydas; Georgios Kouroupetroglou
Author affiliation

University of Athens, Department of Informatics and Telecommunications, Division of Communication and Signal Processing, Panepistimiopolis, Ilisia, GR-15784 Athens, Greece

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Language and writing
Keywords
text-to-speech synthesis; tone-group unit-selection; intonation and emphasis in speech synthesis

Similar literature

1. Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: application to emotional speech synthesis [J]. Keikichi Hirose, Kentaro Sato, Yasufumi Asano. Speech Communication, 2005, No. 3-4.
2. F_0 Modeling for Isarn Speech Synthesis using Deep Neural Networks and Syllable-level Feature Representation [J]. Janyoi Pongsathon, Seresangtakul Pusadee. The International Arab Journal of Information Technology, 2020, No. 6.
3. A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F_0 Model for Statistical Parametric Speech Synthesis [J]. Xin Wang, Shinji Takaki, Junichi Yamagishi. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
4. Modelling F_0 Dynamics in Unit Selection Based Speech Synthesis [C]. Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlicek. International Conference on Text, Speech and Dialogue, 2014.
5. Articulatory speech synthesis and speech production modelling [D]. Huang, Jun. 2001.
6. A Top-Down Auditory Attention Model for Learning Task Dependent Influences on Prominence Detection in Speech [O]. Ozlem Kalinli, Shrikanth Narayanan.
7. Tone-Group F_0 selection for modeling focus prominence in small-footprint speech synthesis [O]. Gerasimos Xydas, Georgios Kouroupetroglou. 2006.
