Using variable-sized speech segments as targets for concatenative Speech-to-Speech synthesis

Atsushi MASAKI; Hideki KASHIOKA; Nick CAMPBELL

Abstract

Concatenative speech synthesis is growing in popularity because of the high naturalness of its voice quality, but it is still domain-specific and has not yet been tested with conversational speech. We propose a method of unit selection that aims to overcome some of the problems that have prevented this development. In particular, we address two problems: the need for an extremely large database of labelled speech, and the incorporation of paralinguistic information into the synthesis. In our proposed 'speech-to-speech' method, we use acoustic criteria to segment the database into variable-sized units, and then use an acoustic waveform as the target for the unit-selection search. In a final stage, prosodic criteria are applied to select the optimal sequence of units for generating the output waveform. In this paper, we describe the techniques for segmenting the large speech database and the acoustic criteria used for unit selection. We present results comparing two methods of speech database segmentation, together with results on accuracy based on phonetic labels and from a perceptual test, which confirm intelligibility, naturalness, and dictation accuracy.
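The abstract does not give the cost functions or the search procedure, so the following is only a generic sketch of a unit-selection search of the kind described, not the authors' method. It assumes each target segment and each database unit is summarised by a small feature vector, uses Euclidean distance as a stand-in acoustic target cost, and uses the pitch jump at each join as a stand-in prosodic cost. The names `Unit`, `select_units`, `f0_start`, `f0_end`, and `join_weight` are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Unit:
    name: str        # hypothetical identifier of a database segment
    feat: tuple      # assumed acoustic feature vector for the segment
    f0_start: float  # assumed pitch at the segment's left edge (Hz)
    f0_end: float    # assumed pitch at the segment's right edge (Hz)


def select_units(targets, candidates, join_weight=0.1):
    """Return the lowest-cost unit sequence, one unit per target segment.

    targets:    list of feature tuples taken from the target waveform
    candidates: candidates[i] is the list of Units allowed at position i
    """
    # best[i][j] = (cumulative cost, back-pointer) for candidate j at position i
    best = [[(dist(targets[0], u.feat), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            target_cost = dist(targets[i], u.feat)
            # Cheapest predecessor, counting the pitch jump at the join.
            cost, back = min(
                (best[i - 1][k][0] + join_weight * abs(p.f0_end - u.f0_start), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost, back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]
```

Calling `select_units(targets, candidates)` returns one `Unit` per target segment. Setting `join_weight=0` reduces the search to picking the acoustically closest unit at each position, which makes the effect of the join term easy to inspect.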

Bibliographic details

Source
電子情報通信学会技術研究報告. 音声. Speech | 2003, No. 264 | 6 pages
Authors
Atsushi MASAKI; Hideki KASHIOKA; Nick CAMPBELL
Original format: PDF
Text language: Japanese (jpn)
Chinese Library Classification: Telegraphy, facsimile
Keywords
Concatenative speech synthesis; Unit selection; Speech-to-Speech synthesis; Variable-sized speech units; Acoustic-based selection

Similar literature

1. Using variable-sized speech segments as targets for concatenative Speech-to-Speech synthesis [J] . Atsushi MASAKI, Hideki KASHIOKA, Nick CAMPBELL 電子情報通信学会技術研究報告. 音声. Speech . 2003, No. 264
2. Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis [J] . John Dines, Hui Liang, Lakshmi Saheer, Computer Speech and Language . 2013, No. 2
3. An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis [J] . Toda T, Kawai H, Tsuzaki M, Speech Communication . 2006, No. 1
4. Speech Unit Selection Based on Target Values Driven by Speech Data in Concatenative Speech Synthesis [C] . Toshio Hirai, Seiichi Tenpaku, Kiyohiro Shikano IEEE Workshop on Speech Synthesis . 2003
5. Improving high quality concatenative text-to-speech synthesis using the circular linear prediction model [D] . Shukla, Sunil Ravindra. 2007
6. Perception of interrupted speech: Cross-rate variation in the intelligibility of gated and concatenated sentences [O] . Valeriy Shafiro, Stanley Sheft, Robert Risley
7. An Evaluation of Cost Functions Sensitively Capturing Local Degradation of Naturalness for Segment Selection in Concatenative Speech Synthesis [O] . Tomoki Toda A, Kiyohiro Shikano 2013
8. Part I: Segmentation Techniques in Speech Synthesis; Part II: A Segment Inventory for Speech Synthesis [R] . Gordon E. Peterson, William S-Y Wang 1958
