Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method

Tatsuya MIZUTANI; Takehiko KAGOSHIMA


Abstract

This paper proposes a novel speech synthesis method for generating natural, human-like speech. The conventional unit-selection-based synthesis method selects speech units from a large database and concatenates them, with or without prosody modification, to generate synthetic speech. This method yields highly human-like voice quality, but it does not always select a suitable speech unit. An unsuitable selection causes discontinuity between consecutive speech units, which degrades the quality of the synthesized speech. One might expect the conventional method to attain higher speech quality with a larger database. However, preparing a larger database requires a longer recording time, and the narrator's voice quality does not remain constant throughout the recording period. This variation degrades the database quality and leaves the problem of unsuitable selection unresolved. We propose the plural unit selection and fusion method, which avoids this problem. The method integrates the unit fusion used in the unit-training-based method with the conventional unit-selection-based method. It selects plural speech units for each segment, fuses the selected units for each segment, modifies the prosody of the fused speech units, and concatenates them to generate synthetic speech. Unit fusion creates speech units that connect to one another with much less voice discontinuity, which yields high-quality speech. A subjective evaluation test showed that the proposed method greatly improves speech quality compared with the conventional method. It also showed that the speech quality of the proposed method remains high regardless of database size, from small (10 minutes) to large (40 minutes). The proposed method is a new framework in the sense that it is a hybrid of the unit-selection-based method and the unit-training-based method. Within this framework, the unit selection and unit fusion algorithms can be exchanged for more efficient techniques. Thus, the framework is expected to lead to new synthesis methods.
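The four stages named in the abstract (select plural units per segment, fuse them, modify prosody, concatenate) can be illustrated with a toy Python sketch. This is not the authors' implementation: the `Unit` type, the pitch-distance selection cost, the sample-wise averaging used as "fusion", and the placeholder prosody step are all assumptions made purely for illustration, since the abstract does not specify any of these details.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Unit:
    """Hypothetical speech unit: a segment label, one prosodic feature, a toy waveform."""
    label: str
    pitch: float
    samples: List[float]


def select_plural(db: Sequence[Unit], label: str, target_pitch: float, n: int) -> List[Unit]:
    """Select up to n units for a segment, ranked by an assumed pitch-distance cost."""
    candidates = [u for u in db if u.label == label]
    return sorted(candidates, key=lambda u: abs(u.pitch - target_pitch))[:n]


def fuse(units: Sequence[Unit]) -> Unit:
    """Fuse the selected units by sample-wise averaging (a stand-in for real unit fusion)."""
    length = min(len(u.samples) for u in units)
    samples = [sum(u.samples[i] for u in units) / len(units) for i in range(length)]
    pitch = sum(u.pitch for u in units) / len(units)
    return Unit(units[0].label, pitch, samples)


def modify_prosody(unit: Unit, target_pitch: float) -> Unit:
    """Placeholder: only records the target pitch; a real system would alter the waveform."""
    return Unit(unit.label, target_pitch, list(unit.samples))


def synthesize(db: Sequence[Unit], targets: Sequence[Tuple[str, float]], n: int = 3) -> List[float]:
    """Run select -> fuse -> modify prosody for each segment, then concatenate the results."""
    output: List[float] = []
    for label, target_pitch in targets:
        selected = select_plural(db, label, target_pitch, n)
        if not selected:
            raise ValueError(f"no unit available for segment {label!r}")
        fused = modify_prosody(fuse(selected), target_pitch)
        output.extend(fused.samples)
    return output


# Toy database and target sequence, invented for this example.
db = [
    Unit("a", 100.0, [1.0, 1.0]),
    Unit("a", 120.0, [3.0, 3.0]),
    Unit("a", 300.0, [9.0, 9.0]),
    Unit("b", 110.0, [2.0, 4.0, 6.0]),
]
waveform = synthesize(db, [("a", 110.0), ("b", 110.0)], n=2)
```

Because selection and fusion are separate functions, either one can be swapped for a different technique without touching the rest of the pipeline, which mirrors the exchangeability the abstract attributes to the framework.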


Bibliographic details

Source: IEICE Transactions on Information and Systems, 2005, No. 11, pp. 2565-2572 (8 pages)
Authors: Tatsuya MIZUTANI; Takehiko KAGOSHIMA
Affiliation: Semiconductor Company, Toshiba Corporation
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
CLC classification: Radio electronics; telecommunications technology
Keywords: speech synthesis; plural unit selection; unit fusion; unit training; sense of stability and sense of voice
Date added: 2022-08-18 00:29:55

Similar documents

1. Fast Concatenative Speech Synthesis Using Pre-Fused Speech Units Based on the Plural Unit Selection and Fusion Method [J]. Masatsune TAMURA, Tatsuya MIZUTANI, Takehiko KAGOSHIMA. IEICE Transactions on Information and Systems. 2007, No. 2
2. An Efficient Unit-Selection Method for Concatenative Text-to-Speech Synthesis Systems [J]. Mario Zganec, Zganec Gros Jerneja. Journal of Computing and Information Technology. 2008, No. 1
3. An Efficient Unit-selection Method for Concatenative Text-to-speech Synthesis Systems [J]. Jerneja Zganec Gros, Mario Zganec. Journal of Computing and Information Technology. 2008, No. 1
4. Scalable Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method [C]. Tamura, M., Mizutani, . 2005
5. An epoxide- and organoalane-based methodology for the iterative construction of polypropionate units: Application to the synthesis of streptovaricin D and U ansa chains [D]. Torres-Irizarry, Wildeliz. 2007
6. On the Selection of Non-Invasive Methods Based on Speech Analysis Oriented to Automatic Alzheimer Disease Diagnosis [O]. Karmele López-de-Ipiña, Jesus-Bernardino Alonso, Carlos Manuel Travieso, 2013
7. An Efficient Unit-Selection Method for Concatenative Text-to-Speech Synthesis Systems [O]. Mario, Zganec, Zganec Gros, Jerneja. 2008
