IEICE Transactions on Information and Systems

Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method

Abstract

This paper proposes a novel speech synthesis method to generate human-like natural speech. The conventional unit-selection-based synthesis method selects speech units from a large database and concatenates them, with or without modifying the prosody, to generate synthetic speech. This method features highly human-like voice quality. However, it does not necessarily select a suitable speech unit. Because unsuitable unit selection causes discontinuity between consecutive speech units, the synthesized speech quality deteriorates. One might expect the conventional method to attain higher speech quality as the database size increases. However, preparing a larger database requires a longer recording time, and the narrator's voice quality does not remain constant throughout the recording period. This degrades the database quality and still leaves the problem of unsuitable selection. We propose the plural unit selection and fusion method, which avoids this problem. This method integrates the unit fusion used in the unit-training-based method with the conventional unit-selection-based method. The proposed method selects plural speech units for each segment, fuses the selected speech units for each segment, modifies the prosody of the fused speech units, and concatenates them to generate synthetic speech. This unit fusion creates speech units that connect to one another with much less voice discontinuity, and so realizes high-quality speech. A subjective evaluation test showed that the proposed method greatly improves the speech quality compared with the conventional method. It also showed that the speech quality of the proposed method stays high regardless of the database size, from small (10 minutes) to large (40 minutes). The proposed method is a new framework in the sense that it is a hybrid of the unit-selection-based method and the unit-training-based method. Within the framework, the unit selection and unit fusion algorithms can be exchanged for more efficient techniques. Thus, the framework is expected to lead to new synthesis methods.
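The four steps named in the abstract (select plural units per segment, fuse them, modify the prosody, concatenate) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the abstract does not specify the selection cost or the fusion algorithm, so this sketch ranks candidates by pitch distance and fuses by sample-wise averaging, and the names `DATABASE`, `select_units`, `fuse_units`, `modify_prosody`, and `synthesize` are hypothetical.

```python
# Toy database: segment label -> candidate units.
# Each unit is (pitch_hz, samples); all values are made up for illustration.
DATABASE = {
    "a": [(100.0, [0.0, 1.0, 0.0]), (120.0, [0.0, 0.8, 0.2]), (200.0, [1.0, 1.0, 1.0])],
    "b": [(110.0, [0.5, 0.5, 0.5]), (115.0, [0.3, 0.7, 0.5]), (300.0, [0.0, 0.0, 0.0])],
}


def select_units(label, target_pitch, n):
    """Pick the n candidates closest in pitch to the target (assumed cost)."""
    ranked = sorted(DATABASE[label], key=lambda unit: abs(unit[0] - target_pitch))
    return ranked[:n]


def fuse_units(units):
    """Fuse equal-length units by sample-wise averaging (stand-in for real fusion)."""
    waveforms = [samples for _, samples in units]
    return [sum(column) / len(column) for column in zip(*waveforms)]


def modify_prosody(samples, gain):
    """Placeholder prosody change: scale the amplitude only."""
    return [gain * s for s in samples]


def synthesize(targets, n=2, gain=1.0):
    """Select plural units per segment, fuse, modify prosody, and concatenate."""
    output = []
    for label, target_pitch in targets:
        fused = fuse_units(select_units(label, target_pitch, n))
        output.extend(modify_prosody(fused, gain))
    return output


speech = synthesize([("a", 105.0), ("b", 112.0)], n=2)
print(len(speech))  # one fused 3-sample unit per segment
```

Setting `n=1` reduces the sketch to plain unit selection, which mirrors the abstract's point that selection and fusion are separable, exchangeable stages of the framework.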
