Journal: IEICE Transactions on Information and Systems

A VoiceFont Creation Framework for Generating Personalized Voices


Abstract

This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. Here, a VoiceFont is a voice inventory for generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task, and this has become a critical issue for speech synthesis systems that attempt to synthesize many high-quality voice personalities. The framework we propose aims to drastically reduce this burden with a twofold approach. First, to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Second, to minimize the human intervention required during VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler that facilitate checking and validating the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation and its robustness to speaker and style variations. Experiments on six speech corpora with a fairly large variation in speaking styles show that the segmentation algorithm is quite accurate and robust in extracting both phoneme and accentual-phrase segments. In addition, to subjectively evaluate VoiceFonts created with the framework, we conducted a listening test of speaker recognizability. The results show that the voice personalities of speech generated by the VoiceFont-based synthesizer are fairly close to those of the donor speakers.
