Journal: Archives of Acoustics

SYNTHESIS OF FUNDAMENTAL FREQUENCY CONTOURS FOR STANDARD CHINESE BASED ON SUPERPOSITIONAL AND TONE NUCLEUS MODELS



Abstract

A method for generating sentence F_0 contours of Standard Chinese speech is developed. It is based on superposing tone components on phrase components in the logarithmic frequency domain. While tone components are language-specific, phrase components are assumed to be more language-universal. The method therefore treats the two kinds of components differently: the tone components are generated by concatenating F_0 patterns of tone nuclei, which are predicted by a corpus-based scheme, while the phrase components are generated by rules. Experiments on F_0 contour generation were conducted using 100 news utterances by a female speaker. The first experiments addressed the generation of tone components, with the phrase components of the original utterances used unchanged. The results showed that the method could generate F_0 contours close to those of the target speech. Speech was synthesized by replacing the original F_0 contours with the generated ones using TD-PSOLA. In listening experiments on the quality of the synthetic speech, a high average score of 4.5 on a 5-point scale was obtained. The second experiments addressed the generation of phrase components, with the tone components extracted from the original utterances. Although the synthetic speech with generated F_0 contours sounded mostly natural, there were occasional "degraded sounds" caused by a mismatch between the phrase and tone components. To cope with the mismatch, a two-step method was developed in which information about the phrase contours is used in predicting the tone components. The validity of the method was shown through perceptual experiments on synthesized speech.
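The superposition described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption: the phrase component uses a Fujisaki-style impulse response, the tone component is a straight line within each tone nucleus and zero elsewhere, and all names and parameter values (`phrase_component`, `tone_component`, `f0_contour`, `alpha`) are hypothetical. The only point taken from the abstract is that the two components are added in the logarithmic frequency domain.

```python
import math


def phrase_component(t, onset, magnitude, alpha=3.0):
    """Log-domain phrase component at time t (seconds).

    Assumption: a Fujisaki-style impulse response,
    magnitude * alpha^2 * dt * exp(-alpha * dt) for dt >= 0.
    """
    dt = t - onset
    if dt < 0:
        return 0.0
    return magnitude * alpha ** 2 * dt * math.exp(-alpha * dt)


def tone_component(t, nuclei):
    """Log-domain tone component at time t.

    Assumption: each nucleus is (start, end, start_value, end_value) and
    the pattern is linear inside it; outside any nucleus the value is 0.
    """
    for start, end, v0, v1 in nuclei:
        if start <= t <= end:
            return v0 + (v1 - v0) * (t - start) / (end - start)
    return 0.0


def f0_contour(times, base_hz, phrase_cmds, nuclei):
    """Superpose phrase and tone components on ln(base_hz), return Hz."""
    contour = []
    for t in times:
        log_f0 = math.log(base_hz)
        log_f0 += sum(phrase_component(t, on, mag) for on, mag in phrase_cmds)
        log_f0 += tone_component(t, nuclei)
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Hypothetical values: one phrase command and two tone nuclei.
    times = [i * 0.05 for i in range(13)]
    phrase_cmds = [(0.0, 0.5)]
    nuclei = [(0.10, 0.25, 0.30, 0.10), (0.35, 0.55, -0.10, 0.20)]
    for t, f0 in zip(times, f0_contour(times, 180.0, phrase_cmds, nuclei)):
        print(f"{t:.2f} s  {f0:.1f} Hz")
```

Because the components are summed before exponentiation, changing the phrase commands rescales the whole contour multiplicatively without altering the tone patterns, which is what lets the two kinds of components be generated by different schemes.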
