Journal: Computer Speech and Language

Acoustic speech unit segmentation for concatenative synthesis

Abstract

Synthesis by concatenation of natural speech improves perceptual results when phonemes and syllables are segmented at places where spectral variations are small [Klatt, D., 1987. Review of text-to-speech conversion for English. J. Acoust. Soc. Am. 82 (3), 737-793]. An automatic segmentation method is explored here, using a tool based on a combination of Entropy Coding, Multiresolution Analysis, and Kohonen's Self-Organizing Maps. The method imposes no boundaries tied to any linguistic unit. The resulting waveforms represent phone chains dominated by spectral dynamic structures. Each acoustic unit obtained may comprise several phonemes, with a phoneme possibly cut into a partial segment at the unit boundary. The number of units and the unit structure are speaker dependent: speech rate and segmental and suprasegmental distinctive features affect them as the dynamic structure varies. Results from two databases of 741 sentences each, one from a male speaker and one from a female speaker, show this dependence, with a different number of units and occurrences for each speaker. Nevertheless, both speakers show a high occurrence of three-phoneme (36-24%) and four-phoneme (29-27%) sequences. Vowel-consonant-vowel sequences are the most frequent type (9.7-8.3%). Consonant-vowel syllables, which are phonemically frequent in Spanish (58%), are less represented (6.6-3.2%) under this method. The relevance of half-phone segmentation is confirmed by the fact that 66% of the total units for the female speaker and 53% for the male speaker start and end with a segmented phone. Perceptual experiments showed that concatenated speech created with dynamic acoustic units was judged more natural than speech created with diphone units.
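The idea of cutting where spectral variation is small can be illustrated with a minimal sketch. This is not the tool described in the abstract (it uses no entropy coding, multiresolution analysis, or Self-Organizing Maps); it is a simplified stand-in that assumes each frame is already a short spectral feature vector and uses plain Euclidean distance between consecutive frames as the variation measure. The function names and the `threshold` parameter are hypothetical.

```python
import math


def spectral_variation(frames):
    """Euclidean distance between each pair of consecutive frames.

    Entry i measures the change from frame i to frame i + 1.
    """
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def stable_boundaries(frames, threshold):
    """Return frame indices where a new unit could start.

    A cut is proposed at frame i + 1 when the variation between
    frames i and i + 1 is a local minimum and lies below `threshold`
    (an assumed tuning parameter, not a value from the paper).
    """
    variation = spectral_variation(frames)
    cuts = []
    for i in range(1, len(variation) - 1):
        is_small = variation[i] < threshold
        is_local_min = variation[i] <= variation[i - 1] and variation[i] < variation[i + 1]
        if is_small and is_local_min:
            cuts.append(i + 1)
    return cuts
```

Calling `stable_boundaries` on a list of equal-length feature vectors returns candidate cut positions that fall inside spectrally steady regions, so the transitions between sounds stay inside the units rather than at their edges. A lower `threshold` keeps only the steadiest points.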
