WSEAS International Conference on Acoustics & Music: Theory & Applications

Singing voice synthesis in Spanish by concatenation of syllables based on the TD-PSOLA algorithm

Abstract

The present work describes the development of a Spanish singing voice synthesizer based on the TD-PSOLA algorithm. The main goal was to test the hypothesis that, while diphones are linguistically the units with the best compromise between intelligibility and flexibility for spoken voice synthesis, syllables are the units best suited to concatenative singing voice synthesis. The hypothesis is particularly strong for Spanish, since its rules for syllable construction are comprehensive, relatively simple, and few in number. To test it, a soprano recorded a relatively small set of Spanish vowels and syllables at both F4 and C5, each lasting 1 second (±0.2 s). Syllables were modified only in pitch and duration. Matlab was used as the programming platform, mainly because of the author's familiarity with it. To evaluate the system, it was given several melodic tasks, including singing a popular Mexican song (Las Mañanitas). Results show that a highly intelligible synthesized Spanish singing voice can be achieved by syllable concatenation with minimal control mechanisms. Duration changes introduced very few noticeable digital errors, and transpositions of up to a just fourth were possible without very obvious digital errors. A variation of 5% (0.05) in the frequency scale factor corresponds roughly to one semitone of the modern equal-tempered scale.
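The abstract does not include the implementation, which was written in Matlab. As a hedged illustration only, the following is a minimal Python/NumPy sketch of TD-PSOLA pitch and duration modification and of syllable concatenation. It rests on several assumptions that do not come from the paper. The function names `td_psola_constant_f0`, `semitones_to_ratio`, `ratio_to_semitones` and `concat_with_crossfade` are hypothetical. Pitch marks are placed every `fs / f0` samples instead of being detected, which is only plausible for a steady sung note recorded at a known pitch such as F4 or C5. The linear crossfade used to join units is also an assumed method; the abstract does not say how joins were smoothed.

```python
import numpy as np

def semitones_to_ratio(semitones):
    """Equal-tempered frequency ratio for a shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)

def ratio_to_semitones(ratio):
    """Size in equal-tempered semitones of a frequency scale factor."""
    return 12.0 * np.log2(ratio)

def td_psola_constant_f0(x, fs, f0, pitch_scale=1.0, time_scale=1.0):
    """Hypothetical TD-PSOLA sketch for a sustained note with known, constant f0.

    Assumption: pitch marks fall every fs/f0 samples (no epoch detection).
    pitch_scale > 1 raises the pitch; time_scale > 1 lengthens the note.
    No gain normalization is applied after overlap-add.
    """
    x = np.asarray(x, dtype=float)
    period = int(round(fs / f0))                    # analysis period in samples
    analysis_marks = np.arange(period, len(x) - period, period)
    if len(analysis_marks) == 0:
        raise ValueError("signal too short for the given f0")
    window = np.hanning(2 * period + 1)             # two-period Hann window
    out_len = int(round(len(x) * time_scale))
    y = np.zeros(out_len + 2 * period + 1)          # padded to fit the last grain
    syn_period = period / pitch_scale               # synthesis mark spacing
    t = float(period)
    while t < out_len:
        # Map the synthesis instant back to the original time axis and use
        # the nearest analysis pitch mark as the source grain.
        src = t / time_scale
        a = analysis_marks[int(np.argmin(np.abs(analysis_marks - src)))]
        grain = x[a - period : a + period + 1] * window
        start = int(round(t)) - period              # centre the grain on t
        y[start : start + len(grain)] += grain      # overlap-add
        t += syn_period
    return y[:out_len]

def concat_with_crossfade(units, fade):
    """Join units (e.g. syllables) with a linear crossfade of `fade` samples."""
    out = np.asarray(units[0], dtype=float)
    ramp = np.linspace(0.0, 1.0, fade)
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=float)
        overlap = out[-fade:] * (1.0 - ramp) + unit[:fade] * ramp
        out = np.concatenate([out[:-fade], overlap, unit[fade:]])
    return out
```

As an example, `td_psola_constant_f0(x, fs, f0, pitch_scale=4/3)` would request a transposition of a just fourth, the largest shift the abstract reports as free of very obvious digital errors. `ratio_to_semitones(4/3)` gives about 4.98 semitones. The exact equal-tempered semitone ratio is 2^(1/12) ≈ 1.0595, so `ratio_to_semitones(1.05)` gives about 0.84 semitone, which is why the 5% figure should be read as an approximation.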