Journal: Computer Speech and Language

Intonation contour realisation for Standard Yorùbá text-to-speech synthesis: A fuzzy computational approach

Abstract

This paper presents a novel intonation modelling approach and demonstrates its applicability using the Standard Yorùbá language. Our approach is motivated by the theory that abstract and realised forms of intonation and other dimensions of prosody should be modelled within a modular and unified framework. In our model, this framework is implemented using the Relational Tree (R-Tree) technique. The R-Tree is a data structure that represents a multi-dimensional waveform in the form of a tree. The R-Tree for an utterance is generated in two steps. First, the abstract structure of the waveform, called the Skeletal Tree (S-Tree), is generated using tone phonological rules for the target language. Second, the numerical values of the perceptually significant peaks and valleys on the S-Tree are computed using a fuzzy-logic-based model. The resulting points are then joined by applying interpolation techniques. The actual intonation contour is synthesised with the Pitch Synchronous Overlap and Add (PSOLA) technique in the Praat software. We performed both quantitative and qualitative evaluations of our model. The preliminary results suggest that, although the model does not predict the numerical speech data as accurately as contemporary data-driven approaches, it produces synthetic speech with comparable intelligibility and naturalness. Furthermore, our model is easy to implement, interpret and adapt to other tone languages.
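
The pipeline summarised in the abstract (abstract tone targets, fuzzy computation of peak and valley values, then interpolation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses a flat list of targets instead of the R-Tree and S-Tree structures, and the base F0 values, the three fuzzy sets, the rule base, the equal syllable durations, and every function name are assumptions made up for the illustration. PSOLA resynthesis in Praat is not covered.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: illustrative base F0 targets in Hz for High, Mid and Low tones.
# These numbers are invented for the sketch, not taken from the paper.
BASE_F0 = {"H": 140.0, "M": 120.0, "L": 100.0}

# ASSUMPTION: a toy rule base mapping fuzzy utterance position to an F0
# scale factor (a crude stand-in for declination across the utterance).
RULE_SCALE = {"early": 1.00, "middle": 0.95, "late": 0.85}


def memberships(p: float) -> Dict[str, float]:
    """Degrees of membership of relative position p (0..1) in three fuzzy sets."""
    p = min(1.0, max(0.0, p))  # clamp so the sets always cover the input
    return {
        "early": max(0.0, 1.0 - 2.0 * p),
        "middle": max(0.0, 1.0 - abs(2.0 * p - 1.0)),
        "late": max(0.0, 2.0 * p - 1.0),
    }


def fuzzy_scale(p: float) -> float:
    """Weighted-average defuzzification over the toy rule base."""
    mu = memberships(p)
    total = sum(mu.values())
    return sum(mu[name] * RULE_SCALE[name] for name in mu) / total


def target_points(tones: List[str], syllable_dur: float = 0.2) -> List[Tuple[float, float]]:
    """One (time_s, f0_hz) target per syllable, placed at the syllable midpoint."""
    n = len(tones)
    points = []
    for i, tone in enumerate(tones):
        p = i / (n - 1) if n > 1 else 0.0  # relative position in the utterance
        t = (i + 0.5) * syllable_dur
        points.append((t, BASE_F0[tone] * fuzzy_scale(p)))
    return points


def interpolate(points: List[Tuple[float, float]], t: float) -> float:
    """Piecewise-linear F0 at time t, held constant outside the targets."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return points[-1][1]


# Usage: a three-syllable High-Low-Mid sequence sampled every 50 ms.
points = target_points(["H", "L", "M"])
contour = [interpolate(points, k * 0.05) for k in range(13)]
```

Linear interpolation is used here only because it is the simplest choice; the abstract says "interpolation techniques" without naming one, so a spline or another smoother would fit the description equally well.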