Journal: Computer Speech and Language

A modular holistic approach to prosody modelling for Standard Yoruba speech synthesis

Abstract

This paper presents a novel prosody model in the context of computer text-to-speech synthesis applications for tone languages. We have demonstrated its applicability using the Standard Yoruba (SY) language. Our approach is motivated by the theory that abstract and realised forms of various prosody dimensions should be modelled within a modular and unified framework [Coleman, J.S., 1994. Polysyllabic words in the YorkTalk synthesis system. In: Keating, P.A. (Ed.), Phonological Structure and Forms: Papers in Laboratory Phonology Ⅲ, Cambridge University Press, Cambridge, pp. 293-324]. We have implemented this framework using the Relational Tree (R-Tree) technique. R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. The underlying assumption of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. To implement the intonation dimension, fuzzy logic based rules were developed using speech data from native speakers of Yoruba. The Fuzzy Decision Tree (FDT) and the Classification and Regression Tree (CART) techniques were tested in modelling the duration dimension. For practical reasons, we have selected the FDT for implementing the duration dimension of our prosody model. To establish the effectiveness of our prosody model, we have also developed a Stem-ML prosody model for SY. We have performed both quantitative and qualitative evaluations on our implemented prosody models. The results suggest that, although the R-Tree model does not predict the numerical speech prosody data as accurately as the Stem-ML model, it produces synthetic speech prosody with better intelligibility and naturalness. The R-Tree model is particularly suitable for speech prosody modelling for languages with limited language resources and expertise, e.g. African languages. Furthermore, the R-Tree model is easy to implement, interpret and analyse.
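
The abstract describes the R-Tree as a tree-shaped representation of a waveform but gives no construction details. The sketch below is only an illustration of that general idea, not the paper's implementation: it assumes a one-dimensional sampled f0 contour (the paper speaks of multi-dimensional waveforms) and recursively splits it at its deepest interior valley, so that each leaf covers one peak. The names `Node`, `interior_valleys`, `build_tree` and `leaves`, and the sample values, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One segment of the contour (hypothetical structure, for illustration)."""
    start: int                      # index of the first sample in the segment
    end: int                        # index of the last sample (inclusive)
    peak: float                     # highest value inside the segment
    valley: Optional[float] = None  # value at the split point; None for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def interior_valleys(f0: Sequence[float], start: int, end: int) -> List[int]:
    """Indices strictly inside [start, end] that are local minima."""
    return [
        i for i in range(start + 1, end)
        if f0[i] < f0[i - 1] and f0[i] <= f0[i + 1]
    ]


def build_tree(f0: Sequence[float], start: int = 0,
               end: Optional[int] = None) -> Node:
    """Recursively split the contour at its deepest interior valley."""
    if end is None:
        end = len(f0) - 1
    peak = max(f0[start:end + 1])
    valleys = interior_valleys(f0, start, end)
    if not valleys:
        # No interior valley left: this segment is a leaf holding one peak.
        return Node(start, end, peak)
    split = min(valleys, key=lambda i: f0[i])
    return Node(
        start, end, peak, f0[split],
        left=build_tree(f0, start, split),
        right=build_tree(f0, split, end),
    )


def leaves(node: Node) -> List[Node]:
    """Collect leaf segments from left to right."""
    if node.left is None or node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


# Made-up f0 samples in Hz, used only to exercise the sketch.
contour = [100, 120, 110, 140, 105, 130, 115]
tree = build_tree(contour)
peaks = [leaf.peak for leaf in leaves(tree)]
```

In this toy form, internal nodes record the valleys that separate groups of peaks, and the leaves list the peaks in time order. A prosody model of the kind the abstract describes would additionally attach phonological labels and further prosody dimensions to the nodes; none of that is modelled here.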
