
End-to-End Speech Synthesis for Bangla with Text Normalization

Abstract

Text-to-speech synthesis is a well-researched area, yet no system has been developed that can claim to be as convincing as a human voice. In the context of speech synthesis, an end-to-end system is one that can synthesize speech from text using nothing more than transcribed audio as training data, without any language-specific knowledge or phoneme dictionaries. However, an end-to-end system should also be able to integrate language-specific rules to improve its performance. In this paper, we propose an end-to-end speech synthesis system for Bangla (also known as Bengali) that uses a minimal front end and a neural network as its statistical parametric model. We also propose a Text Normalization Procedure (TNP) for Bangla and incorporate it into the end-to-end system. We conducted extensive experiments using different models. Feedback from the experiment's participants showed that they responded more positively to the system when TNP was incorporated. A Wilcoxon signed-rank test was conducted to validate these results: the improvement attributed to TNP was statistically significant at the 5% level (p < 0.05), meaning that a difference this large would be unlikely to arise from experimental error alone.
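The abstract does not say how participant ratings were paired or whether the test was one- or two-sided. As a minimal sketch, assuming each participant gave a numeric rating to the system both without and with TNP, and assuming a one-sided alternative ("ratings are higher with TNP"), the exact Wilcoxon signed-rank test can be computed as below. The function names and the `without_tnp`/`with_tnp` parameters are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence


def signed_ranks(without_tnp: Sequence[float], with_tnp: Sequence[float]) -> list[float]:
    """Signed ranks of the non-zero paired differences (with_tnp - without_tnp).

    Zero differences are dropped, and tied absolute differences receive the
    average of the ranks they span.
    """
    if len(without_tnp) != len(with_tnp):
        raise ValueError("samples must be paired (equal length)")
    diffs = [b - a for a, b in zip(without_tnp, with_tnp) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [math.copysign(r, d) for r, d in zip(ranks, diffs)]


def wilcoxon_exact_p(without_tnp: Sequence[float], with_tnp: Sequence[float]) -> tuple[float, float]:
    """Return (W+, one-sided exact p-value) for H1: ratings are higher with TNP.

    Under H0 every signed rank is equally likely to be + or -, so the null
    distribution of W+ is obtained by counting sign assignments with a
    subset-sum table. Ranks are doubled so that averaged tie ranks (x.5)
    become integers.
    """
    sr = signed_ranks(without_tnp, with_tnp)
    w_plus = sum(r for r in sr if r > 0)
    doubled = [int(round(2 * abs(r))) for r in sr]
    total = sum(doubled)
    # counts[s] = number of sign assignments whose positive doubled-rank sum is s
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    target = int(round(2 * w_plus))
    p_upper = sum(counts[target:]) / 2 ** len(doubled)
    return w_plus, p_upper
```

Calling `wilcoxon_exact_p(without_tnp, with_tnp)` on per-participant ratings returns W+ and the one-sided p-value; a value below 0.05 corresponds to the 5% threshold reported above. Exact enumeration suits small participant panels; for larger samples, `scipy.stats.wilcoxon` (with `alternative="greater"`) provides the same test with a normal approximation.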