
Estimation Of Phrase Boundaries For Tamil Speech Synthesizer



Abstract

Given arbitrary text in a language, a text-to-speech (TTS) system is expected to produce high-quality speech. Besides the list of phonemes, one major piece of language-specific information is the set of phrase boundaries in a sentence, marked by commas and other punctuation. Wherever a comma appears in the text, the synthesizer inserts a silence during parsing to represent it. This improves quality and, in some cases, helps convey the intended meaning. In written Tamil, however, phrase boundaries are not explicitly marked, so the quality of the HMM-based synthesizer is found to be poor: individual words in a sentence sound very good, but the sentence as a whole does not sound natural. Estimating phrase boundaries from a given Tamil sentence is still an open research issue. A system without phrase boundaries is built as the baseline. Next, without any analysis of the given text, silence is introduced arbitrarily after every word, every two words, and every three words. Although the naturalness of the synthesized speech improves, the pauses are placed arbitrarily, so in many synthetic sentences the quality is annoyingly low. An analysis is then carried out on the word-terminal syllables occurring at phrase boundaries, and the 50 most frequently occurring ones are considered. Based on this analysis, another system is built that places a phrase boundary after each word ending in one of these syllables. Predicting phrase boundaries from terminal syllables gives a significant improvement; however, certain phrase boundaries are missed because no such terminal syllable is present there. So a final system is developed in which phrase boundaries are first predicted from the word-terminal syllables and then, if the number of words in a phrase exceeds a threshold, a new phrase boundary is introduced at the midpoint of that phrase. This system produces high-quality speech with a mean opinion score (MOS) of 4.23.
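The three boundary-placement strategies described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the 50 terminal syllables nor the threshold value, so `TERMINAL_SYLLABLES` and `MAX_PHRASE_WORDS` are hypothetical placeholders, the function names are invented for this example, and a plain string-suffix test on romanized words stands in for real Tamil syllable matching.

```python
from typing import Iterable, List

# Hypothetical placeholders: the abstract does not list the 50 syllables
# or state the threshold that was used.
TERMINAL_SYLLABLES = ("um", "il", "aal")
MAX_PHRASE_WORDS = 5


def split_every_n(words: List[str], n: int) -> List[List[str]]:
    """Arbitrary pauses: a boundary after every n words, with no text analysis."""
    return [words[i:i + n] for i in range(0, len(words), n)]


def split_on_terminal_syllables(
    words: List[str], terminals: Iterable[str] = TERMINAL_SYLLABLES
) -> List[List[str]]:
    """Place a boundary after each word that ends in a listed terminal syllable.

    Assumption: a suffix match approximates the syllable-level test.
    """
    terminals = tuple(terminals)
    phrases, current = [], []
    for word in words:
        current.append(word)
        if word.endswith(terminals):
            phrases.append(current)
            current = []
    if current:  # trailing words with no boundary after them
        phrases.append(current)
    return phrases


def split_long_phrases(
    phrases: List[List[str]], max_words: int = MAX_PHRASE_WORDS
) -> List[List[str]]:
    """Split any phrase longer than max_words once, at its midpoint."""
    result = []
    for phrase in phrases:
        if len(phrase) > max_words:
            mid = len(phrase) // 2
            result.extend([phrase[:mid], phrase[mid:]])
        else:
            result.append(phrase)
    return result


def predict_phrases(words: List[str]) -> List[List[str]]:
    """Final system: terminal-syllable boundaries, then the midpoint fallback."""
    return split_long_phrases(split_on_terminal_syllables(words))
```

Each returned sub-list is one phrase, and a synthesizer front end would insert a pause between consecutive phrases. Keeping the midpoint fallback as a separate pass reflects the two-stage order given in the abstract: syllable-based prediction first, length-based correction second.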
