
The automatic induction of concatenative units from machine readable dictionaries and corpora for speech synthesis

Abstract

The purpose of this research is to determine the best method for deciding on an optimal set of concatenative units for concatenative speech synthesis. Of the two main approaches to speech synthesis, segmental synthesis and rule-based synthesis, the former relies heavily on the successful choice of concatenative units. Segmental synthesis consists of concatenating segmental units (diphones, triphones, etc.); rule-based synthesis consists of computing control parameters from pre-established rules. Deciding on the set of diphones is quite straightforward: it suffices to take the phoneme inventory of a language and combine each phoneme with every phoneme in the inventory. For example, taking the approximately 35 French phonemes, 1225 phonemic pairs (35 x 35) constitute the complete and exhaustive starting diphone inventory. On the other hand, deciding on the set of triphones, quadriphones and larger units raises difficult questions about the nature of phonemes in a given language, such as: (1) the stability or instability of a unit in a coarticulatory environment, (2) the size of the overall inventory, and (3) the frequency of that unit in the language, in combination with factors (1) and (2). We report on experiments with four different databases, comparing the resources with regard to their n-gram frequency output. The first two databases consist of pronunciation field information from two dictionaries: the Encyclopedic Robert French dictionary, with 85,000 headwords, and the smaller Collins Gem, containing 15,000 words. For comparison, we use two text corpora, the Hansard (about 2.5 million words) and the smaller Tubach and Boe corpus (80,000 words); both corpora were processed by a set of grapheme-to-phoneme rules. A frequency extraction program was applied to all four resources to extract trigram phonemic frequencies; this serves as a basis for comparison between dictionary-derived and corpus-derived frequencies.
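The two counting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' frequency extraction program: the function names `diphone_inventory` and `trigram_counts`, the placeholder phoneme labels, and the toy transcriptions are all assumptions made for illustration. The sketch assumes each word is already available as a list of phoneme symbols, and it counts trigrams within a word only. Note that the 35 x 35 figure includes each phoneme paired with itself.

```python
from collections import Counter
from itertools import product


def diphone_inventory(phonemes):
    """Return every ordered phoneme pair, including self-pairs (n * n items)."""
    return list(product(phonemes, repeat=2))


def trigram_counts(transcriptions):
    """Count phoneme trigrams inside each transcription.

    `transcriptions` is an iterable of phoneme sequences, one per word.
    Trigrams that would span a word boundary are not counted here;
    that is a simplifying assumption of this sketch.
    """
    counts = Counter()
    for phones in transcriptions:
        for i in range(len(phones) - 2):
            counts[tuple(phones[i:i + 3])] += 1
    return counts


# Placeholder labels standing in for roughly 35 French phonemes (hypothetical).
placeholder_phonemes = [f"p{i:02d}" for i in range(35)]
inventory = diphone_inventory(placeholder_phonemes)

# Toy transcriptions (hypothetical symbols, not taken from any of the
# dictionaries or corpora named in the abstract).
toy_words = [list("bonbon"), list("bon")]
toy_counts = trigram_counts(toy_words)
```

Here `len(inventory)` reproduces the 1225 pairs quoted in the abstract, and `toy_counts.most_common()` ranks trigrams by frequency. Running the same counter over dictionary pronunciation fields and over rule-transcribed corpus text would give two frequency tables that can be compared directly.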
