Pronunciation modeling in speech synthesis.

Abstract

This dissertation proposes to investigate the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling the pronunciation variation that naturally occurs across speakers. In contrast, our work is in the domain of text-to-speech synthesis, which, as we will show, requires modeling the pronunciation variation of an individual whose speech the synthesizer is attempting to model.

We will explain our methodology for learning and reproducing pronunciation variation on an individual basis, and show how most crucial features of such variation can be easily generated using the architecture we describe. Throughout the course of this exposition, we highlight contributions to linguistic theory that such a thorough analysis of individual variation provides.

We describe the postlexical module of an English text-to-speech synthesizer. This module is responsible for transforming underlying lexical pronunciations from a lexical database into contextually appropriate surface postlexical pronunciations. This transformation is achieved by machine learning of a corpus of hand-labeled postlexical pronunciations that have been aligned with lexical pronunciations. The machine learning is conducted by a neural network, whose architecture and data encoding we describe. A thorough analysis of the performance of the postlexical module is offered, with attention to the relative success of the neural network at learning a wide range of postlexical phenomena.

We examine the extent to which a symbolic approach to allophony is warranted, and provide an acoustic analysis that attempts to provide an answer to this question. Assessments of the success of currently existing theories of phonetics, phonology and their interface are offered, based on the experience of generating a complete postlexical phonology of English for use in synthetic speech.
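
To make the idea of learning from aligned lexical and postlexical pronunciations concrete, here is a minimal sketch of how such aligned pairs might be turned into context-window training examples. It is not the dissertation's actual data encoding, which the abstract does not specify. The function name `windowed_examples`, the padding symbol `#`, the window width, and the ARPAbet-style phone labels are all assumptions made for illustration. The example word is "butter", whose lexical /t/ commonly surfaces as a flap (written `dx` here) in American English.

```python
PAD = "#"  # hypothetical word-boundary padding symbol (an assumption)


def windowed_examples(lexical, postlexical, width=1):
    """Pair each lexical phone's context window with its aligned postlexical phone.

    Assumes a one-to-one alignment between the two sequences; insertions and
    deletions would need an explicit null symbol, which is omitted here.
    """
    if len(lexical) != len(postlexical):
        raise ValueError("sequences must be aligned one-to-one")
    padded = [PAD] * width + list(lexical) + [PAD] * width
    examples = []
    for i, target in enumerate(postlexical):
        # Window of `width` phones on each side of the current lexical phone.
        window = tuple(padded[i:i + 2 * width + 1])
        examples.append((window, target))
    return examples


# "butter": lexical /t/ aligned with a postlexical flap.
lexical = ["b", "ah", "t", "er"]
postlexical = ["b", "ah", "dx", "er"]
examples = windowed_examples(lexical, postlexical)
for window, target in examples:
    print(window, "->", target)
```

Each `(window, target)` pair could then be encoded numerically and passed to a classifier such as a neural network. The choice of features and window size here is purely illustrative.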

Bibliographic details

  • Author: Miller, Corey Andrew
  • Author affiliation: University of Pennsylvania
  • Degree-granting institution: University of Pennsylvania
  • Subject: Language, Linguistics
  • Degree: Ph.D.
  • Year: 1998
  • Pages: 234 p.
  • Total pages: 234
  • Original format: PDF
  • Language of text: English
