Pronunciation modeling in speech synthesis.


Abstract

This dissertation proposes to investigate the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling the pronunciation variation that naturally occurs across speakers. In contrast, our work is in the domain of text-to-speech synthesis, which, as we will show, requires modeling the pronunciation variation of an individual whose speech the synthesizer is attempting to model. We will explain our methodology for learning and reproducing pronunciation variation on an individual basis, and show how most crucial features of such variation can be easily generated using the architecture we describe. Throughout the course of this exposition, we highlight contributions to linguistic theory that such a thorough analysis of individual variation provides.

We describe the postlexical module of an English text-to-speech synthesizer. This module is responsible for transforming underlying lexical pronunciations from a lexical database into contextually appropriate surface postlexical pronunciations. This transformation is achieved by machine learning of a corpus of hand-labeled postlexical pronunciations that have been aligned with lexical pronunciations. The machine learning is conducted by a neural network, whose architecture and data encoding we describe. A thorough analysis of the performance of the postlexical module is offered, with attention to the relative success of the neural network at learning a wide range of postlexical phenomena. We examine the extent to which a symbolic approach to allophony is warranted, and provide an acoustic analysis that attempts to provide an answer to this question. Assessments of the success of currently existing theories of phonetics, phonology and their interface are offered, based on the experience of generating a complete postlexical phonology of English for use in synthetic speech.
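The abstract does not spell out the data encoding used for the neural network, so the following is only an illustrative sketch of the general idea of learning from aligned lexical and postlexical pronunciations. It assumes a one-to-one phone alignment, a fixed context window, ARPAbet-style phone symbols, and American English flapping as the example phenomenon. The function name `windowed_examples`, the padding symbol, and the toy word are all hypothetical and are not taken from the dissertation.

```python
from typing import List, Tuple

PAD = "#"  # hypothetical boundary symbol, not from the dissertation

Example = Tuple[Tuple[str, ...], str]


def windowed_examples(
    lexical: List[str], postlexical: List[str], width: int = 1
) -> List[Example]:
    """Pair each lexical phone, plus `width` neighbours on each side,
    with the postlexical phone aligned to it."""
    if len(lexical) != len(postlexical):
        raise ValueError("sequences must be aligned one-to-one")
    padded = [PAD] * width + lexical + [PAD] * width
    examples = []
    for i, target in enumerate(postlexical):
        # The window is centred on lexical phone i.
        window = tuple(padded[i : i + 2 * width + 1])
        examples.append((window, target))
    return examples


# Toy aligned pair for "butter": lexical /t/ surfaces as a flap (written "dx").
lexical = ["b", "ah", "t", "er"]
postlexical = ["b", "ah", "dx", "er"]
examples = windowed_examples(lexical, postlexical)
```

Each resulting pair (context window, surface phone) is the kind of input/target example a classifier such as a neural network could be trained on. The actual features and network architecture are described in the dissertation itself, not here.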


Bibliographic record

Author: Miller, Corey Andrew
Author affiliation: University of Pennsylvania
Degree-granting institution: University of Pennsylvania
Subject: Language Linguistics
Degree: Ph.D.
Year: 1998
Pages: 234
Original format: PDF
Language: English

Similar literature

1. Talker-Specific Pronunciation or Speech Error? Discounting (or not) Atypical Pronunciations During Speech Perception [J]. Liu Linda, Jaeger T. Florian. Journal of Experimental Psychology: Human Perception and Performance. 2019, No. 12
2. Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling [J]. G. Bouselmi, D. Fohr, I. Illina. International Journal of Speech Technology. 2012, No. 2
3. Cross-word Arabic pronunciation variation modeling for speech recognition [J]. Dia AbuZeina, Wasfi Al-Khatib, Moustafa Elshafei. International Journal of Speech Technology. 2011, No. 3
4. Speech Recognition Based Pronunciation Evaluation Using Pronunciation Variations and Anti-models for Non-native Language Learners [C]. Yoo Rhee Oh, Jeon Gue Park, Yun Keun Lee. Advanced information technology in education. 2011
5. Pronunciation Variation Modeling for Automatic Speech Recognition [D]. Zheng, Jing. 2014
6. Models of speech synthesis [O]. R Carlson. 1995
7. The generation of regional pronunciations of English for speech synthesis [O]. Fitt Susan. 1997
