
Automatic Dialect and Accent Recognition and its Application to Speech Recognition



Abstract

A fundamental challenge for current research in speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, which depend on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences typically introduce modeling difficulties for large-scale speaker-independent systems designed to process input from any variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic signal to build a system that recognizes the regional dialect and accent of a speaker. In particular, we examine frame-based acoustic, phonetic, and phonotactic features, as well as high-level prosodic features, comparing generative and discriminative modeling techniques. We first analyze the effectiveness of approaches that have been successfully employed by the language identification community, applying them here to dialect identification. We next show how we can improve upon these techniques. Finally, we introduce several novel modeling approaches: Discriminative Phonotactics and kernel-based methods. We test our best-performing approach on four broad Arabic dialects, ten Arabic sub-dialects, American English vs. Indian English accents, American English Southern vs. Non-Southern dialects, American dialects at the state level plus Canada, and three Portuguese dialects. Our experiments demonstrate that our novel approach, which relies on the hypothesis that certain phones are realized differently across dialects, achieves new state-of-the-art performance on most dialect recognition tasks.
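Phonotactic features underlie a classic language-identification technique, phone recognition followed by language modeling (PRLM): an utterance is decoded into a phone string, and each candidate language (here, dialect) has its own phone n-gram model that scores that string. The abstract does not spell out which configuration the thesis uses as its starting point, so the following is only a minimal sketch under stated assumptions: the phone strings are assumed to come from an upstream phone recognizer, the models are add-one-smoothed bigrams, and the function names, dialect labels, and two-symbol toy data are all hypothetical. It illustrates the baseline idea only, not the thesis's Discriminative Phonotactics or kernel-based methods.

```python
import math
from collections import Counter


def train_bigram(phone_seqs, vocab):
    """Train an add-one-smoothed phone bigram model; returns a log P(b | a) function."""
    bigrams, histories = Counter(), Counter()
    for seq in phone_seqs:
        padded = ["<s>"] + seq + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            histories[a] += 1
    n_next = len(vocab) + 1  # possible next symbols: every phone plus "</s>"

    def logprob(a, b):
        return math.log((bigrams[(a, b)] + 1) / (histories[a] + n_next))

    return logprob


def score(seq, logprob):
    """Log-likelihood of one phone string under one dialect's bigram model."""
    padded = ["<s>"] + seq + ["</s>"]
    return sum(logprob(a, b) for a, b in zip(padded, padded[1:]))


def classify(seq, models):
    """Pick the dialect whose phone model gives the highest log-likelihood."""
    return max(models, key=lambda dialect: score(seq, models[dialect]))


# Hypothetical toy training data: two "dialects" over a two-phone inventory.
VOCAB = {"a", "b"}
models = {
    "A": train_bigram([["a", "b", "a", "b"], ["a", "b"]], VOCAB),
    "B": train_bigram([["b", "b", "a", "a"], ["b", "a", "a"]], VOCAB),
}
print(classify(["a", "b", "a", "b"], models))  # → A
```

Summing log-probabilities rather than multiplying probabilities keeps long utterances from underflowing; a realistic system would likely use higher-order n-grams, better smoothing, and a score back-end before making decisions.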
Our novel approach achieves an Equal Error Rate (EER) of 4% for four broad Arabic dialects, 6.3% for American vs. Indian English accents, 14.6% for American English Southern vs. Non-Southern dialects, and 7.9% for three Portuguese dialects. Our framework can also be used to automatically extract linguistic knowledge, specifically the context-dependent phonetic cues that may distinguish one dialect from another. We illustrate the efficacy of our approach by demonstrating the correlation of our results with the geographical proximity of the various dialects. As a final measure of the utility of our studies, we show that dialect identification can improve ASR. Employing our dialect identification system prior to ASR to identify the Levantine Arabic dialect in mixed speech of a variety of dialects allows us to optimize the engine's language model and use Levantine-specific acoustic models where appropriate. This procedure reduces the Word Error Rate (WER) for Levantine by 4.6% absolute (9.3% relative). In addition, we demonstrate in this thesis that, using a linguistically motivated pronunciation modeling approach, we can reduce the WER of a state-of-the-art ASR system on Modern Standard Arabic by 2.2% absolute (11.5% relative).
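The Equal Error Rate is the operating point of a detector at which the false-acceptance rate equals the false-rejection rate, so lower is better. The abstract reports EERs but not how trials were formed; the sketch below assumes a simple detection setup with one list of scores for trials where the hypothesized dialect is correct (target) and one for trials where it is not (nontarget), and it approximates the EER by sweeping a threshold over the observed scores. It also shows the arithmetic linking absolute and relative WER reductions: relative reduction = absolute reduction / baseline WER, so each reported pair implies a baseline WER of roughly absolute / relative (only approximately, because the published figures are rounded). All function names and toy scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score >= threshold. Returns the mean of the
    false-acceptance and false-rejection rates at the threshold where the two
    rates are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(absolute_pts, relative_frac):
    """Baseline WER (in points) implied by an absolute and a relative reduction."""
    return absolute_pts / relative_frac


# Hypothetical detection scores: higher means "more likely the target dialect".
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))   # → 0.25
print(round(implied_baseline(4.6, 0.093), 1))  # → 49.5 (from the Levantine figures)
print(round(implied_baseline(2.2, 0.115), 1))  # → 19.1 (from the MSA figures)
```

Read in the other direction, a system going from 20.0% to 18.0% WER is a 2.0-point absolute and a 10% relative improvement: `relative_reduction(20.0, 18.0)` returns 0.1.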

Bibliographic details

  • Author

    Biadsy, Fadi

  • Year: 2011
  • Original format: PDF
  • Language: English

