
Advancements in Acoustic Based Language Identification/Recognition


Abstract

With over 6,000 languages spoken worldwide, effective language recognition (LR) is needed before any speech technology can be employed. Language identification (LID) is an essential speech pre-processing step, typically followed by automatic speech recognition or target speech post-processing. LID tasks are either closed-set or open-set, depending on the test condition. In real scenarios, robust closed-set language identification is usually hindered by mismatch factors such as background noise, channel, and speech duration. In addition, rejecting unknown/out-of-set (OOS) languages is another major challenge for open-set LID because of the increased cost and resources needed to collect effective OOS data.

To address the closed-set LID problem, this dissertation focuses on advancements based on diverse acoustic features and back-ends, and on their influence on LID system fusion. A set of distinct acoustic features is considered, grouped into three categories: classical features, innovative features, and extensional features. Both front-end concatenation and back-end fusion are considered. The results suggest that no single feature type is universally vital across all LID tasks, and that a fusion of a diverse set is needed to sustain LID performance in challenging scenarios. More specifically, the proposed hybrid fusion method improves LID system performance by +38.5% and +46.2% on the highly noisy DARPA RATS dataset and the large-scale NIST LRE-09 dataset, respectively.

To address a related scenario, closely spaced dialect identification, two types of unsupervised deep learning methods are introduced for feature extraction. First, an unsupervised bottleneck feature extraction scheme is proposed, which is derived from the traditional bottleneck structure but trained with estimated phonetic label knowledge. Second, two types of latent variable learning algorithms, based on generative auto-encoder modeling, are introduced to speech feature processing. Compared with the baseline MFCC i-Vector system, the proposed methods achieve up to a 58% relative performance improvement on a 4-way Chinese dialect corpus.

For open-set LID, we propose three effective and flexible OOS candidate selection methods to boost OOS language rejection and improve overall classification performance. Two selection strategies are proposed at the front-end feature level: (i) k-means clustering selection and (ii) complementary candidate selection with a minimum Kullback-Leibler divergence, using the closed set as the baseline. In addition, (iii) a general candidate selection method is proposed that uses a language relationship defined from an engineering perspective and explored through the back-end score vectors of each language. With these selection methods, data enhancement is more effective and efficient than with the alternative baseline of random selection. To the best of our knowledge, this is the first major effort on effective OOS language selection to improve OOS rejection in open-set LID. As speech technology is employed in more diverse consumer, commercial, government, social, and global human engagement scenarios, advancing effective LR is needed as the diversity of languages used for voice engagement and communication/electronic interaction expands.
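The abstract names Kullback-Leibler divergence as the criterion for one of the OOS candidate selection strategies but gives no formulation. The following is only a minimal sketch of how such a selection could look, not the dissertation's method. It assumes each language is summarized by a single diagonal Gaussian over its feature vectors and that candidates closest to the closed set (smallest KL divergence to any in-set language) are selected. The function names, the Gaussian summary, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_diag_gaussian(features):
    """Summarize a (frames x dims) feature matrix as per-dimension mean and variance."""
    features = np.asarray(features, dtype=float)
    # A small floor keeps the variances strictly positive.
    return features.mean(axis=0), features.var(axis=0) + 1e-6


def kl_diag_gaussian(p, q):
    """KL(p || q) for two diagonal Gaussians given as (mean, variance) pairs."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def select_oos_candidates(in_set, candidates, n_select):
    """Rank candidate languages by their minimum KL divergence to any in-set language.

    in_set and candidates map a language name to a feature matrix.
    Returns the n_select closest candidates and the score of every candidate.
    (Assumption: "closest to the closed set" is the selection rule.)
    """
    in_models = [fit_diag_gaussian(f) for f in in_set.values()]
    scores = {}
    for name, feats in candidates.items():
        cand = fit_diag_gaussian(feats)
        scores[name] = min(kl_diag_gaussian(cand, m) for m in in_models)
    ranked = sorted(scores, key=scores.get)
    return ranked[:n_select], scores


# Synthetic stand-in data: two in-set languages and two OOS candidates.
rng = np.random.default_rng(0)
in_set = {
    "lang_a": rng.normal(0.0, 1.0, size=(500, 4)),
    "lang_b": rng.normal(1.0, 1.0, size=(500, 4)),
}
candidates = {
    "near": rng.normal(0.2, 1.0, size=(500, 4)),
    "far": rng.normal(5.0, 1.0, size=(500, 4)),
}
selected, scores = select_oos_candidates(in_set, candidates, n_select=1)
print(selected, scores)
```

In a real system the Gaussian summary would likely be replaced by whatever front-end representation the LID system already uses, and the direction of the ranking (closest or farthest candidates) would follow the actual selection criterion.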

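Bibliographic record

  • Author

    Zhang, Qian.;

  • Author affiliation

    The University of Texas at Dallas.;

  • Degree-granting institution: The University of Texas at Dallas.;
  • Subject: Electrical engineering.;Computer science.;Statistics.
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 146 p.
  • Total pages: 146
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Rehabilitation medicine;
  • Keywords

  • Date added: 2022-08-17 11:38:57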
