Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition.

Abstract

Acoustic models require a large amount of training data, and annotating that data for automatic speech recognition (ASR) is labor-intensive. More importantly, the performance of an acoustic model can degrade at test time, when the test data differ from the training data in speaker characteristics, channel, and recording environment. To compensate for this mismatch between training and test conditions, we investigate a graph-based semi-supervised learning approach to acoustic modeling in ASR.

Graph-based semi-supervised learning (SSL) is a widely used method in which labeled and unlabeled data are jointly represented as a weighted graph, and information is propagated from the labeled data to the unlabeled data. The key assumption of graph-based SSL is that data samples lie on a low-dimensional manifold, where samples that are close to each other are expected to have the same class label. More importantly, by exploiting the relationship between training and test samples, graph-based SSL implicitly adapts to the test data.

In this thesis, we address several key challenges in applying graph-based SSL to acoustic modeling. We first investigate and compare several state-of-the-art graph-based SSL algorithms on a benchmark dataset. In addition, we propose novel graph construction methods that allow graph-based SSL to handle variable-length input features. We next investigate the efficacy of graph-based SSL in the context of a full-fledged DNN-based ASR system, comparing two integration frameworks for graph-based learning. First, we propose a lattice-based late integration framework that combines graph-based SSL with DNN-based acoustic modeling, and we evaluate it on continuous word recognition tasks. Second, we propose an early integration framework using neural graph embeddings, and we compare two neural graph embedding features that capture the information of the manifold at different levels. The embedding features are used as input to a DNN system and are shown to outperform conventional acoustic feature inputs on several medium-to-large vocabulary conversational speech recognition tasks.
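The propagation idea described in the abstract (labeled and unlabeled samples as nodes of a weighted graph, with label information flowing along the edges) can be illustrated with a minimal sketch. This is a generic iterative label propagation with clamped labels, not the algorithms or graph construction methods of the thesis. The Gaussian edge weights, the `sigma` value, the toy 1-D points, and all function names are assumptions made for illustration.

```python
import math


def gaussian_affinity(points, sigma=0.5):
    """Build a dense weighted graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The Gaussian kernel and sigma are illustrative assumptions.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def propagate_labels(W, labels, num_classes, num_iters=200):
    """Spread class distributions from labeled to unlabeled nodes.

    labels[i] is a class index for a labeled node and None for an unlabeled one.
    Labeled nodes are clamped; each unlabeled node repeatedly takes the
    weighted average of its neighbors' class distributions.
    """
    n = len(W)
    F = []
    for y in labels:
        if y is None:
            F.append([1.0 / num_classes] * num_classes)  # uniform start
        else:
            F.append([1.0 if c == y else 0.0 for c in range(num_classes)])
    for _ in range(num_iters):
        new_F = []
        for i in range(n):
            if labels[i] is not None:
                new_F.append(F[i])  # keep known labels fixed
                continue
            total = sum(W[i])
            new_F.append([
                sum(W[i][j] * F[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        F = new_F
    return F


def predict(F):
    """Pick the most probable class for every node."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]


# Toy data: two well-separated 1-D clusters, one labeled sample in each.
points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
labels = [0, None, None, None, None, 1]
F = propagate_labels(gaussian_affinity(points), labels, num_classes=2)
print(predict(F))
```

In this toy setting the unlabeled points inherit the label of the labeled point in their own cluster, which mirrors the assumption that nearby samples on the manifold share a class label. In an acoustic modeling setting the nodes would instead be speech feature vectors from training and test data.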

Bibliographic record

  • Author

    Liu, Yuzong

  • Author affiliation

    University of Washington

  • Degree-granting institution University of Washington
  • Subject Artificial intelligence
  • Degree Ph.D.
  • Year 2016
  • Pages 168 p.
  • Total pages 168
  • Original format PDF
  • Language English
  • Date indexed 2022-08-17 11:50:25
