
Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition.


Abstract

Acoustic models require a large amount of training data, and annotating that data for automatic speech recognition (ASR) is labor-intensive. More importantly, the performance of an acoustic model can degrade at test time, when the test data differ from the training data in speaker characteristics, channel, and recording environment. To compensate for this mismatch between training and test conditions, we investigate a graph-based semi-supervised learning approach to acoustic modeling in automatic speech recognition.

Graph-based semi-supervised learning (SSL) is a widely used semi-supervised learning method in which labeled and unlabeled data are jointly represented as a weighted graph, and label information is propagated from the labeled data to the unlabeled data. The key assumption of graph-based SSL is that data samples lie on a low-dimensional manifold, where samples that are close to each other are expected to have the same class label. More importantly, by exploiting the relationship between training and test samples, graph-based SSL implicitly adapts to the test data.

In this thesis, we address several key challenges in applying graph-based SSL to acoustic modeling. We first investigate and compare several state-of-the-art graph-based SSL algorithms on a benchmark dataset. In addition, we propose novel graph construction methods that allow graph-based SSL to handle variable-length input features. We next investigate the efficacy of graph-based SSL in the context of a fully-fledged DNN-based ASR system. We compare two different integration frameworks for graph-based learning. First, we propose a lattice-based late integration framework that combines graph-based SSL with DNN-based acoustic modeling, and we evaluate the framework on continuous word recognition tasks. Second, we propose an early integration framework using neural graph embeddings and compare two different neural graph embedding features that capture the information of the manifold at different levels. The embedding features are used as input to a DNN system and are shown to outperform conventional acoustic feature inputs on several medium-to-large vocabulary conversational speech recognition tasks.
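As a rough illustration of the propagation idea described in the abstract, the following is a minimal sketch of generic iterative label propagation on a Gaussian-weighted graph. It is a textbook-style example, not the algorithm or graph construction used in the thesis; the function name `propagate_labels`, the fully connected graph, and the parameters `sigma` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(X, y, n_classes, sigma=1.0, n_iter=200):
    """Generic label propagation sketch (hypothetical helper, not from the thesis).

    X: (n, d) array holding labeled and unlabeled samples together.
    y: (n,) integer array of class labels, with -1 marking unlabeled samples.
    Returns the predicted class index for every sample.
    """
    # Pairwise squared Euclidean distances turned into Gaussian edge weights.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Row-normalize so each node takes a weighted average over its neighbors.
    P = W / W.sum(axis=1, keepdims=True)

    labeled = y >= 0
    # Start unlabeled nodes from a uniform distribution over classes.
    F = np.full((len(y), n_classes), 1.0 / n_classes)
    F[labeled] = np.eye(n_classes)[y[labeled]]
    clamp = F[labeled].copy()

    for _ in range(n_iter):
        F = P @ F            # spread label mass along graph edges
        F[labeled] = clamp   # keep the known labels fixed
    return F.argmax(axis=1)
```

In this sketch, nearby samples end up with the same label because each unlabeled node repeatedly averages the label distributions of its neighbors, which is one simple way to act on the manifold assumption stated above. Placing test samples in `X` as unlabeled nodes is what lets such a method use the test data's structure.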


Bibliographic details

Author: Liu, Yuzong
Author affiliation: University of Washington
Degree-granting institution: University of Washington
Subject: Artificial intelligence
Degree: Ph.D.
Year: 2016
Pages: 168
Original format: PDF
Language: English
Date added to database: 2022-08-17 11:50:25

