Ensemble acoustic modeling in Automatic Speech Recognition.



Abstract

Combining multiple acoustic models to improve overall acoustic model quality is a young and promising direction in Automatic Speech Recognition (ASR). Previous work on acoustic modeling of speech signals, such as Random Forests (RFs) of Phonetic Decision Trees (PDTs), has produced significant improvements in recognition accuracy. In this dissertation, several new approaches to using data sampling to construct an Ensemble of Acoustic Models (EAM) for speech recognition are proposed. A straightforward method of data sampling is Cross Validation (CV) data partitioning. To improve inter-model diversity within an EAM for speaker-independent speech recognition, we propose Speaker Clustering (SC) based data sampling and develop two algorithms: Likelihood based Speaker Clustering (LSC) and speaker model Distance based Speaker Clustering (DSC). To improve base model quality as well as inter-model diversity, we further investigate how several successful single-model training techniques in speech recognition affect the proposed ensemble acoustic models, including Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multiple Layer Perceptron (MLP) features. We also propose using an ensemble of Multiple models with Different Mixture Sizes (MDMS) to improve EAM quality. We have evaluated the proposed methods on the TIMIT speaker-independent phoneme recognition task as well as on a telemedicine automatic captioning task of speaker-dependent continuous speech recognition. The proposed EAMs have led to significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems, and integrating ensemble acoustic models with CVEM, DT, and MLP has also significantly improved the accuracy of CVEM-, DT-, and MLP-based single-model systems.
We further investigated the largely unstudied factor of inter-model diversity and proposed several methods to measure it explicitly. We demonstrate a positive relation between enlarging inter-model diversity and increasing EAM quality.

HMM-based acoustic models built from data sampling EAM are generally very large, especially when a large number of models or full covariance matrices are used for Gaussian densities. It is therefore necessary to compact the acoustic model to a size reasonable for practical applications while maintaining reasonable performance. Toward this goal, in this dissertation, we discuss and investigate several distance measures and clustering algorithms. The distance measures include Entropy, KL, Bhattacharyya, Chernoff, and their weighted versions. For clustering algorithms, besides conventional greedy agglomerative clustering, algorithms such as N-Best distance Refinement (NBR), K-step LookAhead (KLA), and Breadth-First Search (BFS) are proposed. Experiments on the TIMIT task have shown that, in comparison with the original EAM, the models compacted with the clustering methods maintain model accuracy while being substantially smaller. Experiments in compacting EAM on a Pashto ASR task have shown that the proposed clustering methods can lead to better quality than the conventional clustering methods.

Unlike the implicit PDT based state tying that has been used in most ASR systems as well as in the recent RF based PDTs, explicit PDT (EPDT) state tying that allows Phoneme data Sharing (PS) is considered for its potential capability in capturing pronunciation variations. The ensemble approach of combining multiple acoustic models is applied to the EPDT, where a combination of explicit PDT and implicit PDT models has been investigated to reduce phone confusions.
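The abstract names KL and Bhattacharyya distances and conventional greedy agglomerative clustering as ingredients of model compaction, but it does not give formulas or implementation details. The following is a minimal sketch of that general idea, not the dissertation's method. It assumes diagonal-covariance Gaussians (the abstract also mentions full covariances), uses the standard closed-form distances, and merges the closest pair by moment matching, which is one common choice and an assumption here. All function names are hypothetical, and the weighted distance variants and the NBR, KLA, and BFS algorithms are not shown.

```python
import math
from itertools import combinations

# A Gaussian is a tuple: (weight, mean vector, diagonal variance vector).
# This representation is an assumption made for illustration only.

def kl_divergence(p, q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    _, mp, vp = p
    _, mq, vq = q
    return 0.5 * sum(
        math.log(b / a) + (a + (m - n) ** 2) / b - 1.0
        for m, a, n, b in zip(mp, vp, mq, vq)
    )

def symmetric_kl(p, q):
    """Symmetrized KL, so the value does not depend on argument order."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def bhattacharyya(p, q):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    _, mp, vp = p
    _, mq, vq = q
    total = 0.0
    for m, a, n, b in zip(mp, vp, mq, vq):
        avg = 0.5 * (a + b)  # per-dimension average variance
        total += 0.125 * (m - n) ** 2 / avg
        total += 0.5 * math.log(avg / math.sqrt(a * b))
    return total

def merge(p, q):
    """Moment-matched merge: preserves total weight, mean, and variance."""
    wp, mp, vp = p
    wq, mq, vq = q
    w = wp + wq
    mean = [(wp * m + wq * n) / w for m, n in zip(mp, mq)]
    var = [
        (wp * (a + m * m) + wq * (b + n * n)) / w - mu * mu
        for m, a, n, b, mu in zip(mp, vp, mq, vq, mean)
    ]
    return (w, mean, var)

def greedy_compact(gaussians, target, distance=symmetric_kl):
    """Repeatedly merge the closest pair until `target` Gaussians remain."""
    pool = list(gaussians)
    while len(pool) > max(target, 1):
        # Exhaustive search over all pairs; fine for a small illustration.
        i, j = min(
            combinations(range(len(pool)), 2),
            key=lambda ij: distance(pool[ij[0]], pool[ij[1]]),
        )
        merged = merge(pool[i], pool[j])
        pool = [g for k, g in enumerate(pool) if k not in (i, j)]
        pool.append(merged)
    return pool

if __name__ == "__main__":
    # Toy 1-D mixture with two obvious groups of components.
    mixture = [
        (0.25, [0.0], [1.0]),
        (0.25, [0.1], [1.0]),
        (0.25, [5.0], [1.0]),
        (0.25, [5.2], [1.0]),
    ]
    for weight, mean, var in greedy_compact(mixture, target=2):
        print(weight, mean, var)
```

Passing `distance=bhattacharyya` to `greedy_compact` swaps the distance measure without changing the merge loop, which is how different measures could be compared on the same set of Gaussians.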


Bibliographic record

Author: Chen, Xin
Author affiliation: University of Missouri - Columbia
Degree-granting institution: University of Missouri - Columbia
Subject: Computer Science
Degree: Ph.D.
Year: 2011
Pages: 121 p.
Total pages: 121
Original format: PDF
Language: English

Similar literature


1. A model of auditory perception as front end for automatic speech recognition. [J]. Tchorz J, Kollmeier B. The Journal of the Acoustical Society of America. 1999, Issue 4aPta1

2. Do We Need STRFs for Cocktail Parties? On the Relevance of Physiologically Motivated Features for Human Speech Perception Derived from Automatic Speech Recognition. [J]. B Kollmeier, M R René Schädler, A Meyer. Advances in Experimental Medicine and Biology. 2013

3. Evaluation of speech intelligibility for children with cleft lip and palate by means of automatic speech recognition. [J]. Schuster M, Maier A, Haderlein T. International Journal of Pediatric Otorhinolaryngology. 2006, Issue 10

4. Acoustic model merging using acoustic models from multilingual speakers for automatic speech recognition. [C]. Tien-Ping Tan, Besacier L., Lecouteux B. International Conference on Asian Language Processing. 2014

5. Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition. [D]. Liu, Yuzong. 2016

6. Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion. [O]. Prasanta Kumar Ghosh, Shrikanth Narayanan.

7. Acoustic Model Merging Using Acoustic Models from Multilingual Speakers for Automatic Speech Recognition. [O]. Tien-ping Tan, Laurent Besacier, Benjamin Lecouteux. 2015
