Toward more effective acoustic model clustering by more efficient use of data in speech recognition.

Abstract

In current large vocabulary continuous speech recognition systems, multivariate Gaussian mixture distributions and context-dependent phones, typically triphones, are used to build high-accuracy acoustic models. A crucial problem is how to estimate an extremely large number of model parameters from a limited amount of training data. The traditional approach reduces the number of free parameters with phonetic decision-tree-based context clustering. However, this approach has several problems that can degrade system performance, all of which stem from the fact that it does not use the limited training data efficiently and therefore fails to obtain effective acoustic models. Specifically, three problems are identified and addressed. First, all states clustered in a leaf node must share the same set of Gaussian components and mixture weights, so no distinction is made among those states. Second, triphones that are rarely seen in the training data may be poorly estimated, which adversely affects decision-tree clustering, and the traditional approach lacks an effective mechanism to handle this. Third, only single-Gaussian distributions are used to build the decision trees, whereas multiple-Gaussian mixture distributions are used in the final model set.

In this thesis, we propose to improve the quality of acoustic models by making more efficient use of the training data. We present several ways to address the problems of the traditional approach: a two-level decision tree approach for the first problem; a two-stage decision-tree-based approach and a MAP (maximum a posteriori) based approach for the second problem; and an approach using a new criterion and an effective clustering algorithm for the third problem. Each of these approaches achieved a statistically significant reduction in word error rate (WER) relative to the traditional approach.
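To make the third problem concrete, the Python sketch below shows the single-Gaussian likelihood-gain criterion commonly used for phonetic decision-tree state clustering (as in HTK-style tree-based state tying). It illustrates the traditional baseline, not the new criterion proposed in the thesis. The pooled states in a node are modeled by one maximum-likelihood diagonal-covariance Gaussian, whose log-likelihood can be computed from occupancy counts and first- and second-order sums alone, and the question that maximizes the likelihood gain of the yes/no split is chosen. The names `StateStats`, `stats_from_frames`, `best_split`, the question sets and the toy one-dimensional data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass
class StateStats:
    """Sufficient statistics of one context-dependent HMM state (hypothetical layout)."""
    name: str            # triphone state label, e.g. "p-a+t"
    left: str            # left-context phone that the questions inspect
    count: float         # state occupancy (number of frames)
    sum_x: List[float]   # per-dimension sum of observations
    sum_x2: List[float]  # per-dimension sum of squared observations


def stats_from_frames(name: str, left: str, frames: Sequence[Sequence[float]]) -> StateStats:
    """Accumulate statistics from a non-empty list of feature vectors."""
    dim = len(frames[0])
    sum_x = [sum(f[d] for f in frames) for d in range(dim)]
    sum_x2 = [sum(f[d] ** 2 for f in frames) for d in range(dim)]
    return StateStats(name, left, float(len(frames)), sum_x, sum_x2)


def cluster_log_likelihood(states: Sequence[StateStats], var_floor: float = 1e-3) -> float:
    """Log-likelihood of the pooled frames under one ML diagonal Gaussian."""
    n = sum(s.count for s in states)
    if n == 0:
        return 0.0
    dim = len(states[0].sum_x)
    log_var_sum = 0.0
    for d in range(dim):
        mean = sum(s.sum_x[d] for s in states) / n
        var = sum(s.sum_x2[d] for s in states) / n - mean * mean
        log_var_sum += math.log(max(var, var_floor))  # floor avoids log(0)
    # At the ML estimate: L = -0.5 * N * (D * (1 + log 2*pi) + sum_d log var_d)
    return -0.5 * n * (dim * (1.0 + math.log(2.0 * math.pi)) + log_var_sum)


def best_split(states: Sequence[StateStats], questions: Dict[str, Set[str]],
               min_count: float = 1.0) -> Tuple[Optional[str], float]:
    """Pick the question whose yes/no split gives the largest likelihood gain."""
    parent = cluster_log_likelihood(states)
    best_name: Optional[str] = None
    best_gain = 0.0
    for name, phones in questions.items():
        yes = [s for s in states if s.left in phones]
        no = [s for s in states if s.left not in phones]
        # Skip splits that would leave a child node with too little data.
        if sum(s.count for s in yes) < min_count or sum(s.count for s in no) < min_count:
            continue
        gain = cluster_log_likelihood(yes) + cluster_log_likelihood(no) - parent
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Toy 1-D example: states after nasals (m, n) differ clearly from those after stops (p, b).
states = [
    stats_from_frames("p-a+t", "p", [[0.1], [-0.2], [0.0]]),
    stats_from_frames("b-a+t", "b", [[0.2], [0.1], [-0.1]]),
    stats_from_frames("m-a+t", "m", [[5.0], [5.2], [4.9]]),
    stats_from_frames("n-a+t", "n", [[5.1], [4.8], [5.0]]),
]
questions = {"Labial?": {"p", "b", "m"}, "Nasal?": {"m", "n"}}
print(best_split(states, questions))  # the "Nasal?" question wins on this data
```

A tree is grown by applying `best_split` recursively and stopping when the gain or the occupancy falls below a threshold. Because every node is scored with a single Gaussian, the tree is built under a different model assumption from the Gaussian mixtures finally trained at each leaf, which is the mismatch described in the third problem above.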
Finally, the system combining all of the new approaches achieved the best performance, reducing the WER of the baseline system by 14% to 17% relative while using acoustic models 8% to 11% smaller than the baseline models.
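The figures above are relative reductions. As a minimal sketch, assuming the standard definition of WER (substitutions, deletions and insertions from a word-level Levenshtein alignment, divided by the number of reference words), the Python below computes WER and the relative reduction (baseline - new) / baseline; the same formula applies to the model-size reduction. The function names, example sentences and WER values are hypothetical and are not results from the thesis.

```python
from typing import List, Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # cost of turning an empty ref into hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` with respect to `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
print(wer(["the cat sat on the mat"], ["the cat sit on mat"]))
# Hypothetical WERs of 10.0% (baseline) and 8.5% (new system).
print(relative_reduction(10.0, 8.5))
```

For example, the hypothetical drop from 10.0% to 8.5% WER is a 1.5-point absolute reduction but a 15% relative reduction.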

Bibliographic record

Author: Liu, Chaojun
Affiliation: OGI School of Science & Engineering
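Degree-granting institution: OGI School of Science & Engineering
Subject: Computer Science; Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2002
Pages: 108
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology; Radio electronics, telecommunication technology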

Similar literature

1. An effective cluster-based model for robust speech detection and speech recognition in noisy environments [J]. Gorriz JM, Ramirez J, Segura JC. The Journal of the Acoustical Society of America, 2006, No. 1.
2. Efficient data selection for speech recognition based on prior confidence estimation using speech and monophone models [J]. Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi. Computer Speech and Language, 2014, No. 6.
3. Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech [J]. Fengpei GE, Changliang LIU, Jian SHAO. IEICE Transactions on Information and Systems, 2008, No. 10.
4. Model-based compensation of the additive noise for continuous speech recognition. Experiments using the AURORA II database and tasks [C]. J. C. Segura, A. de la Torre, M. C. Benitez. European Conference on Speech Communication and Technology, 2001.
5. Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition [D]. Liu, Yuzong. 2016.
6. Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data [O]. Yu-Ru Su, Jane-Ling Wang.
7. Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems [O]. Goussard George Willem. 2011.
8. Segment-Based Acoustic Models for Continuous Speech Recognition [R]. Ostendorf, M., Rohlicek, J. R. 1994.