Computer Speech and Language

Design of mixture of GMMs for Query-by-Example Spoken Term Detection


Abstract

This paper presents the design of a mixture of Gaussian Mixture Models (GMMs) for Query-by-Example Spoken Term Detection (QbE-STD). Speech data exhibits acoustically similar broad phonetic structures. To capture this broad phonetic structure, we exploit additional information from broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) when training the GMM. The mixture of GMMs is tied to the GMMs of these broad phoneme classes, i.e., each GMM expresses the probability density function (pdf) of one broad phoneme class. The Expectation-Maximization (EM) algorithm is used to obtain the GMM for each broad phoneme class. A mixture of GMMs thus represents the spoken query under broad phonetic constraints. These constraints restrict the posterior probability to the broad class, which results in a better posteriorgram design. The novelty of our work lies in the prior probability assignments (used as the weights of the mixture of GMMs) for better Gaussian posteriorgram design. The proposed simple yet effective posteriorgram outperforms the Gaussian posteriorgram because of the implicit constraints supplied by the broad phonetic posteriors. The Maximum Term Weighted Value (MTWV) on the SWS 2013 dataset improves by 0.052 and 0.051 over the Gaussian posteriorgram for Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) features, respectively. We found that the proposed mixture-of-GMMs approach consistently outperformed the Gaussian posteriorgram across evaluation factors such as the cepstral representation, the number of Gaussian components, the number of spoken examples per query, and the amount of labeled data used for broad phoneme posterior computation.
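To make the posteriorgram construction concrete, the following is a minimal sketch (our illustration, not the authors' code) in Python with NumPy, SciPy, and scikit-learn. It fits one diagonal-covariance GMM per broad phoneme class with EM, uses relative class frequencies as a simple stand-in for the paper's prior probability assignments (the weights of the mixture of GMMs), and stacks the prior-scaled component posteriors of all classes into a framewise posteriorgram. The feature dimensionality, component count, and all helper names are illustrative assumptions.

```python
# Sketch of a mixture-of-GMMs posteriorgram, assuming framewise cepstral
# features (e.g., 39-dim MFCC) and broad-class frame labels are available.
import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

BROAD_CLASSES = ["vowel", "semi-vowel", "nasal", "fricative", "plosive"]

def train_class_gmms(feats, labels, n_components=32, seed=0):
    """Fit one GMM per broad phoneme class with EM; estimate class
    priors (the mixture-of-GMMs weights) from label frequencies
    (an assumed assignment, standing in for the paper's scheme)."""
    gmms, log_priors = {}, {}
    for c in BROAD_CLASSES:
        frames = feats[labels == c]                    # (N_c, dim)
        gmms[c] = GaussianMixture(n_components=n_components,
                                  covariance_type="diag",
                                  random_state=seed).fit(frames)
        log_priors[c] = np.log(len(frames) / len(feats))
    return gmms, log_priors

def mixture_posteriorgram(gmms, log_priors, feats):
    """Posteriorgram over the pooled Gaussians of all class GMMs.
    The joint density of frame x and component k of class c is
    P(c) * w_{c,k} * N(x; mu_{c,k}, Sigma_{c,k}); normalizing over all
    components keeps the posterior mass tied to the broad classes."""
    cols = []
    for c in BROAD_CLASSES:
        gmm = gmms[c]
        # log p(x | c) + log P(k | x, c) = log(w_{c,k} * N_{c,k}(x))
        log_comp = (gmm.score_samples(feats)[:, None]
                    + np.log(np.clip(gmm.predict_proba(feats), 1e-300, None)))
        cols.append(log_priors[c] + log_comp)
    log_joint = np.hstack(cols)                        # (T, 5 * n_components)
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

# Toy usage with random data standing in for labeled MFCC frames.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 39))
labels = rng.choice(BROAD_CLASSES, size=1000)
gmms, log_priors = train_class_gmms(feats, labels, n_components=4)
post = mixture_posteriorgram(gmms, log_priors, feats)  # (1000, 20)
assert np.allclose(post.sum(axis=1), 1.0)              # rows are posteriors
```

In a typical QbE-STD pipeline, posteriorgrams of the spoken query and of the search utterances computed this way would then be matched with dynamic time warping (DTW) to locate candidate detections.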
