Journal: Engineering Applications of Artificial Intelligence

A clustering based feature selection method in spectro-temporal domain for speech recognition

Abstract

Spectro-temporal representations of speech have become one of the leading signal representation approaches in speech recognition systems in recent years. However, this representation suffers from the high dimensionality of its feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the cluster centroid vectors and covariance matrices are taken as the attributes of the secondary feature vector of each frame. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on the classification of phonemes within the main phoneme categories of the TIMIT database. The results show that the proposed secondary feature vector yields a significant improvement in the classification rate of different phoneme sets compared with MFCC features. For voiced plosives, the average improvement in classification rate over MFCC features is 5.9% with WKM clustering and 6.4% with GMM clustering. The greatest improvement, about 7.4%, is obtained with WKM clustering in the classification of front vowels, again relative to MFCC features.
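The abstract does not specify exactly how the clustering is applied to the spectro-temporal plane, so the following is only a minimal sketch of one plausible reading, under stated assumptions: each frame's spectro-temporal patch is treated as a set of (time, frequency) cells weighted by their magnitude, weighted K-means groups those cells, and each cluster's centroid plus the three distinct entries of its weighted covariance matrix are concatenated into the secondary feature vector. The function names (`patch_to_points`, `weighted_kmeans`, `secondary_features`), the highest-energy initialization, and the frequency-based cluster ordering are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (time index, frequency index) of one spectro-temporal cell


def patch_to_points(patch: Sequence[Sequence[float]]) -> Tuple[List[Point], List[float]]:
    """Turn a 2-D magnitude patch (rows = frequency bins, columns = time frames)
    into (t, f) coordinates with non-negative weights. Assumed representation."""
    points, weights = [], []
    for f, row in enumerate(patch):
        for t, value in enumerate(row):
            if value > 0:  # zero-energy cells carry no weight, so they are skipped
                points.append((float(t), float(f)))
                weights.append(float(value))
    return points, weights


def _sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def weighted_kmeans(points: List[Point], weights: List[float], k: int, iters: int = 50):
    """Lloyd-style K-means where each point's pull on its centroid is scaled by its weight."""
    if len(points) < k:
        raise ValueError("need at least k non-zero cells")
    # Hypothetical deterministic init: the k highest-energy cells (sorted() is stable).
    order = sorted(range(len(points)), key=lambda i: -weights[i])
    centroids = [points[i] for i in order[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        # Update step: weighted mean of the points assigned to each cluster.
        new_centroids = []
        for j in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, new_labels) if l == j]
            wsum = sum(w for _, w in members)
            if wsum == 0:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
                continue
            ct = sum(w * p[0] for p, w in members) / wsum
            cf = sum(w * p[1] for p, w in members) / wsum
            new_centroids.append((ct, cf))
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


def weighted_cov(points, weights, labels, j, mean):
    """Weighted 2x2 covariance of cluster j, returned as (c_tt, c_tf, c_ff)."""
    wsum = ctt = ctf = cff = 0.0
    for p, w, l in zip(points, weights, labels):
        if l != j:
            continue
        dt, df = p[0] - mean[0], p[1] - mean[1]
        ctt += w * dt * dt
        ctf += w * dt * df
        cff += w * df * df
        wsum += w
    if wsum == 0:
        return (0.0, 0.0, 0.0)
    return (ctt / wsum, ctf / wsum, cff / wsum)


def secondary_features(patch, k: int = 2) -> List[float]:
    """Concatenate [mu_t, mu_f, c_tt, c_tf, c_ff] for each cluster of one frame's patch."""
    points, weights = patch_to_points(patch)
    centroids, labels = weighted_kmeans(points, weights, k)
    # Assumption: order clusters by centroid frequency so the layout is stable across frames.
    order = sorted(range(k), key=lambda j: (centroids[j][1], centroids[j][0]))
    feats: List[float] = []
    for j in order:
        feats.extend(centroids[j])
        feats.extend(weighted_cov(points, weights, labels, j, centroids[j]))
    return feats


# Toy patch with two energy blobs: a weak one at low frequency, a strong one higher up.
example_patch = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
print(secondary_features(example_patch, k=2))
# prints [0.5, 0.5, 0.25, 0.0, 0.25, 2.5, 2.5, 0.25, 0.0, 0.25]
```

With K clusters over 2-D coordinates, each frame is summarized by 5K numbers (two centroid coordinates and three distinct covariance entries per cluster), independent of the patch size, which is the kind of dimensionality reduction the abstract describes. A GMM variant would replace the hard nearest-centroid assignment with soft posterior responsibilities estimated by expectation-maximization; that variant is not shown here.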
