Conference: International Symposium on Artificial Intelligence and Signal Processing

A feature extraction method for speech recognition based on temporal tracking of clusters in spectro-temporal domain



Abstract

This paper proposes a novel approach to secondary feature extraction based on cluster tracking in the spectro-temporal domain. Because the spectro-temporal feature space is high-dimensional, this domain is unsuitable for practical speech recognition systems. To reduce the dimensionality of the feature space, the weighted K-means (WKM) clustering technique is applied to the spectro-temporal domain, and the elements of the mean vectors and covariance matrices of the clusters are taken as the feature vector of each frame. However, the cluster locations change gradually over time. The main idea is that the variations in cluster locations should be tracked temporally, frame by frame, and that the parameters of these variations should be used in extracting the secondary feature vector of each speech frame. Several models are used to register the clusters in each incoming frame. In addition, a new architecture is proposed that classifies speech frames with a combining classifier using both tracked and non-tracked secondary features. The proposed feature vectors were evaluated on the classification of several subsets of phonemes from the TIMIT database. With tracked secondary feature vectors, the classification rate for voiced plosives improved to 77.4%, a relative improvement of 1.8% over non-tracked secondary feature vectors. Results on the other subsets also showed good improvement in classification rate.
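The pipeline described in the abstract (weighted K-means per frame, cluster means and covariances as features, then frame-to-frame tracking of cluster locations) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not specify the registration models, so this sketch simply warm-starts each frame's clustering from the previous frame's centroids and uses the centroid displacement as the tracking parameters. The function names (`weighted_kmeans`, `cluster_features`, `tracked_features`) and the representation of a frame as a `(points, weights)` pair are hypothetical.

```python
import numpy as np

def weighted_kmeans(points, weights, k, init_centers=None, n_iter=20, seed=0):
    """Weighted K-means: each point pulls its centroid in proportion to its weight."""
    rng = np.random.default_rng(seed)
    if init_centers is None:
        idx = rng.choice(len(points), size=k, replace=False)
        centers = points[idx].astype(float)
    else:
        centers = np.array(init_centers, dtype=float)  # copy, caller's array is untouched
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the weighted mean of its assigned points.
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Final assignment against the final centroids.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def cluster_features(points, weights, centers, labels):
    """Concatenate each cluster's centroid with the upper triangle of its weighted covariance."""
    dim = points.shape[1]
    iu = np.triu_indices(dim)
    feats = []
    for j in range(len(centers)):
        mask = labels == j
        w = weights[mask]
        if w.sum() > 0:
            diff = points[mask] - centers[j]
            cov = (diff * w[:, None]).T @ diff / w.sum()
        else:
            cov = np.zeros((dim, dim))  # empty cluster: no spread information
        feats.append(np.concatenate([centers[j], cov[iu]]))
    return np.concatenate(feats)

def tracked_features(frames, k):
    """Per-frame features plus centroid displacement since the previous frame.

    Assumption: clusters are registered across frames by initialising each
    frame's clustering with the previous frame's centroids.
    """
    prev = None
    out = []
    for points, weights in frames:
        centers, labels = weighted_kmeans(points, weights, k, init_centers=prev)
        static = cluster_features(points, weights, centers, labels)
        delta = np.zeros_like(centers) if prev is None else centers - prev
        out.append(np.concatenate([static, delta.ravel()]))
        prev = centers
    return np.array(out)
```

In this sketch the non-tracked features are the first `k * (d + d * (d + 1) / 2)` entries of each row, and the tracked part is the trailing `k * d` displacement values, where `d` is the dimensionality of the points. A combining classifier as mentioned in the abstract could consume the two parts separately, but its design is not described there and is not sketched here.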
