Statistical modeling of heterogeneous features for speech processing tasks.

Abstract

In this dissertation we describe novel approaches for improving several stages of a sequence classification system. We present results on two tasks: speaker verification, the task of deciding whether a test sample was spoken by a given target speaker; and nativeness classification, the task of deciding whether the speaker in a test sample is a native speaker of the language he or she is speaking.

We present a paradigm for transforming sequential features into fixed-length vectors that combines the advantages of generative and discriminative methods. Generative models are used to define the transform, and discriminative methods are used to classify the resulting transformed observations. A set of prototype distributions is obtained by vector quantization of a labeled held-out set, using a distortion measure designed to minimize the classification error of the resulting transform. The transform of a sample is the vector of posterior probabilities of the prototypes given that sample.

Prosody, the rhythmic and intonational aspect of speech, can help solve many speech classification tasks. We apply the proposed transform to prosodic features, which pose special challenges compared with the standard spectral features usually extracted from speech signals. The vectors produced by the transform are modeled with support vector machines (SVMs). We present speaker verification and nativeness classification results that compare different approaches to computing the prototypes. The results show that the best method for extracting the prototypes depends strongly on the amount of data in each sample, on the number of samples used to train the SVMs and, possibly, on the type of prosodic features extracted.

Another contribution of the thesis is a general method for modeling prior information within the SVM framework.
SVM training can be interpreted as maximum a posteriori (MAP) estimation of a model's parameters. In the standard formulation of SVM classification and regression, the prior distribution on the weight vector is implicitly assumed to be a multidimensional Gaussian with zero mean and identity covariance matrix. We relax the identity assumption and allow the covariance to be a more general block-diagonal matrix. In speaker verification, this matrix can be estimated from a set of held-out speakers. Compared with the standard SVM approach, this method yields relative improvements of 10% in the equal error rate (EER) of two speaker verification systems.

Prosodic information may be just one of several information sources used to solve a given speech classification problem. In general, several systems can be trained separately to perform the same classification task using different features or modeling techniques. Their outputs are then combined into a final score, which is used to make the final decision. In this framework, the individual systems are trained independently and their outputs are combined by a simple function. This dissertation presents a method for training the individual systems so as to improve the performance of the final combined score. The SVM objective function is modified to include a term that penalizes large values of the correlation coefficient between the system being trained and a pre-existing system with which the new system will later be combined. The resulting optimization problem can be transformed into a standard SVM problem with a new kernel, which we call the anticorrelation kernel. Using the proposed method to train the individual systems yields a 20% relative gain on a combination of four speaker verification systems.

(Abstract shortened by UMI.)
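The abstract describes the sequence-to-vector transform only at a high level. The Python sketch below illustrates the idea under assumptions that are ours, not the dissertation's: each prototype is a diagonal-covariance Gaussian (`DiagGaussian` and `posterior_transform` are hypothetical names), the frames of a sample are independent given the prototype, and prototype priors default to uniform. The vector-quantization step that produces the prototypes is not shown. The sketch demonstrates that samples of any length map to vectors whose length equals the number of prototypes.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DiagGaussian:
    """Hypothetical prototype: a Gaussian with diagonal covariance."""
    mean: List[float]
    var: List[float]

    def log_pdf(self, x: Sequence[float]) -> float:
        # Sum of per-dimension Gaussian log-densities.
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def log_sum_exp(values: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(values))).
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def posterior_transform(
    frames: Sequence[Sequence[float]],
    prototypes: Sequence[DiagGaussian],
    priors: Optional[Sequence[float]] = None,
) -> List[float]:
    """Map a variable-length sequence of frames to the fixed-length vector
    P(k | frames), assuming frames are i.i.d. given prototype k."""
    if priors is None:
        priors = [1.0 / len(prototypes)] * len(prototypes)
    # Log prior plus the sequence log-likelihood under each prototype.
    scores = [
        math.log(prior) + sum(proto.log_pdf(f) for f in frames)
        for proto, prior in zip(prototypes, priors)
    ]
    norm = log_sum_exp(scores)
    return [math.exp(s - norm) for s in scores]


# Toy usage: two 1-D prototypes over pitch-like values (Hz), samples of different lengths.
protos = [DiagGaussian([100.0], [100.0]), DiagGaussian([200.0], [400.0])]
short_vec = posterior_transform([[110.0], [95.0]], protos)
long_vec = posterior_transform([[190.0], [210.0], [205.0], [180.0]], protos)
```

Because the sequence log-likelihood sums over frames, the posteriors of long samples tend toward one-hot vectors; averaging per-frame posteriors is one possible alternative.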
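The MAP reading of the SVM can be made concrete for a linear model. With a Gaussian prior N(0, Sigma) on the weight vector w, the negative log-prior contributes (1/2) w^T Sigma^{-1} w to the objective in place of the usual (1/2) w^T w. Substituting w = Sigma^{1/2} v restores the standard regularizer (1/2) v^T v, with scores w^T x = v^T (Sigma^{1/2} x). The modified problem is therefore a standard SVM with kernel k(x, y) = x^T Sigma y. This derivation is a sketch written from the abstract's description, not taken from the dissertation. The helper below (a hypothetical name, `block_prior_kernel`) evaluates that kernel when Sigma is block diagonal, which needs only one small quadratic form per block. How the blocks are estimated from held-out speakers is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def block_prior_kernel(
    x: Sequence[float], y: Sequence[float], blocks: List[Matrix]
) -> float:
    """Hypothetical helper: k(x, y) = x^T Sigma y, where Sigma is the
    block-diagonal prior covariance given as a list of square blocks."""
    dim = sum(len(b) for b in blocks)
    if dim != len(x) or dim != len(y):
        raise ValueError("block sizes must add up to the feature dimension")
    total = 0.0
    offset = 0
    for block in blocks:
        d = len(block)
        # Only entries inside each diagonal block contribute; cross-block
        # entries of Sigma are zero by the block-diagonal assumption.
        for i in range(d):
            for j in range(d):
                total += x[offset + i] * block[i][j] * y[offset + j]
        offset += d
    return total


# With identity blocks the kernel reduces to the ordinary dot product,
# i.e. the standard SVM prior.
identity = [[[1.0, 0.0], [0.0, 1.0]], [[1.0]]]
custom = [[[2.0, 1.0], [1.0, 2.0]], [[3.0]]]
k_standard = block_prior_kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], identity)
k_custom = block_prior_kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], custom)
```

Any estimated Sigma must be positive semidefinite (as a covariance matrix is) for this to be a valid kernel.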

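The abstract does not give the form of the anticorrelation kernel, so the following is only a sketch of how such a reduction can work, under a simplifying assumption of ours. Instead of the correlation coefficient, the penalty is taken to be lam * (w^T c)^2, the squared sample covariance between the new linear scores w^T x_i and the existing system's scores s_i, where c = (1/N) sum_i x_i (s_i - mean(s)). The regularizer then becomes (1/2) w^T (I + 2 lam c c^T) w. This has the same form as the non-identity Gaussian prior discussed in the abstract, with prior covariance (I + 2 lam c c^T)^{-1}. By the Sherman-Morrison formula, the resulting kernel is k(x, y) = x^T y - 2 lam (x^T c)(c^T y) / (1 + 2 lam c^T c). The names `covariance_direction`, `anticorrelation_kernel`, and `lam` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def covariance_direction(
    features: Sequence[Sequence[float]], existing_scores: Sequence[float]
) -> List[float]:
    """c = (1/N) sum_i x_i (s_i - mean(s)); w^T c equals the sample
    covariance between new linear scores w^T x_i and existing scores s_i."""
    n = len(features)
    mean_s = sum(existing_scores) / n
    c = [0.0] * len(features[0])
    for x, s in zip(features, existing_scores):
        for k, xk in enumerate(x):
            c[k] += xk * (s - mean_s) / n
    return c


def anticorrelation_kernel(
    x: Sequence[float], y: Sequence[float], c: Sequence[float], lam: float
) -> float:
    """x^T A^{-1} y with A = I + 2*lam*c c^T, inverted via Sherman-Morrison."""
    shrink = 2.0 * lam / (1.0 + 2.0 * lam * dot(c, c))
    return dot(x, y) - shrink * dot(x, c) * dot(c, y)


# Toy data: three 2-D training vectors and a pre-existing system's scores.
train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [1.0, 2.0, 3.0]
c = covariance_direction(train, scores)  # approximately [0.0, 1/3]
```

Under this assumption, directions orthogonal to c are unaffected, while the component of each feature vector along c is shrunk, so a new system that reuses the information in the existing scores pays a larger regularization cost.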