Electronics and Electrical Engineering

Influence of the Number of Principal Components used to the Automatic Speaker Recognition Accuracy
Abstract

It is widely accepted that voice is a behavioral trait that may be used as a biometric characteristic. Systems intended to exploit this property of voice must be able to make a credible representation of the observed voice, and must be equipped with reliable procedures for automatic speaker recognition [1]. Feature extraction is one of the key components of automatic speaker recognition systems. Mel-Frequency Cepstral Coefficients (MFCCs) and their first and second derivatives are often used as a feature set. The observed speech signals are characterized by their predictive nature. MFCCs may be considered a direct consequence of discretizing the spectrum envelope, and their derivatives are, in the limit of infinitesimally short time intervals, linear combinations of adjacent MFCCs. It follows that these features are mutually redundant [2]. On the other hand, the basic obstacle to achieving highly reliable speaker recognition lies in the fact that training and testing conditions differ. Since each feature is susceptible to environmental interference, increased complexity of a speaker model (i.e., increased dimensionality of the feature vectors) implies an increased level of noise. From a methodological point of view, limiting this noise requires focusing the recognizer on what is essential in the observed recognition object. In other words, reducing the model complexity reduces the accumulated noise level. These observations motivate possible efficiency improvements of a speaker recognizer. Transformation techniques usually applied for dimensionality reduction include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Nonlinear Discriminant Analysis (NLDA) [3]. All of these techniques are intended to train the recognizer to focus on those elements that are essential for the observed data set. Recent work indicates that PCA is a prominent technique for dimensionality reduction.
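The PCA step described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the frame count, the 39-dimensional MFCC + delta + delta-delta layout, and the retained dimension k are assumed values, and the random frames stand in for real speech features.

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors X (n_frames x dim) onto the k leading
    principal components of their covariance matrix."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # sort by descending variance
    W = eigvecs[:, order[:k]]                # dim x k projection matrix
    return Xc @ W, W, mu

# Hypothetical 39-dimensional MFCC + delta + delta-delta frames.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 39))
reduced, W, mu = pca_reduce(frames, k=20)
print(reduced.shape)  # (500, 20)
```

Retaining the top-k eigenvectors keeps the directions of largest variance; varying k is precisely the experimental knob the paper's title refers to.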
Conventional methods for PCA based on the full data covariance matrix require a large amount of training data [4]. To reduce the complexity of these methods, in which an eigenvector matrix is computed for each speaker, PCA methods that operate on all the training data [4, 5] (as in this paper) or on locally clustered data [6, 7] have been introduced. In the domain of speaker recognition, PCA is often applied with Gaussian Mixture Models (GMMs) [4, 7]. In the next section, we introduce the reference environment and the speaker modeling approach. Then, we consider the feature transformation and training of the model. Finally, we discuss the results of automatic speaker recognition.
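For the GMM-based recognition the abstract alludes to, a test utterance is typically scored against each speaker's model by its average per-frame log-likelihood, and assigned to the highest-scoring model. A minimal numpy sketch of such a scorer, assuming diagonal-covariance components (the parameter values below are illustrative, not from the paper):

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T x D) under a
    diagonal-covariance GMM with M components: weights (M,),
    means (M x D), variances (M x D)."""
    diff = X[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_det = np.sum(np.log(variances), axis=1)              # (M,)
    mahal = np.sum(diff**2 / variances[None, :, :], axis=2)  # (T, M)
    D = X.shape[1]
    log_comp = -0.5 * (D * np.log(2 * np.pi) + log_det + mahal)
    log_mix = np.log(weights) + log_comp                     # (T, M)
    # log-sum-exp over components for numerical stability
    m = log_mix.max(axis=1, keepdims=True)
    ll = m.squeeze(1) + np.log(np.exp(log_mix - m).sum(axis=1))
    return ll.mean()

# Single standard-normal component scoring all-zero frames:
# the score reduces to -0.5 * D * log(2*pi).
D = 13
score = gmm_loglik(np.zeros((10, D)), np.array([1.0]),
                   np.zeros((1, D)), np.ones((1, D)))
```

In a PCA-reduced pipeline, `X` would be the projected feature vectors, so the GMM dimensionality D (and with it the number of parameters to estimate) shrinks with the number of retained principal components.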
