New covariance-based feature extraction methods for classification and prediction of high-dimensional data.



Abstract

When analyzing high-dimensional data sets, it is often necessary to apply feature extraction methods to capture the discriminating information needed for classification and prediction. This information can typically be represented in lower-dimensional feature spaces, and principal component analysis (PCA) is a widely used approach for doing so. PCA compresses information efficiently into lower dimensions; however, studies indicate that it is not optimal for feature extraction, especially in classification problems. Furthermore, for high-dimensional data with limited observations, as is typically the case with remote sensing data and nonstationary data such as financial data, covariance matrix estimation becomes unreliable, which adversely affects the representation of the data in the PCA domain. In this thesis, we first introduce a new feature extraction method called summed component analysis (SCA), which uses the structure of the eigenvectors of the common covariance matrix to generate new features as sums of certain original features. Second, we present a variation of SCA known as class summed component analysis (CSCA). CSCA takes advantage of the relative ease of computing the class covariance matrices and uses them to determine data transformations. Because the new features are simple sums of the original features, the new representation of the data retains a conceptual meaning, which is appealing for man-machine interfaces. We evaluate these methods on data sets with varying sample sizes and on financial time series, and show improved classification and prediction accuracies.
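The abstract does not spell out how SCA chooses which original features to sum, so the following is only a rough sketch of the general idea, not the thesis's algorithm. It estimates a covariance matrix, finds its leading eigenvector by power iteration (the direction PCA would use), and then replaces the weighted PCA projection with a plain sum of the features whose loadings are large. The grouping rule in `grouped_indices`, its `threshold` parameter, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance of `rows` (n observations x d features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: unit eigenvector of the largest eigenvalue."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def grouped_indices(eigvec, threshold=0.5):
    """Hypothetical rule (not from the thesis): keep features whose
    absolute loading is at least `threshold` times the largest one."""
    peak = max(abs(x) for x in eigvec)
    return [j for j, x in enumerate(eigvec) if abs(x) >= threshold * peak]


# Synthetic data: features 0 and 1 share a common factor, feature 2 is
# low-variance noise that is unrelated to the other two.
rng = random.Random(0)
rows = []
for _ in range(200):
    shared = rng.gauss(0.0, 1.0)
    rows.append([
        shared + 0.1 * rng.gauss(0.0, 1.0),
        shared + 0.1 * rng.gauss(0.0, 1.0),
        0.1 * rng.gauss(0.0, 1.0),
    ])

cov = covariance_matrix(rows)
eigvec = leading_eigenvector(cov)
indices = grouped_indices(eigvec)

# The new feature is a simple sum of the selected original features,
# rather than a weighted combination of all of them as in PCA.
summed = [sum(row[j] for j in indices) for row in rows]
print("features grouped into the summed component:", indices)
```

On this toy data the two correlated features are grouped together and the noise feature is left out, so the new feature reads directly as "feature 0 plus feature 1", which is the kind of interpretability the abstract attributes to summed features.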

Bibliographic record

  • Author: Sofolahan, Mopelola A.
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Subject: Engineering, Electronics and Electrical; Economics, Finance
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 93
  • Original format: PDF
  • Language: English
